Zhoujun (Jorge) Cheng

Pinned Tweet

Zhoujun (Jorge) Cheng

@ChengZhoujun

Jan 20

Pretraining has scaling laws to guide compute allocation. But for RL on LLMs, we lack a practical guide on how to spend compute wisely. We show the optimal compute allocation in LLM RL scales predictably. ↓ Key takeaways below

442

70,769

Zhoujun (Jorge) Cheng

@ChengZhoujun

May 26

Very solid data work here on cua envs. Massive value in both the outcome and the execution!

Bowen Wang

@BowenWangNLP

May 26

RLVR has become the recipe for agentic post-training. But for Computer-Use Agents, the bottleneck is not the algorithm, it is the data. 🐌 🚀 We introduce CUA-Gym: a scalable, lightweight synthesis engine that turns arbitrary task queries into verifiable RLVR data for computer-use agents.
The largest open CUA RLVR dataset to date:
🎯 32,122 verifiable RLVR tasks with programmatic setup scripts and rewards
🌐 110 environments: 16 desktop apps and 94 synthesized mock web apps
🏆 Qwen3.5-based CUA models trained with GSPO reach 72.6% on OSWorld-Verified and 56.6% on WebArena
📄 Paper: huggingface.co/papers/2605.2…
🏠 Homepage: cua-gym.xlang.ai
🤗 Dataset: huggingface.co/datasets/xlan…
💻 Codebase: github.com/xlang-ai/CUA-Gym
🧩 Environments: github.com/xlang-ai/CUA-Gym-…
🧵[1/6]

2,479

Zhoujun (Jorge) Cheng retweeted

Benhao Huang

@huskydogewoof

May 22

🌀 Introducing Equilibrium Reasoners (EqR)! Feedforward models and weight-tied models behave very differently on hard reasoning generalization. EqR pushes this difference to the extreme by learning task-conditioned neural attractors.
• Sudoku-Extreme: 99.8%
• Maze: 93%
#ICML2026

308

76,161

Zhoujun (Jorge) Cheng retweeted

Zora Wang

@ZhiruoW

May 19

Excited to announce our tutorial: "Future of Work in the Age of LLMs" at #ACL2026 in San Diego, July 2! 🌴 There's a lot of speculation about AI and the future of human work. Our tutorial unpacks it from four angles:
→ The landscape of human work
→ How to build LLMs to augment real-world workflows
→ How to evaluate these LLMs
→ The future of work with LLMs/LLM-based agents

136

17,108

Zhoujun (Jorge) Cheng

@ChengZhoujun

May 19

Thanks RadixArk for sharing our work NanoRollout!!

RadixArk

@radixark

May 19

Slow, heavy environments have been the real bottleneck for agentic RL. NanoRollout tackles it head-on with a clean rollout-as-a-service design, integrated with miles for scalable agent RL. Great work from the team!

2,195

Zhoujun (Jorge) Cheng retweeted

RadixArk

@radixark

May 19

Junli Wang

@JunliWang2021

May 14

Digital agent learning needs massive rollouts. But digital agent rollouts are painfully slow due to heavy environments. 🐌 🚀 We introduce NanoRollout, a lightweight open infra (900 lines of core code) for digital agent rollout at scale, validated with three workloads:
🏋️ Large batch size (4K) SWE Agent RL -> surpasses DeepSWE-32B
🧪 250k distilled coding trajectories -> SOTA ≤32B open coding agent
⚡ Fast evaluation on coding/cua/unified agent -> finish
Check our Blog: cocoa-org.notion.site/nanoro…

14,917

Zhoujun (Jorge) Cheng retweeted

Junli Wang

@JunliWang2021

May 16

Thrilled to see those promising numbers! 🤯 Same finding on our end with NanoRollout: cross-scaffold generalization basically doesn't happen out of the box -- something the field should be talking about more.

Wenlin Yao

@YaoWenlin

May 15

🌳 Introducing Orchard, an open-source agentic modeling framework! 🎉 One thin & cheap sandbox infra powers training recipes across SWE / GUI / personal-assistant agents:
⚙️ Orchard Env: 0.28s exec latency; 100% success @ 1,000 parallel sandboxes 💪
🛠️ Orchard-SWE: 67.5% on SWE-bench Verified (30B-A3B, ~3B active)
🖥️ Orchard-GUI: 68.4% avg on WebVoyager / Online-Mind2Web / DeepShop (4B!)
📬 Orchard-Claw: 73.9% pass@3 on Claw-Eval
🔗 arxiv.org/abs/2605.15040
📦 Code and data are coming soon! Let's accelerate open agentic AI! 🚀

6,052

Zhoujun (Jorge) Cheng retweeted

Huaizheng Zhang

@zhzHNN

May 15

Cool. I always enjoy playing with nano projects. No matter who asks me how to learn LLMs, my answer is always the same.
- Start with nanochat/nanogpt.
- Then pick one super niche direction.
- Deep dive into it. Build a nano project. Scale it gradually.
That's all.

Hao Zhang

@haozhangml

May 14

check our new work nanorollout!!

4,092

Zhoujun (Jorge) Cheng retweeted

Dinghuai Zhang 张鼎怀

@zdhnarsil

May 15

Been working on the coding model behind this for a while. Still needs huge improvement, but let's see!

xAI

@xai

May 14

An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at x.ai/cli

217

10,411

Zhoujun (Jorge) Cheng retweeted

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

May 14

The lack of lightweight, open agent infra has been a massive pain point. This is a great starting point, esp. for large-scale, thousands of parallel envs, battle-tested coding / computer-use agent training!

Junli Wang

@JunliWang2021

May 14

2,946

Zhoujun (Jorge) Cheng retweeted

Hao Zhang

@haozhangml

May 14

check our new work nanorollout!!

Junli Wang

@JunliWang2021

May 14

8,271

Zhoujun (Jorge) Cheng retweeted

Shibo Hao

@Ber18791531

May 15

Check out this cool project led by @JunliWang2021 and @ChengZhoujun! Solid new agent infra and impressive results on open source RL/SFT recipes.

Junli Wang

@JunliWang2021

May 14

1,236

Zhoujun (Jorge) Cheng retweeted

Zihan "Zenus" Wang

@wzenus

May 15

Very solid agentic infra work on accelerating agent rollout!

Junli Wang

@JunliWang2021

May 14

2,845

Zhoujun (Jorge) Cheng retweeted

Bowen Wang

@BowenWangNLP

May 15

Nice work! Training digital agents isn't trivial; co-designing rollouts with targeted environments is the pain point once you dig into agentic RL. This is super clean, and agentic RL folks should try it out.

Junli Wang

@JunliWang2021

May 14

747

Zhoujun (Jorge) Cheng retweeted

Guohao Li 🐫

@guohao_li

May 15

We need more agent rollouts. Glad to see SETA is used in NanoRollout!

Junli Wang

@JunliWang2021

May 14

3,309

Zhoujun (Jorge) Cheng

@ChengZhoujun

May 14

Happy to release NanoRollout, our infra attempt to scale digital agent rollouts without pain. Setting up and scaling parallel digital agent envs is one of the biggest headaches in agent training / deployment. The open community hasn't handled it well. Two appealing features from NanoRollout:
🔌 Non-intrusive RL integration with frameworks such as miles, verl, tunix; validated end-to-end, e.g. outperforms DeepSWE-32B at a large 4k batch size 🚀
🧩 Unified support across agent harnesses and envs (covering SWE-Bench, Terminal-Bench, OSWorld, CocoaBench), with fast parallel eval that reproduces published scores (e.g., full SWE-Bench Verified eval from 102 min → 18 min, 5.7x faster ⚡)
And the core logic is just ~900 LOC. Hope NanoRollout helps if you're also trying to scale agent rollouts. Check out the tech blog and github for more details! Big thanks to the fantastic co-lead @JunliWang2021

Junli Wang

@JunliWang2021

May 14

2,923

Zhoujun (Jorge) Cheng retweeted

Junli Wang

@JunliWang2021

May 14

NanoRollout: Scale digital agent rollouts without pain | Notion

Junli Wang* Zhoujun Cheng*† Yuxuan Zhang* Shibo Hao Yao Tang

cocoa-org.notion.site

136

35,493

Zhoujun (Jorge) Cheng retweeted

Caiming Xiong

@CaimingXiong

May 13

Today, we’re excited to launch Recursive (@recursive_si): an exceptional team across London and San Francisco, building AI systems that can safely improve their own capabilities over time.

Recursive

@Recursive_SI

May 13

x.com/i/article/205444819329…

124

17,240

Zhoujun (Jorge) Cheng

@ChengZhoujun

May 13

A great piece on self-distillation using failures! Besides scaling up the number of rollouts, actively scaling (extracting) signals from raw rollouts should be an important way to improve agents and save compute.

Yuwei Zhang

@YuweiZh49446108

May 12

On-policy self-distillation is a promising direction for learning from rich textual feedback. But can it really learn from failed trajectories? Our answer: not quite -- unless we let the model actively interpret them. 🧵1/N

3,775

Zhoujun (Jorge) Cheng retweeted

Apurva Gandhi

@apurvasgandhi

May 8

Sub-agents are a promising inference-time scaling primitive:
• Expand an agent's working memory
• Divide-and-conquer hard problems
• Solve problems faster with parallel execution
But how do we train a model to best take advantage of sub-agents and make sure we get these benefits? Very excited to release RAO: Recursive Agent Optimization. RAO is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves (that can themselves spawn other agents), turning recursive inference into a learned capability. 1/10

117

717

134,874

Zhoujun (Jorge) Cheng retweeted

Zhiting Hu

@ZhitingHu

May 4

🏆 Honored to receive the Test of Time Award Honorable Mention #AISTATS2026 for our 2016 work Deep Kernel Learning, with the amazing @andrewgwils @rsalakhu @ericxing. What a decade of AI progress! While GenAI is now driving massive real-world applications, the deepest underlying challenge remains: learning efficient representations of the world, for understanding, generation, predicting future worlds, and reasoning in the latent space. So much fun to think about for the next decade! ⏳

120

14,557