Open Organization to Build AI-powered Realities with LLMs, World Models, Agent Models.

Joined March 2024
Pinned Tweet
🤯 Coding agents are entering the physical world
Claude Code can now build things in a simulated physical world!🤖🏙️ With SimWorld, coding agents can construct buildings, plan cities, or even create video games inside a realistic simulation on Unreal Engine. Just write a prompt, your agent will call tools, retrieve assets, plan scenes, and test physics autonomously. Demo platform coming soon so everyone can try it. Stay tuned. 🚀
Maitrix.org retweeted
This is one of the coolest things I’ve seen with @claudeai Fable 5. It’s been 1 day 🤯 Impressive👏👏
Update: also tried the same prompt with Fable 5. First impression: visually the strongest result so far, with a denser village, better terrain/building placement, and nice lived-in details. Still not perfect though: close-ups show both good details and failure cases, including floating buildings 😅 (check out the threads👇)
Maitrix.org retweeted
Close-ups from Fable 5: Good progress on global aesthetics and scene details: more coherent village layout, rooftops, balconies, water tanks, AC units, stalls, and small props that make it feel lived-in. But still not solved: some buildings are floating above the sand or not properly grounded to the terrain 😅
Maitrix.org retweeted
Update: also tried the same prompt with Fable 5. First impression: visually the strongest result so far, with a denser village, better terrain/building placement, and nice lived-in details. Still not perfect though: close-ups show both good details and failure cases, including floating buildings 😅 (check out the threads👇)
We gave 4 frontier coding agents the same hard environment-gen test: Preserve the desert landscape. Surgically remove only the ruins. Then build a dense, believable Middle Eastern village from scratch using rustic assets. Which model did the best job? 🏜️👇 Poll and full prompt in thread.
Maitrix.org retweeted
Here’s the zoom-in video😂 The desert vibes are real, but so are the anti-gravity palm trees 🌴
Maitrix.org retweeted
Haha, great catch🤣 Opus 4.8 apparently decided palm trees are exempt from physics. At first glance the scene looks great, but the zoom-ins tell a different story: floating trees, upside-down trunks, and palms growing straight out of buildings 🌴
Replying to @simworld_ai
People likely think that Opus version looks better due to the palm trees... yet almost all of these trees are completely bugged. Look how many of them are just floating upside down in mid air or growing out of buildings....
Maitrix.org retweeted
We gave 4 frontier coding agents the same hard environment-gen test: Preserve the desert landscape. Surgically remove only the ruins. Then build a dense, believable Middle Eastern village from scratch using rustic assets. Which model did the best job? 🏜️👇 Poll and full prompt in thread.
Maitrix.org retweeted
I’m not sure Gemini 3 looks that much more impressive here.🤔 For example, why is there a giant White House–like building just sitting in the middle of the street? This feels like a real example of how even frontier coding agents can still struggle with spatial reasoning.
Missed Gemini 3 yesterday, but catching up now. This genuinely looks impressive!
Maitrix.org retweeted
Missed Gemini 3 yesterday, but catching up now. This genuinely looks impressive!
We asked 4 frontier coding agents to build the same Unreal 3D city scene in SimWorld Studio. Same prompt. Different worlds 👀 • Claude Code Opus 4.7 • Codex GPT-5.5 • Cursor Composer 2.5 • OpenCode Gemini 2.5 Pro. Who wins?
Environment generation is the new AI frontier
Environment generation is the missing scaling axis for embodied AI. Introducing SimWorld Studio: a self-evolving factory for endless interactive 3D environments where agents act, fail & learn. Env-agent co-evolution improves navigation success 50% → 90%. From a prompt, our SimCoder writes code to automatically build an interactive world. Agents train inside it. And their performance shapes the next world.
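The loop described above (build a world, train an agent in it, let the agent's performance shape the next world) can be sketched as a toy curriculum. This is only an illustration under stated assumptions, not SimWorld Studio's or SimCoder's actual method: `generate_env`, `train_and_evaluate`, `next_difficulty`, the single `difficulty` number, and the 0.8 target are all hypothetical.

```python
def generate_env(difficulty):
    """Hypothetical environment generator: a world is just a difficulty level."""
    return {"difficulty": difficulty}


def train_and_evaluate(skill, env):
    """Toy agent: each training round adds one unit of skill.

    Success rate is skill relative to the world's difficulty, capped at 1.0.
    """
    new_skill = skill + 1.0
    success = min(1.0, new_skill / env["difficulty"])
    return new_skill, success


def next_difficulty(difficulty, success, target=0.8):
    """Make the next world harder only once the agent clears the target."""
    return difficulty + 2.0 if success >= target else difficulty


def co_evolve(rounds, skill=0.0, difficulty=1.0):
    """Alternate world generation and agent training for `rounds` rounds."""
    history = []
    for _ in range(rounds):
        env = generate_env(difficulty)
        skill, success = train_and_evaluate(skill, env)
        history.append((difficulty, success))
        difficulty = next_difficulty(difficulty, success)
    return history


history = co_evolve(rounds=3)
for difficulty, success in history:
    print(f"difficulty={difficulty:.0f} success={success:.2f}")
```

In this toy, the world only gets harder after the agent succeeds often enough, so a failed round is followed by another attempt at the same difficulty.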
Maitrix.org retweeted
Natural language is a human-created representation of the world. Is the ultimate form of the bitter lesson to bypass natural language entirely and learn a new representation from the world itself?
The bitter lesson in 26 words: Don’t be distracted by human knowledge, as AI has been historically. Instead focus on methods for creating knowledge that scale with computation, like search and learning.
Maitrix.org retweeted
Happy to release NanoRollout, our infra attempt to scale digital agent rollouts without pain. Setting up and scaling parallel digital agent envs is one of the biggest headaches in agent training / deployment. The open community hasn't handled it well. Two appealing features from NanoRollout: 🔌 Non-intrusive RL integration with frameworks such as miles, verl, tunix; validated end-to-end, e.g. outperforms DeepSWE-32B at a large 4k batch size 🚀 🧩 Unified support across agent harnesses and envs — covering SWE-Bench, Terminal-Bench, OSWorld, CocoaBench — with fast parallel eval that reproduces published scores (e.g., full SWE-Bench Verified eval from 102 min → 18 min, 5.7x faster⚡) And the core logic is just ~900 LOC. Hope NanoRollout helps if you're also trying to scale agent rollouts. Check out the tech blog and github for more details! Big thanks to the fantastic co-lead @JunliWang2021
Digital agent learning needs massive rollouts. But digital agent rollouts are painfully slow due to heavy environments. 🐌 🚀 We introduce NanoRollout, a lightweight open infra (900 lines core code) for digital agent rollout at scale, validated with three workloads: 🏋️ Large batchsize (4K) SWE Agent RL -> surpasses DeepSWE-32B 🧪 250k distilled coding trajectories -> SOTA ≤32B open coding agent ⚡ Fast evaluation on coding/cua/unified agent -> finish Check our Blog: cocoa-org.notion.site/nanoro…
Maitrix.org retweeted
🏆Honored to receive the Test of Time Award Honorable Mention #AISTATS2026 for our 2016 work Deep Kernel Learning, with the amazing @andrewgwils @rsalakhu @ericxing What a decade of AI progress! While GenAI is now driving massive real-world applications, the deepest underlying challenge remains: learning efficient representations of the world—for understanding, generation, predicting future worlds, and reasoning in the latent space. So much fun to think about for the next decade!⏳
Maitrix.org retweeted
Come check out LaDiR, our ICLR paper about latent diffusion for text reasoning. Instead of reasoning one token at a time in text space, LaDiR moves reasoning into continuous latent space and uses diffusion over blocks of thought tokens. That means LLMs can rethink whole reasoning paths, explore multiple solutions, and plan more flexibly. We show these gains on math, code and planning tasks.
I’ll be at ICLR 2026 in Rio 🇧🇷 presenting our work: LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning 🗓️ Fri, Apr 24, 10:30 AM - 1:00 PM 📍 Poster Session 3, Pavilion 4, #4916 This work explores a new direction in latent reasoning with diffusion for advancing LLM capabilities. I’d be happy to connect, so feel free to stop by the poster or reach out for a coffee chat ☕
Maitrix.org retweeted
Come and check out our ICLR work: Speculative Verdict (SV) for information-intensive visual reasoning. Inspired by speculative decoding, instead of drafting tokens, SV asks multiple small VLMs to draft diverse reasoning and localization paths, then uses a stronger model to produce the final verdict. The key insight is simple: no single reasoning path has to be perfect. Even when each path is only partly correct, combining the right pieces can still recover the correct answer — giving both better accuracy and lower cost.
Heading to #ICLR2026 🇧🇷! I'll be presenting Speculative Verdict at the poster session on Apr 25, 10:30 AM–1:00 PM, Pavilion 4 #3507, happy to chat! 📄 Paper: arxiv.org/abs/2510.20812 💻 Code: github.com/Tinaliu0123/specu…
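The draft-then-verdict pattern described above can be sketched with plain functions standing in for the models. This is a toy under stated assumptions, not the paper's implementation: `speculative_verdict`, `draft_a`, `draft_b`, `toy_judge`, and the `key=value` draft format are all hypothetical, and real drafters and judges would be model calls.

```python
from typing import Callable, List

Drafter = Callable[[str], str]
Judge = Callable[[str, List[str]], int]


def speculative_verdict(question: str, drafters: List[Drafter], judge: Judge) -> int:
    """Collect one draft per small model, then let the judge give the verdict."""
    drafts = [draft(question) for draft in drafters]
    return judge(question, drafts)


# Hypothetical drafters: each one reads only part of the image correctly.
def draft_a(question: str) -> str:
    return "revenue=10; cost=unknown"


def draft_b(question: str) -> str:
    return "revenue=unknown; cost=4"


def toy_judge(question: str, drafts: List[str]) -> int:
    """Hypothetical judge: keep the known facts from every draft and combine them."""
    facts = {}
    for draft in drafts:
        for item in draft.split(";"):
            key, value = item.strip().split("=")
            if value != "unknown":
                facts[key] = int(value)
    return facts["revenue"] - facts["cost"]


answer = speculative_verdict("What is the profit?", [draft_a, draft_b], toy_judge)
print(answer)
```

Neither draft alone has both numbers, yet the judge recovers the answer by combining the correct pieces, which is the point the post makes about partly correct paths.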
Maitrix.org retweeted
🍫 CocoaBench v1.0 is out! CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, 🌐 search. Since our first research preview last December, we have expanded the benchmark substantially with community contributed tasks, and spent months testing and refining the tasks, evaluations, and agent runs. Some takeaways: • Even the best agent system reaches only 45.1% on CocoaBench v1.0. • Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering. • Stronger agents tend to push more of the work into code. • Open source models still lag behind leading frontier models on these general tasks. 👇More on the website and in the paper #AI #Agents #LLM #Benchmark #CocoaBench
16 Dec 2025
🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built🚀✨ #LLM #AI #Agent #CocoaBench More details in the threads 👇
Maitrix.org retweeted
That’s wild — and smart! 🤣 SimWorld coding agent self-improves by autonomously creating new tools and skills It realized BaGuaZhen(八卦阵) was too hard to build directly, so it created its own tools and skills. Starting from only primitive operations like spawn_actor() and delete_actor(), the agent does not just brute-force the task. It breaks the problem down and builds higher-level capabilities for itself.
A SimWorld coding agent can now create its own tools and skills on the fly. We challenged it with BaGuaZhen (八卦阵 Eight Trigrams), an ancient Chinese formation that is difficult to build from scratch because of its precise spatial structure and multi-step coordination. Instead of failing with brute force, the agent wrote reusable components for itself: Tools: Bagua Wall Segment, Bagua Trigram Line Skills: Bagua Wall Segment Skill, Bagua Trigram Line Skill Each tool is paired with a skill that teaches the model how to use it. Without skills: it fails. With self-built skills: it organizes the full structure. The exciting shift is this: agents are starting to generate capabilities, not just outputs.
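Building higher-level tools out of primitives, as described above, can be sketched as follows. Only the names `spawn_actor` and `delete_actor` come from the posts; their signatures here, the in-memory `World` class, and the `wall_segment` and `trigram_line` helpers are hypothetical stand-ins, not the SimWorld API or the tools the agent actually wrote.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Hypothetical in-memory stand-in for a simulator (not the SimWorld API)."""
    actors: dict = field(default_factory=dict)
    next_id: int = 0

    def spawn_actor(self, kind, x, y):
        """Primitive: place one actor and return its id (assumed signature)."""
        actor_id = self.next_id
        self.actors[actor_id] = (kind, x, y)
        self.next_id += 1
        return actor_id

    def delete_actor(self, actor_id):
        """Primitive: remove one actor by id (assumed signature)."""
        self.actors.pop(actor_id, None)


def wall_segment(world, x0, y, length):
    """Higher-level tool: a straight run of `length` wall blocks."""
    return [world.spawn_actor("wall", x0 + i, y) for i in range(length)]


def trigram_line(world, x0, y, solid, length=5):
    """Tool built on wall_segment: a solid line, or a broken line with a gap."""
    ids = wall_segment(world, x0, y, length)
    if not solid:
        middle = ids.pop(length // 2)  # open a gap in the middle of the line
        world.delete_actor(middle)
    return ids


world = World()
solid_ids = trigram_line(world, 0, 0, solid=True)
broken_ids = trigram_line(world, 0, 1, solid=False)
print(len(solid_ids), len(broken_ids), len(world.actors))
```

Each layer only calls the layer below it, so a full formation would be a loop over `trigram_line` calls instead of hundreds of raw `spawn_actor` calls.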
Maitrix.org retweeted
It’s fun to watch a coding agent reason through spatial construction, iterating through trying, failing, revising, and trying again. Really promising, though still a long way to go. It reminds me of a kid playing with LEGO for the first time, gradually turning trial and error into something creative, like a piece of art. Try SimWorld Studio to build your own physical world.
🌊🏝️🌉Coding agent performing spatial reasoning to construct complex scenes Powered by SimWorld Studio (link in the thread)