Maitrix.org

Tweets

Pinned Tweet

Maitrix.org

@MaitrixOrg

Mar 4

🤯 Coding agents are entering the physical world

SimWorld

@simworld_ai

Mar 4

Claude Code can now build things in a simulated physical world!🤖🏙️ With SimWorld, coding agents can construct buildings, plan cities, or even create video games inside a realistic simulation on Unreal Engine. Just write a prompt, your agent will call tools, retrieve assets, plan scenes, and test physics autonomously. Demo platform coming soon so everyone can try it. Stay tuned. 🚀

Maitrix.org retweeted

Based Beefy 🐂 🟦

@BeefytheBull

Jun 10

This is one of the coolest things I’ve seen with @claudeai Fable 5. It’s been 1 day 🤯 Impressive👏👏

SimWorld

@simworld_ai

Jun 9

Update: also tried the same prompt with Fable 5. First impression: visually the strongest result so far, with a denser village, better terrain/building placement, and nice lived-in details. Still not perfect though: close-ups show both good details and failure cases, including floating buildings 😅 (check out the thread 👇)

Maitrix.org retweeted

SimWorld

@simworld_ai

Jun 9

Close-ups from Fable 5: Good progress on global aesthetics and scene details: more coherent village layout, rooftops, balconies, water tanks, AC units, stalls, and small props that make it feel lived-in. But still not solved: some buildings are floating above the sand or not properly grounded to the terrain 😅

Maitrix.org retweeted

SimWorld

@simworld_ai

Jun 9

SimWorld

@simworld_ai

Jun 8

We gave 4 frontier coding agents the same hard environment-gen test: Preserve the desert landscape. Surgically remove only the ruins. Then build a dense, believable Middle Eastern village from scratch using rustic assets. Which model did the best job? 🏜️👇 Poll and full prompt in thread.

Maitrix.org retweeted

SimWorld

@simworld_ai

Jun 9

Here’s the zoom-in video😂 The desert vibes are real, but so are the anti-gravity palm trees 🌴

Maitrix.org retweeted

SimWorld

@simworld_ai

Jun 9

Haha, great catch🤣 Opus 4.8 apparently decided palm trees are exempt from physics. At first glance the scene looks great, but the zoom-ins tell a different story: floating trees, upside-down trunks, and palms growing straight out of buildings 🌴

ImNotTheWolf

@ImNotTheWolf

Jun 9

Replying to @simworld_ai

People likely think that Opus version looks better due to the palm trees... yet almost all of these trees are completely bugged. Look how many of them are just floating upside down in mid air or growing out of buildings....

Maitrix.org retweeted

SimWorld

@simworld_ai

Jun 8

Maitrix.org retweeted

Lianhui Qin

@Lianhuiq

May 29

I’m not sure Gemini 3 looks that much more impressive here.🤔 For example, why is there a giant White House–like building just sitting in the middle of the street? This feels like a real example of how even frontier coding agents can still struggle with spatial reasoning.

SimWorld

@simworld_ai

May 29

Missed Gemini 3 yesterday, but catching up now. This genuinely looks impressive!

Maitrix.org retweeted

SimWorld

@simworld_ai

May 29

Missed Gemini 3 yesterday, but catching up now. This genuinely looks impressive!

SimWorld

@simworld_ai

May 28

We asked 4 frontier coding agents to build the same Unreal 3D city scene in SimWorld Studio. Same prompt. Different worlds 👀 Claude Code Opus 4.7, Codex GPT-5.5, Cursor Composer 2.5, OpenCode Gemini 2.5 Pro. Who wins?

Maitrix.org retweeted

SimWorld

@simworld_ai

May 28

Maitrix.org

@MaitrixOrg

May 20

Environment generation is the new AI frontier

SimWorld

@simworld_ai

May 20

Environment generation is the missing scaling axis for embodied AI. Introducing SimWorld Studio: a self-evolving factory for endless interactive 3D environments where agents act, fail & learn. Env-agent co-evolution improves navigation success 50% → 90%. From a prompt, our SimCoder writes code to automatically build an interactive world. Agents train inside it. And their performance shapes the next world.

Maitrix.org retweeted

SimWorld

@simworld_ai

May 20

Maitrix.org retweeted

Zhiting Hu

@ZhitingHu

May 18

Natural language is a human-created representation of the world. Is the ultimate form of the bitter lesson to bypass natural language entirely and learn a new representation from the world itself?

Richard Sutton

@RichardSSutton

May 18

The bitter lesson in 26 words: Don’t be distracted by human knowledge, as AI has been historically. Instead focus on methods for creating knowledge that scale with computation, like search and learning.

Maitrix.org retweeted

Zhoujun (Jorge) Cheng

@ChengZhoujun

May 14

Happy to release NanoRollout, our infra attempt to scale digital agent rollouts without pain. Setting up and scaling parallel digital agent envs is one of the biggest headaches in agent training / deployment. The open community hasn't handled it well. Two appealing features from NanoRollout:
🔌 Non-intrusive RL integration with frameworks such as miles, verl, tunix; validated end-to-end, e.g. outperforms DeepSWE-32B at a large 4k batch size 🚀
🧩 Unified support across agent harnesses and envs (covering SWE-Bench, Terminal-Bench, OSWorld, CocoaBench), with fast parallel eval that reproduces published scores (e.g., full SWE-Bench Verified eval from 102 min → 18 min, 5.7x faster ⚡)
And the core logic is just ~900 LOC. Hope NanoRollout helps if you're also trying to scale agent rollouts. Check out the tech blog and github for more details! Big thanks to the fantastic co-lead @JunliWang2021

Junli Wang

@JunliWang2021

May 14

Digital agent learning needs massive rollouts. But digital agent rollouts are painfully slow due to heavy environments. 🐌 🚀 We introduce NanoRollout, a lightweight open infra (900 lines core code) for digital agent rollout at scale, validated with three workloads: 🏋️ Large batchsize (4K) SWE Agent RL -> surpasses DeepSWE-32B 🧪 250k distilled coding trajectories -> SOTA ≤32B open coding agent ⚡ Fast evaluation on coding/cua/unified agent -> finish Check our Blog: cocoa-org.notion.site/nanoro…

Maitrix.org retweeted

Zhiting Hu

@ZhitingHu

May 4

🏆Honored to receive the Test of Time Award Honorable Mention #AISTATS2026 for our 2016 work Deep Kernel Learning, with the amazing @andrewgwils @rsalakhu @ericxing What a decade of AI progress! While GenAI is now driving massive real-world applications, the deepest underlying challenge remains: learning efficient representations of the world—for understanding, generation, predicting future worlds, and reasoning in the latent space. So much fun to think about for the next decade!⏳

Maitrix.org retweeted

Lianhui Qin

@Lianhuiq

Apr 22

Come check out LaDiR, our ICLR paper about latent diffusion for text reasoning. Instead of reasoning one token at a time in text space, LaDiR moves reasoning into continuous latent space and uses diffusion over blocks of thought tokens. That means LLMs can:
- rethink whole reasoning paths
- explore multiple solutions
- plan more flexibly
We show these gains on math, code and planning tasks.

Murray Kang

@haoqik322

Apr 22

I’ll be at ICLR 2026 in Rio 🇧🇷 presenting our work: LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning 🗓️ Fri, Apr 24, 10:30 AM - 1:00 PM 📍 Poster Session 3, Pavilion 4, #4916. This work explores a new direction in latent reasoning with diffusion for advancing LLM capabilities. I’d be happy to connect, so feel free to stop by the poster or reach out for a coffee chat ☕

Maitrix.org retweeted

Lianhui Qin

@Lianhuiq

Apr 22

Come and check out our ICLR work: Speculative Verdict (SV) for information-intensive visual reasoning. Inspired by speculative decoding, instead of drafting tokens, SV asks multiple small VLMs to draft diverse reasoning and localization paths, then uses a stronger model to produce the final verdict. The key insight is simple: no single reasoning path has to be perfect. Even when each path is only partly correct, combining the right pieces can still recover the correct answer — giving both better accuracy and lower cost.

Yuhan (Tina) Liu

@l_yuhan7272

Apr 22

Heading to #ICLR2026 🇧🇷! I'll be presenting Speculative Verdict at the poster session on Apr 25, 10:30 AM–1:00 PM, Pavilion 4 #3507, happy to chat! 📄 Paper: arxiv.org/abs/2510.20812 💻 Code: github.com/Tinaliu0123/specu…
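
The draft-then-verdict flow described in the Speculative Verdict post above can be sketched roughly as follows. This is only a toy illustration, not the paper's implementation: the drafters are hard-coded functions standing in for small VLMs, and `majority_judge` is a per-field majority vote swapped in for the stronger verdict model. All names here are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# A draft maps a field name to the value one drafter extracted for it.
Draft = Dict[str, str]


def speculative_verdict(
    question: str,
    drafters: List[Callable[[str], Draft]],
    judge: Callable[[str, List[Draft]], Draft],
) -> Draft:
    """Collect one draft per drafter, then let the judge produce the final answer."""
    drafts = [drafter(question) for drafter in drafters]
    return judge(question, drafts)


def majority_judge(question: str, drafts: List[Draft]) -> Draft:
    """Stand-in judge (assumption): per-field majority vote over the drafts.

    The post describes a stronger model in this role; voting is used here
    only so the example runs without any model.
    """
    fields = sorted({name for draft in drafts for name in draft})
    return {
        name: Counter(d[name] for d in drafts if name in d).most_common(1)[0][0]
        for name in fields
    }


# Three hypothetical drafters, each wrong on exactly one field.
drafters = [
    lambda q: {"year": "2021", "revenue": "12M", "region": "EU"},
    lambda q: {"year": "2021", "revenue": "10M", "region": "US"},
    lambda q: {"year": "2020", "revenue": "12M", "region": "US"},
]

question = "What were the year, revenue and region in the chart?"
result = speculative_verdict(question, drafters, majority_judge)
print(result)
```

No single draft matches the combined result, which is the point the post makes: partly correct paths can still be assembled into a correct answer.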

Maitrix.org retweeted

Shibo Hao

@Ber18791531

Apr 13

🍫 CocoaBench v1.0 is out! CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, 🌐 search. Since our first research preview last December, we have expanded the benchmark substantially with community-contributed tasks, and spent months testing and refining the tasks, evaluations, and agent runs. Some takeaways:
• Even the best agent system reaches only 45.1% on CocoaBench v1.0.
• Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering.
• Stronger agents tend to push more of the work into code.
• Open source models still lag behind leading frontier models on these general tasks.
👇 More on the website and in the paper #AI #Agents #LLM #Benchmark #CocoaBench

Shibo Hao

@Ber18791531

16 Dec 2025

🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built🚀✨ #LLM #AI #Agent #CocoaBench More details in the threads 👇

Maitrix.org retweeted

Lianhui Qin

@Lianhuiq

Apr 1

That’s wild, and smart! 🤣 The SimWorld coding agent self-improves by autonomously creating new tools and skills. It realized BaGuaZhen (八卦阵) was too hard to build directly, so it created its own tools and skills. Starting from only primitive operations like spawn_actor() and delete_actor(), the agent does not just brute-force the task. It breaks the problem down and builds higher-level capabilities for itself.

SimWorld

@simworld_ai

Apr 1

A SimWorld coding agent can now create its own tools and skills on the fly. We challenged it with BaGuaZhen (八卦阵 Eight Trigrams), an ancient Chinese formation that is difficult to build from scratch because of its precise spatial structure and multi-step coordination. Instead of failing with brute force, the agent wrote reusable components for itself:
Tools: Bagua Wall Segment, Bagua Trigram Line
Skills: Bagua Wall Segment Skill, Bagua Trigram Line Skill
Each tool is paired with a skill that teaches the model how to use it. Without skills: it fails. With self-built skills: it organizes the full structure. The exciting shift is this: agents are starting to generate capabilities, not just outputs.
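
As a rough illustration of the idea in the post above (building higher-level tools on top of primitive operations), here is a minimal sketch. It does not use the SimWorld API: the `World` class is an in-memory stand-in, the signatures given to `spawn_actor` and `delete_actor` are assumptions (only the names appear in the post), and `wall_segment` and `trigram_line` are hypothetical helpers loosely modeled on the tool names mentioned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec2 = Tuple[float, float]


@dataclass
class World:
    """In-memory stand-in for a simulator. NOT the SimWorld API."""
    actors: Dict[int, Tuple[str, Vec2]] = field(default_factory=dict)
    next_id: int = 0

    def spawn_actor(self, asset: str, position: Vec2) -> int:
        # Primitive: place one asset and return its id (assumed signature).
        actor_id = self.next_id
        self.next_id += 1
        self.actors[actor_id] = (asset, position)
        return actor_id

    def delete_actor(self, actor_id: int) -> None:
        # Primitive: remove one actor by id (assumed signature).
        self.actors.pop(actor_id, None)


def wall_segment(world: World, start: Vec2, length: int, step: float = 1.0) -> List[int]:
    """Higher-level tool: a straight wall of `length` blocks along +x."""
    x0, y0 = start
    return [world.spawn_actor("wall_block", (x0 + i * step, y0)) for i in range(length)]


def trigram_line(world: World, start: Vec2, length: int, solid: bool) -> List[int]:
    """Tool built on a tool: a solid line is a full wall; a broken line
    is the same wall with its middle block deleted."""
    ids = wall_segment(world, start, length)
    if not solid:
        middle = ids.pop(length // 2)
        world.delete_actor(middle)
    return ids


world = World()
solid_ids = trigram_line(world, (0.0, 0.0), 5, solid=True)
broken_ids = trigram_line(world, (0.0, 2.0), 5, solid=False)
broken_xs = sorted(world.actors[i][1][0] for i in broken_ids)
print(len(world.actors), broken_xs)
```

The layering is the point: once `wall_segment` exists, `trigram_line` never has to reason about individual blocks, only about where the gap goes.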

Maitrix.org retweeted

SimWorld

@simworld_ai

Apr 1

Maitrix.org retweeted

Lianhui Qin

@Lianhuiq

Mar 23

It’s fun to watch a coding agent reason through spatial construction, iterating through trying, failing, revising, and trying again. Really promising, though still a long way to go. It reminds me of a kid playing with LEGO for the first time, gradually turning trial and error into something creative, like a piece of art. Try SimWorld Studio to build your own physical world.

SimWorld

@simworld_ai

Mar 23

🌊🏝️🌉 Coding agent performing spatial reasoning to construct complex scenes. Powered by SimWorld Studio (link in the thread)
