Baris


Baris

@Baris

Jun 12

Physical AI is having its moment. Robotics, autonomous vehicles, interactive video. Underneath all of it sits the same primitive: a world model that predicts how an environment responds to action. There's a lot of debate about world models. JEPA vs. diffusion vs. autoregressive. How sim-to-real actually transfers. Nobody agrees on how to evaluate physical understanding. So @taiuti & @_bschmidtchen at @reactorworld, @ashray_malhotra at @Adobe, and I decided to put the top minds in the field in one room. On June 18 in SF, we're hosting World Models SF Summit, likely the first summit dedicated entirely to world models. Researchers, founders, and practitioners from @GoogleDeepMind @OpenAI @Meta @nvidia @theworldlabs @Tesla @Apple @Microsoft @Roblox and many strong startups in the space have already signed up. A few seats left: luma.com/hkik7xtr

World Models SF Summit · Luma

World Models SF Summit is a half-day roundtable bringing together researchers and builders working at the frontier of world models. We are co-hosting this…

luma.com


Baris

@Baris

Jun 11

A Qwen-trained probe predicting difficulty for @MistralAI & Phi means difficulty lives in the latent representations... not the solver. Excited to see @VmaxAI ship 🔥 Congrats @MavorParker💪

Augustine Mavor-Parker

@MavorParker

Jun 10

Training a model to generate RL tasks that are neither too hard nor too easy costs many solver runs per task. PROPEL instead predicts difficulty via a probe on its activations, amortizing that cost and speeding up generator optimization. New open-ended RL research from @Vmax @GoodfireAI.
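To make the amortization idea concrete, here is a minimal sketch under loudly stated assumptions: a linear probe is fit once from per-task activation vectors to measured solve rates, then reused to score new tasks with no solver runs, rewarding tasks whose predicted solve rate is near 50%. The 2-d toy "activations", the solve rates, the helper names (`fit_probe`, `predict_solve_rate`, `difficulty_reward`), and the 0.5 target are all hypothetical; this is not PROPEL's actual probe or reward.

```python
"""Hypothetical sketch of a difficulty probe (not PROPEL's actual code).

Rather than running a solver many times on every generated task, fit a cheap
linear map once from (toy) activation vectors to measured solve rates, then
reuse it to score new tasks without any solver runs.
"""
from typing import List


def linear(w: List[float], x: List[float]) -> float:
    """Raw linear output; the last weight is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def fit_probe(xs: List[List[float]], ys: List[float],
              lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit the probe by full-batch gradient descent on mean squared error."""
    dim = len(xs[0])
    w = [0.0] * (dim + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            err = linear(w, x) - y
            for i in range(dim):
                grad[i] += 2.0 * err * x[i] / n
            grad[dim] += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def predict_solve_rate(w: List[float], x: List[float]) -> float:
    """Probe prediction clipped to a valid solve rate in [0, 1]."""
    return min(1.0, max(0.0, linear(w, x)))


def difficulty_reward(solve_rate: float, target: float = 0.5) -> float:
    """1.0 at the target solve rate, falling linearly toward 0 at the extremes."""
    return 1.0 - abs(solve_rate - target) / max(target, 1.0 - target)


# Toy data (hypothetical): activations and solve rates measured from past solver runs.
activations = [[0.0, 1.0], [0.5, 0.2], [1.0, 0.7], [0.25, 0.9]]
solve_rates = [0.1, 0.5, 0.9, 0.3]
probe = fit_probe(activations, solve_rates)

# Score a newly generated task without running the solver at all.
new_task = [0.5, 0.6]
rate = predict_solve_rate(probe, new_task)
print(round(rate, 2), round(difficulty_reward(rate), 2))
```

The design choice being illustrated is the cost trade: solver runs are paid once to label the probe's training set, after which each new task costs a single cheap forward pass.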


Baris

@Baris

Jun 3

Mars rover ride. Every frame generated in real time on @reactorworld's longlive-v2. Well done @taiuti @_bschmidtchen 👏 Just getting started.


reactor

@reactorworld

Jun 3

Introducing LongLive 2.0 by @NVIDIA. Real-time text-to-video with full multi-shot control. Build your story shot by shot, scene by scene. Available now on Reactor.


Baris retweeted

Eliott Mogenet

@eliott__mogenet

Jun 3

The next era of animation won't be generated. It'll be explored. Yoki (予期, "anticipation") is my first film made with world models and video models. The landscapes of Japan, rebuilt as worlds you move a camera through with @reactorworld. World models are the most underrated storytelling tech right now, and almost no one's using them for film. Animation made with @invideoOfficial.


Baris

@Baris

Jun 2

Jensen at @Computex: "Text data plus compute gives you AI. Now that we have AI, compute is data (in physical AI)." LLMs were trained on a corpus that already existed: the whole internet of text. Robots have none. Nobody recorded the world in first-person, action-labeled frames. (Hence so many startups building here.) So you generate it. World models like @nvidia's Cosmos spend compute to produce physics-accurate, action-conditioned rollouts from any perspective, then close the loop as the policy's own simulator. So... data becomes a compute problem in physical AI. All those world models need to run somewhere. That's @reactorworld ... and the biggest infrastructure opportunity in physical AI today 🔥 cc @taiuti @_bschmidtchen
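A toy illustration of "compute is data": once you have an action-conditioned simulator, you can spend model calls rolling a policy through it and harvest (state, action, next_state) tuples. Everything below is hypothetical: `toy_world_model` is a hand-written 1-D stand-in for a learned model such as Cosmos (it is not Cosmos or any real API), and `toy_policy` is a trivial controller. Only the shape of the closed loop is the point.

```python
"""Hypothetical sketch: turning compute into action-labeled data with a world model.

`toy_world_model` stands in for a learned, action-conditioned simulator; it is
NOT Cosmos or any real API. Each rollout step costs one model call (compute)
and yields one (state, action, next_state) training example (data).
"""
import random
from typing import List, Tuple

Transition = Tuple[float, float, float]


def toy_world_model(state: float, action: float, rng: random.Random) -> float:
    """Predict the next 1-D position: move by the action plus small sampled noise."""
    return state + action + rng.gauss(0.0, 0.01)


def toy_policy(state: float, goal: float = 1.0) -> float:
    """Take a bounded step toward the goal."""
    return max(-0.2, min(0.2, goal - state))


def generate_rollouts(n_rollouts: int, horizon: int, seed: int = 0) -> List[Transition]:
    """Spend n_rollouts * horizon model calls to produce that many labeled transitions."""
    rng = random.Random(seed)
    data: List[Transition] = []
    for _ in range(n_rollouts):
        state = rng.uniform(-1.0, 1.0)  # random start state
        for _ in range(horizon):
            action = toy_policy(state)
            next_state = toy_world_model(state, action, rng)
            data.append((state, action, next_state))
            state = next_state  # close the loop: the policy acts on the model's prediction
    return data


dataset = generate_rollouts(n_rollouts=100, horizon=20)
print(len(dataset))  # 2000
```

The dataset size scales linearly with the number of model calls, which is the sense in which data becomes a compute budget.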


Baris retweeted

Augustine Mavor-Parker

@MavorParker

May 29

The Unix terminal is the natural interface for agents to get work done on a computer, but how well can agents actually use Unix? Claude Code. Codex. Devin. Every frontier agent ships as a terminal tool. With unix-ctf, Vmax is using setters and solvers to measure Unix competence.


Baris

@Baris

May 29

Amazing event last night. @SarvamAI & @LightspeedIndia (@MohapatraHemant @dkhare) have pulled together an impressive bench of researchers from @OpenAI @GoogleDeepMind @Meta, @stripe @MistralAI and others. @pratykumar walked us through building sovereign AI out of India, and expanding into the Bay Area to tap the talent that's actually trained large models at scale. World models came up again and again in my conversations with researchers from the frontier labs. That's going to be the biggest space in AI 🚀 cc @reactorworld @taiuti @_bschmidtchen


Baris retweeted

Hemant Mohapatra

@MohapatraHemant

May 29

Beautiful evening in SF @ a wonderful housefull event w/ @pratykumar of @SarvamAI talking about the company's plans, hiring for an SF office, scaling out compute, building v. large models, and global GTM. Inspiring and genuinely one of the clearest visions in the world of AI.


Baris

@Baris

May 28

AI is moving from language models to world models. They need their own runtime. That's @reactorworld 🚀 LLM inference became a multi-$B category because batching, routing & GPU orchestration were too complex for app developers to own. World models will follow the same pattern... but the technical bar is way higher. The workloads are stateful, temporal, and heterogeneous: video tokens, latent states, action loops, streaming rollouts. Latency and compute profiles vary across models. Different category entirely. @taiuti @_bschmidtchen are building this from first principles. Both led technology on @Apple Vision Pro. @taiuti previously co-founded @LumaLabsAI as CTO. Privilege to be on this journey with them from the start 🙏 Congrats to the @reactorworld team on the $59M Series A led by stellar investors @lightspeedvp (@buckymoore @theamberyang) @WndrCoLLC @Amplify @Sky9Capital @FPVventures @AbstractVC ... and @awscloud on board as preferred cloud partner 💪 variety.com/2026/digital/new… Biggest infrastructure opportunity in physical AI today. Let's go! 🔥


Baris

@Baris

May 27

Congrats to the @reactorworld (@taiuti @_bschmidtchen) & @emergentlabs (@mukundjha) teams on being named to @Redpoint's annual list of the 100 most important infrastructure companies powering AI. Well earned 🔥 Proud to be an early investor in both!

Redpoint @Redpoint

May 27

The Redpoint InfraRed 100 is now live. These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on. Congratulations to this year's honorees! Read the full 2026 InfraRed Report: our state of the union on AI and cloud infrastructure 👉 redpoint.com/reports/the-inf…


Baris

@Baris

May 20

Every frontier lab is hitting the same wall: human-generated training data doesn't scale ✋ Most teams respond with bigger RLHF pipelines and more human annotators. @VmaxAI is taking a fundamentally different approach. Their new PopuLoRA paper introduces co-evolving populations of LoRA adapters performing asymmetric self-play on a frozen base model. They're doing evolutionary operations directly in weight space. Teams working on self-improvement without depending on human data will matter a lot. Talk to @MavorParker to learn more. They are hiring too!

Augustine Mavor-Parker

@MavorParker

May 20

Vmax is building an open-ended learning system that generates and optimizes itself on tasks that it creates, avoiding human bias that may corrupt optimal learning curricula. In PopuLoRA, we instantiate this as co-evolving populations of LLMs performing asymmetric self-play.
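For readers unfamiliar with the mechanics: a LoRA adapter leaves the base weight matrix W frozen and learns a low-rank update, so the effective weight is W + B·A. The sketch below shows one generic form that "evolutionary operations directly in weight space" can take on such adapters: a population of (A, B) pairs, Gaussian mutation, and truncation selection. It is a hedged illustration only. The toy fitness (closeness to a hypothetical target matrix) stands in for the asymmetric self-play signal PopuLoRA actually uses, which this sketch does not model, and none of the names come from the paper.

```python
"""Hypothetical sketch: evolving a population of LoRA adapters over a frozen base.

Effective weight = W_base + B @ A, with B (d x r) and A (r x d), r << d.
Fitness here is a toy stand-in (closeness to a target matrix); PopuLoRA's real
signal comes from asymmetric self-play, which this sketch does not model.
"""
import random
from typing import List, Tuple

Matrix = List[List[float]]
Adapter = Tuple[Matrix, Matrix]  # (A, B)


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def effective_weight(base: Matrix, adapter: Adapter) -> Matrix:
    """Return a new matrix W_base + B @ A; the base itself is never modified."""
    a, b = adapter
    delta = matmul(b, a)
    return [[base[i][j] + delta[i][j] for j in range(len(base[0]))] for i in range(len(base))]


def mutate(adapter: Adapter, rng: random.Random, sigma: float = 0.05) -> Adapter:
    """Gaussian noise on the adapter weights only (a weight-space mutation)."""
    a, b = adapter
    return ([[v + rng.gauss(0.0, sigma) for v in row] for row in a],
            [[v + rng.gauss(0.0, sigma) for v in row] for row in b])


def fitness(base: Matrix, adapter: Adapter, target: Matrix) -> float:
    """Toy fitness: negative squared distance of the effective weight to a target."""
    w = effective_weight(base, adapter)
    return -sum((w[i][j] - target[i][j]) ** 2 for i in range(len(w)) for j in range(len(w[0])))


def evolve(base: Matrix, target: Matrix, d: int, r: int,
           pop_size: int = 16, generations: int = 200, seed: int = 0) -> Adapter:
    rng = random.Random(seed)
    # Common LoRA-style init: A random, B zero, so every adapter starts as a no-op.
    pop: List[Adapter] = [([[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(r)],
                           [[0.0] * r for _ in range(d)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ad: fitness(base, ad, target), reverse=True)
        survivors = pop[: pop_size // 4]  # truncation selection keeps the best quarter
        children = [mutate(rng.choice(survivors), rng) for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda ad: fitness(base, ad, target))


base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]    # frozen "pretrained" weight
target = [[1.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 1.5]]  # base + a rank-1 update (toy)
best = evolve(base, target, d=3, r=1)
print(round(fitness(base, best, target), 3))
```

Because survivors are carried over unchanged, the best fitness never decreases across generations, and the frozen base is shared by every member of the population.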


Baris

@Baris

May 7

GPUs melting 🫠

Alberto

@taiuti

May 7

We're taking our first step towards democratizing World Models, so that everyone can build on this incredible technology. We have more to share, but enjoy a glimpse of what's to come, today. Try it here: reactor.inc


Baris retweeted

Bryce Schmidtchen

@_bschmidtchen

May 7

Real-time World Models are the next AI frontier. Today, we @reactorworld are taking the first step towards this reality: our early preview lets you experience worlds generated in real-time, running on our global low-latency infrastructure. Try it now: reactor.inc/


Baris retweeted

reactor

@reactorworld

May 7

Real-time World Models are the next AI frontier. Today, we're taking the first step towards this reality: our early preview lets you experience worlds generated in real-time, running on our global low-latency infrastructure. Try it now: reactor.inc/


Baris

@Baris

May 7

The missing layer in the physical AI stack is starting to emerge: reactor.inc/ LLMs became useful to developers only after the serving layer matured. Inference providers abstracted away GPU orchestration, batching, routing, scaling, and model-specific deployment. This led to several multi-$B startups 💰 World models need their own version of that layer. But this is not just “LLM inference with video” 🚫 World models are stateful, temporal, and heterogeneous. They involve video tokens, latent states, action loops, rollouts, streaming generation, multimodal inputs, and very different latency/compute profiles across models. The next generation of physical AI experiences will need a runtime that can orchestrate these models, preserve state, manage latency, hide backend complexity, and expose a clean developer surface. The foundation models for physical AI are being built. Now the experience layer needs its execution engine. @reactorworld is starting to reveal that layer today... Check out the demo here 👀 @taiuti @_bschmidtchen

Reactor - Developer platform for real-time generative media

Reactor is the developer infrastructure for real-time generative media. Fastest inference anywhere, sub-50ms streaming, every major world model on one API.

reactor.inc
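To make the "stateful, temporal" distinction concrete, here is a hedged sketch of the kind of session object such a runtime might manage. Unlike a stateless LLM request, each call advances a persistent latent state with a new action and has to return a frame within a per-frame latency budget. The `WorldModelSession` and `Runtime` classes, the `dummy_step` model, and the 50 ms budget (borrowed from the "sub-50ms streaming" claim in the card above) are illustrative assumptions; this is not Reactor's API.

```python
"""Hypothetical sketch of a stateful world-model session (not Reactor's API).

An LLM call is roughly request -> response. A world-model session instead keeps
latent state across many action -> frame steps and has a per-frame deadline.
"""
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Latent = List[float]
StepFn = Callable[[Latent, str], Tuple[Latent, str]]  # (state, action) -> (new state, frame)


@dataclass
class WorldModelSession:
    step_fn: StepFn
    state: Latent
    frame_budget_s: float = 0.050  # assumed 50 ms per-frame budget
    late_frames: int = 0
    history: List[str] = field(default_factory=list)

    def step(self, action: str) -> str:
        start = time.perf_counter()
        self.state, frame = self.step_fn(self.state, action)  # state persists between calls
        if time.perf_counter() - start > self.frame_budget_s:
            self.late_frames += 1  # only counted here; a real runtime would need a policy
        self.history.append(action)
        return frame


class Runtime:
    """Routes actions to per-session state so clients never manage model internals."""

    def __init__(self, step_fn: StepFn):
        self.step_fn = step_fn
        self.sessions: Dict[str, WorldModelSession] = {}

    def open(self, session_id: str, initial_state: Latent) -> None:
        self.sessions[session_id] = WorldModelSession(self.step_fn, list(initial_state))

    def act(self, session_id: str, action: str) -> str:
        return self.sessions[session_id].step(action)


def dummy_step(state: Latent, action: str) -> Tuple[Latent, str]:
    """Stand-in model: move a 1-D position left/right and 'render' it as text."""
    delta = {"left": -1.0, "right": 1.0}.get(action, 0.0)
    new_state = [state[0] + delta]
    return new_state, f"frame@x={new_state[0]:.0f}"


rt = Runtime(dummy_step)
rt.open("user-1", [0.0])
rt.open("user-2", [10.0])
print(rt.act("user-1", "right"), rt.act("user-1", "right"), rt.act("user-2", "left"))
```

The client only sends actions and receives frames; which state belongs to which session, and whether a frame missed its deadline, stays inside the runtime.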


Baris

@Baris

May 6

In my last post I covered the compute and data requirements of world models. So... what are world models? As far as I understand, at least four architectural paths are emerging. Some predict future observations. Some build explorable spaces. Some learn compact latent dynamics. Some skip explicit world modeling and map perception directly to action.


Baris

@Baris

May 6

1️⃣ Generative video world models predict observable futures. They generate what the world will look like in response to actions. Generative models bet that pixel-level fidelity carries causal signal: subtle details like shadows and surface textures can change agent decision-making. The trade-off is cost. Generating pixels is expensive. Examples: @NVIDIA Cosmos, @GoogleDeepMind Genie 3, @DecartAI, Diamond, @gen_intuition, @wayve_ai GAIA

2️⃣ 3D spatial models take a different approach. Instead of generating video frames, they construct persistent, navigable 3D environments. Their bet is that persistent geometry is the right substrate for spatial reasoning. If you have a persistent 3D representation, spatial relations do not have to be relearned implicitly from video every time. Examples: @theworldlabs, @odysseyml


Baris

@Baris

May 6

3️⃣ Latent world models predict what the world will mean rather than what it will look like. They aim to compress away unpredictable visual variation without losing causal structure. Instead of generating the next frame, they forecast abstract representations of future states, predicting in representation space rather than pixel space. The advantage is compute efficiency and a natural fit for agent training. The risk is compression loss. If the latent encoding misses a causally important detail, the agent cannot learn from what it cannot represent. Examples: JEPA by @Meta, @ylecun's @amilabs

4️⃣ VLAs (Vision-Language-Action models) take the pragmatic robot-policy path. The bet is that pretrained vision-language representations already encode enough physical understanding that you can skip building a world model entirely and go straight to action. Examples: @physical_int, @SkildAI, @GoogleDeepMind Robotics, @nvidia GR00T, @Figure Helix, OpenVLA

It feels like world models are where LLMs were in the 2019-21 window. Multiple architectures, no clear winner, and the training-data problem is not yet solved. The difference is that text was already on the internet. Action-conditioned physical data is not. This may be the bottleneck that decides which architecture wins. What do you think?
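A toy way to see why predicting in representation space can help: if part of each observation is unpredictable noise (think flickering texture), a pixel-space predictor is penalized for it on every frame, while a predictor scored on an encoder's output can simply drop it. This is an assumption-laden illustration, not JEPA: the "encoder" here is a hand-picked projection rather than a learned one, and a real latent model must also keep its learned representation from collapsing, a problem this toy sidesteps by fixing the encoder.

```python
"""Hypothetical toy: prediction error in pixel space vs. representation space.

Observation = [position, velocity, noise]. The noise channel is unpredictable.
The fixed 'encoder' keeps only (position, velocity); real latent models learn it.
"""
import random
from typing import List

Obs = List[float]


def step_env(obs: Obs, action: float, rng: random.Random) -> Obs:
    pos, vel, _ = obs
    vel = vel + action
    return [pos + vel, vel, rng.uniform(-1.0, 1.0)]  # fresh noise every frame


def encode(obs: Obs) -> List[float]:
    return obs[:2]  # discard the unpredictable channel


def predict_latent(z: List[float], action: float) -> List[float]:
    pos, vel = z
    vel = vel + action
    return [pos + vel, vel]


def predict_pixels(obs: Obs, action: float) -> Obs:
    # Under MSE, the best guess for zero-mean uniform noise is its mean (0).
    return predict_latent(encode(obs), action) + [0.0]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
obs = [0.0, 0.0, 0.0]
pixel_err, latent_err = 0.0, 0.0
for t in range(1000):
    action = 0.1 if t % 2 == 0 else -0.1
    nxt = step_env(obs, action, rng)
    pixel_err += mse(predict_pixels(obs, action), nxt)
    latent_err += mse(predict_latent(encode(obs), action), encode(nxt))
    obs = nxt
print(f"pixel MSE {pixel_err / 1000:.3f}, latent MSE {latent_err / 1000:.3f}")
```

The flip side from the post shows up here too: if the encoder had dropped a channel that mattered for the dynamics, the latent loss would still look fine while the agent lost that information.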


Baris

@Baris

May 4

World models might need more compute than LLMs... and LLMs already triggered one of the largest infrastructure buildouts in history 🏭

LLMs learn the structure of language. World models learn the structure of causality: how objects collide, how fluids flow, how crowds behave, how a scene changes after an action.

This changes the compute math. Text is sparse. Video is dense, temporal, redundant, and expensive to curate. A book can fit in hundreds of kilobytes. A minute of useful video can be hundreds of megabytes before you even ask whether it contains the right objects, actions, camera angles, contacts, failures, and edge cases.

The trick is compression. World models do not predict every pixel in a 4K frame. They compress video into latent representations and learn to predict how those latent states evolve under perturbations: actions, camera motion, text prompts, or environmental change.

Even with this compression, the compute requirements are staggering. NVIDIA's Cosmos world model was trained on 20M hours of video with 10,000 H100 GPUs for roughly three months (10,000 GPUs × 90 days × 24 h = 21.6M GPU-hours). At rental GPU-hour rates, that is ~$100M of training compute. Owning comparable capacity outright would be a $400M capex decision 💸

But compute is not the only constraint... Data may be the harder one. The internet gave LLMs a massive corpus of text. Robotics has no equivalent internet-scale corpus of action-conditioned experience. Passive video is not enough. That data has to be generated through teleoperation, real robots, simulation, synthetic worlds, or some combination of all four.

And this is just training. Inference may be an even bigger bottleneck. A robot does not just answer a prompt. It has to plan, simulate possible futures, handle uncertainty, and act safely in real time. World models shift part of that burden into learned representations. Instead of hand-coding every corner of physics, world models amortize some of that complexity into a neural network.
The "stochastic messiness of reality"* gets baked into the weights. That is the real shift... LLMs taught machines to read the internet. World models will teach machines to operate in reality. The compute buildout for that may be much larger than people expect. * I read this somewhere and liked how it captured the challenges of the real world.
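The Cosmos back-of-envelope in the post can be checked in a few lines. The GPU count, the roughly 90-day duration, and the ~$100M and ~$400M totals come from the post; the per-GPU-hour rate and the per-GPU capex printed here are only what those totals imply when divided out, not quoted prices.

```python
"""Back-of-envelope check of the training-compute figures quoted in the post."""

gpus = 10_000          # H100s, as stated in the post
days = 90              # "roughly three months"
gpu_hours = gpus * days * 24

training_cost_usd = 100e6  # "~$100M" from the post
capex_usd = 400e6          # "~$400M" from the post

implied_rate = training_cost_usd / gpu_hours  # USD per GPU-hour implied by the two figures
implied_price_per_gpu = capex_usd / gpus      # all-in capex per GPU implied by the capex figure

print(f"{gpu_hours:,} GPU-hours")                         # 21,600,000 GPU-hours
print(f"~${implied_rate:.2f} per GPU-hour implied")       # ~$4.63 per GPU-hour implied
print(f"~${implied_price_per_gpu:,.0f} per GPU implied")  # ~$40,000 per GPU implied
```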


Baris

@Baris

Mar 12

Exciting! This is the rocketship to join 🚀

Alberto

@taiuti

Mar 12

Big update: I’m starting a new company. 6 months ago, @_bschmidtchen and I made a bet. What if entire worlds could be generated on the fly, pixel by pixel? World models are the next platform shift, and we saw it coming. Since then, we’ve:
- secured major contracts across media and physical AI industries
- assembled a team of 10 from Apple, Meta, Google, Adobe & Microsoft
- raised from top-tier investors
More details soon. We’re scaling fast, and hiring now. Come build with us: reactor.inc
