Cofounder @theworldlabs. Building Spatial Intelligence.

Joined January 2014
Pinned Tweet
Today we are launching Marble – a multimodal world model that lets you create and edit 3D worlds.
Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at marble.worldlabs.ai
3D is an exciting area where we are still figuring out the right tasks, problem formulations, architectures, and the best ways to scale. We're sharing some of our ideas here in our first-ever papers from @theworldlabs, led by an awesome set of interns.
Today we are sharing three new research papers, each exploring a new way to generate 3D content by leveraging large-scale generative models and 2D priors. These projects were led by our incredible interns @HaoZhang623 @BDuisterhof @DrTunnels [1/4]
Working with @HaoZhang623, @BDuisterhof, and @DrTunnels on these projects over the past few months has been great. Congrats to all of you!
Justin Johnson retweeted
Today we released “GPIC: A Giant Permissive Image Corpus for Visual Generation.” It’s a 100M image dataset for visual generation, with text captions and 100% known permissive licenses, hosted on HuggingFace. I’m excited to get this out! Check it out: gpic.stanford.edu/
GPIC should be the new standard benchmark for generative modeling. Training 1 epoch on GPIC is the same cost as 100 epochs on ImageNet, but is a much better proxy for real-world problems. If you work in generative modeling, try GPIC for your next project!
1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation!
🚀 100M VLM-captioned image-text pairs for training
📊 1M image-text pairs for benchmarking
🖼️ ~28 trillion pixels
🤗 Centrally hosted
✅ Fully permissive for research and commercial use
Dataset, benchmark and models 🧵👇
Co-led with @KyleSargentAI
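The epoch-cost comparison above can be sanity-checked with quick arithmetic. This is a rough sketch that counts training samples only: it assumes the ImageNet-1k training split (1,281,167 images) and the "100M" figure from the announcement, and it ignores image resolution, captions, and per-sample compute, all of which affect real cost.

```python
# Rough sample-count comparison; counts images seen, not FLOPs.
IMAGENET_TRAIN_IMAGES = 1_281_167  # ImageNet-1k (ILSVRC 2012) training split
GPIC_TRAIN_IMAGES = 100_000_000    # "100M" pairs, as stated in the announcement


def samples_seen(num_images: int, epochs: int) -> int:
    """Total number of training samples processed over the given epochs."""
    return num_images * epochs


imagenet_budget = samples_seen(IMAGENET_TRAIN_IMAGES, epochs=100)
gpic_budget = samples_seen(GPIC_TRAIN_IMAGES, epochs=1)

# Ratio of one GPIC epoch to 100 ImageNet epochs, in samples seen.
ratio = gpic_budget / imagenet_budget

# How many ImageNet epochs one GPIC epoch corresponds to, in samples seen.
equivalent_imagenet_epochs = GPIC_TRAIN_IMAGES / IMAGENET_TRAIN_IMAGES

print(f"100 ImageNet epochs: {imagenet_budget:,} samples")
print(f"1 GPIC epoch:        {gpic_budget:,} samples")
print(f"ratio: {ratio:.2f}, equivalent ImageNet epochs: {equivalent_imagenet_epochs:.1f}")
```

Under these assumptions the two budgets land within the same order of magnitude, which is consistent with the "same cost" framing as a rough statement rather than an exact one.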
In recent years, academic and industry work in generative modeling has drifted so far apart that they are playing totally different games, and techniques that work in academia may not transfer to industry problems. The divide isn't just about scale -- the different tasks in academia vs industry lead to different fundamental challenges.

Academic work focuses on class-conditional ImageNet generation. This has a very weak conditioning signal (single categorical label) and the problem is very data-constrained, with all SOTA methods training for hundreds of epochs. The main challenge in this regime is combatting overfitting.

Industry work on image or video generation usually has a much richer conditioning signal (e.g. very long captions, input images, etc.) and is almost always underfitting since data can be scaled to absurd degrees. Overfitting (at least for pretraining) isn't a concern; instead we want to fit the complex data distribution *as fast as possible*.

We hope that GPIC is approachable on the academic budgets people are already expending on ImageNet, but will lead to problems more similar to the industry-scale challenges in generative modeling.
Latent Forcing lets us train strong pixel-space diffusion models that benefit from DINOv2 alignment like many recent latent diffusion models. But it's a lot more than that -- it's also a new way to approach coarse-to-fine generation and latent "chain-of-thought" reasoning for diffusion models. Super proud of this project led by @BaadeAlan!
What's the right space to diffuse in: Raw Data or Latents? Why not both! In Latent Forcing, we order a joint diffusion trajectory to reveal Latents before Pixels, leading to improved convergence while being lossless at encoding and end-to-end at inference. w/ @drfeifei ... 1/n
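One way to picture "revealing latents before pixels" is a pair of noise levels driven by a single progress variable. The sketch below is purely illustrative: the post does not give the actual schedule, so the two-phase linear shape, the function name `joint_noise_levels`, and the `latent_lead` parameter are all assumptions made for this example, not the Latent Forcing method itself.

```python
def joint_noise_levels(progress: float, latent_lead: float = 0.5) -> tuple[float, float]:
    """Map sampling progress in [0, 1] to (latent_noise, pixel_noise), each in [0, 1].

    Hypothetical schedule: latents are fully denoised during the first
    `latent_lead` fraction of sampling, and pixels are denoised afterwards.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if not 0.0 < latent_lead < 1.0:
        raise ValueError("latent_lead must be strictly between 0 and 1")
    # Latent noise falls linearly from 1 to 0 over the first phase, then stays at 0.
    latent_noise = max(0.0, 1.0 - progress / latent_lead)
    # Pixel noise stays at 1 during the first phase, then falls linearly to 0.
    pixel_noise = min(1.0, (1.0 - progress) / (1.0 - latent_lead))
    return latent_noise, pixel_noise


for step in (0.0, 0.25, 0.5, 0.75, 1.0):
    latent_noise, pixel_noise = joint_noise_levels(step)
    print(f"progress={step:.2f}  latent_noise={latent_noise:.2f}  pixel_noise={pixel_noise:.2f}")
```

In this toy version the latents are clean by the halfway point while the pixels are still pure noise, so the pixel denoising phase can condition on already-resolved latents. A real schedule could overlap the two phases or use a non-linear shape.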
Justin Johnson retweeted
RTFM (Real-Time Frame Model) explores real-time, interactive frame generation from images, producing new views on the fly without explicit 3D. It differs from Marble, which targets high-fidelity persistent worlds (3DGS). We’re interested in how ideas from both might connect 🌎
Creating 3D worlds is now just an API call away with the launch of our new World API -- I can't wait to see what you build with it.
The World API is live. Generate persistent, explorable 3D worlds from text, images, and video. Integrate them directly into your products.
Justin Johnson retweeted
Today, we’re sharing a behind-the-scenes look at how we used Marble to create the worlds you see in our launch video.
Justin Johnson retweeted
13 Nov 2025
Wait, you can make a whole music video (Spaghetti 🍝 from @le_sserafim) on Marble and have fun doing it? I just created 20 Spaghetti Worlds, all 100% Marble-made. Can you guess their flavors? Marble from @theworldlabs is the first AI that actually lets me build these wild worlds, explore them freely, and craft smooth camera paths, all while vibing to music in real time. It’s insanely fun. I’m dropping the link to my Spaghetti Worlds and recipe in the reply. And I’m issuing a challenge to everyone: Think you can beat my music video? You’ll have to out-cook me with your worlds and your camera moves. 🍝🎥🔥 Sound on 🔊
Justin Johnson retweeted
13 Nov 2025
we've been having so much fun testing our own tech 😆 I was able to create this world from just 2 input images in 10 minutes (and spent another 20 minutes making the flythrough video). workflow description below ⬇️
I’ve been having so much fun making worlds with Marble, and can’t wait to see all the creative things everyone builds with it!
"Tensor Parallel" is a terrible name ... all parallelization strategies parallelize tensors somehow. It would be better to call it "channel parallel" = CP, but that conflicts with "context parallel". Maybe "feature parallel" = FP?
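To make the naming point concrete: what is usually called tensor parallelism shards a layer's weights along the feature (channel) axis. The NumPy sketch below simulates that on one machine, assuming a plain linear layer split along its output features. The function name `feature_parallel_linear` is made up for this example, and the array slices stand in for per-device shards.

```python
import numpy as np


def feature_parallel_linear(x: np.ndarray, weight: np.ndarray, num_shards: int) -> np.ndarray:
    """Compute x @ weight with the weight split along its output-feature axis."""
    # Each shard would live on a different device; here they are just array slices.
    weight_shards = np.array_split(weight, num_shards, axis=1)
    # Every "device" sees the full input but produces only its slice of the features.
    partial_outputs = [x @ shard for shard in weight_shards]
    # Concatenating along the feature axis plays the role of an all-gather.
    return np.concatenate(partial_outputs, axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # (batch, in_features)
weight = rng.standard_normal((8, 6))  # (in_features, out_features)

y_sharded = feature_parallel_linear(x, weight, num_shards=3)
y_reference = x @ weight
print("matches unsharded result:", np.allclose(y_sharded, y_reference))
```

The batch and sequence axes are untouched here; only the feature axis is partitioned, which is what "feature parallel" would describe.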
Justin Johnson retweeted
21 Oct 2025
real-time WebAR experience 5MB gaussian splat. Generated this scene from a photo using World Labs.
Justin Johnson retweeted
Today we announced RTFM (Real-Time Frame Model) — a world model that generates frames in real time from any camera viewpoint. Unlike standard video models, RTFM understands 3D geometry. You can literally move the camera through the generated world. 🎥👇
Justin Johnson retweeted
16 Oct 2025
We're hiring!! Come build more frame models with us!🖼️🎮⚡️👇 jobs.ashbyhq.com/worldlabs

Introducing RTFM (Real-Time Frame Model): a highly efficient World Model that generates video frames in real time as you interact with it, powered by a single H100 GPU. RTFM renders persistent and 3D consistent worlds, both real and imaginary. Try our demo of RTFM today!
We are sharing a research preview of our latest model from @theworldlabs! RTFM is an autoregressive diffusion transformer trained on large-scale video data. It generates video frames in real-time without building an explicit 3D model of the world. Try the demo today!
Are you an expert in diffusion models? Want to work with an amazing team building the next generation of world models @theworldlabs? We are hiring!
3 Oct 2025
We're hiring research scientists in diffusion / generative models at @theworldlabs! Check out the role here: jobs.ashbyhq.com/worldlabs Feel free to DM me if you would like to chat!👋