Basis

Joined July 2022
4 Photos and videos
Pinned Tweet
31 Oct 2025
New paper from Basis' Project MARA team and collabs. The ability to learn and use world models is a key aspect of human intelligence, but evaluating this ability remains elusive. In this work we propose WorldTest, a representation-agnostic, behavior-based agent eval framework.
1
11
21
3,881
Jan 13
We're attending and sponsoring #POPL2026 in Rennes, France šŸ‡«šŸ‡· -- if you're around, stop by our sponsor booth to chat about research and open opportunities at Basis. We'll be there Wednesday 10:00-19:30 and Friday 10:00-18:00.
2
6
528
Jan 13
We're hiring research scientists in PL and other areas. Join us! basis.ai/join-us/#careers
2
406
Basis retweeted
New preprint on learning abstract world models for robotics planning. Paper code below. šŸ¤–šŸŒ Must an agent plan by simulating pixels frame by frame, or can it think in abstractions? Consider planning an international flight: we can reason about buying tickets, changing airplanes, and crossing borders without committing to the color of the airplane or the milliseconds before takeoff. Absent abstraction, planning over long time horizons would be intractable, because every minute detail of the world would need to be simulated. [1/7]
2
11
24
4,526
Basis retweeted
🚨 MIT and Basis Research just dropped a new way to measure if AI actually understands the world and the results are brutal. It’s called "WorldTest", and it doesn’t just check how well an AI predicts the next frame or maximizes reward. It checks whether the model can build an internal model of reality and use it to handle new situations. They built 'AutumnBench', a suite of 43 interactive worlds and 129 tasks where AIs must: • Predict hidden parts of the world (masked-frame prediction) • Plan sequences of actions to reach a goal • Detect when the environment’s rules suddenly change Then they tested 517 humans vs. top AI models Claude, Gemini 2.5 Pro, and o3. Humans crushed every model. Even massive compute scaling barely helped. The takeaway is wild... current AIs don’t understand environments; they pattern-match inside them. They don’t explore strategically, revise beliefs, or run experiments like humans do. WorldTest might be the first benchmark that actually measures understanding, not memorization. The gap it reveals isn’t small it’s the next grand challenge in AI cognition. Paper: Benchmarking World-Model Learning (arxiv. org/abs/2510.19788)
56
210
921
109,947
Basis retweeted
30 Oct 2025
like i have been saying since 2019, world models are the next key step.
šŸ”„ MIT just exposed every top AI model and it’s not pretty. They built a new test called WorldTest to see if AI actually understands the world… and the results are brutal. It doesn’t just check how well a model predicts the next frame or maximizes reward it tests whether it can build an internal model of reality and use it to handle new situations. They built AutumnBench 43 interactive worlds, 129 tasks where AIs must: • Predict hidden parts of the world (masked-frame prediction) • Plan sequences of actions to reach a goal • Detect when the environment’s rules suddenly change Then they tested 517 humans vs. Claude, Gemini 2.5 Pro, and o3. Humans crushed every model. Even massive compute scaling barely helped. The takeaway is wild.. today’s AIs don’t understand environments; they just pattern-match inside them. They don’t explore strategically, revise beliefs, or run experiments like humans do. WorldTest might be the first benchmark that actually measures understanding, not memorization. The gap it reveals isn’t small it’s the next grand challenge in AI cognition. (Comment ā€œSendā€ I’ll DM you the paper)
10
29
215
32,587
Basis retweeted
30 Oct 2025
"Today’s AIs don’t understand environments; they just pattern-match inside them." Literally what critics have been saying for years now.
šŸ”„ MIT just exposed every top AI model and it’s not pretty. They built a new test called WorldTest to see if AI actually understands the world… and the results are brutal. It doesn’t just check how well a model predicts the next frame or maximizes reward it tests whether it can build an internal model of reality and use it to handle new situations. They built AutumnBench 43 interactive worlds, 129 tasks where AIs must: • Predict hidden parts of the world (masked-frame prediction) • Plan sequences of actions to reach a goal • Detect when the environment’s rules suddenly change Then they tested 517 humans vs. Claude, Gemini 2.5 Pro, and o3. Humans crushed every model. Even massive compute scaling barely helped. The takeaway is wild.. today’s AIs don’t understand environments; they just pattern-match inside them. They don’t explore strategically, revise beliefs, or run experiments like humans do. WorldTest might be the first benchmark that actually measures understanding, not memorization. The gap it reveals isn’t small it’s the next grand challenge in AI cognition. (Comment ā€œSendā€ I’ll DM you the paper)
37
1,072
6,804
151,602
31 Oct 2025
New paper from Basis' Project MARA team and collabs. The ability to learn and use world models is a key aspect of human intelligence, but evaluating this ability remains elusive. In this work we propose WorldTest, a representation-agnostic, behavior-based agent eval framework.
1
11
21
3,881
31 Oct 2025
Our Project MARA team who led this work is looking for research scientists to join us! Link to apply below.
šŸ”„ MIT just exposed every top AI model and it’s not pretty. They built a new test called WorldTest to see if AI actually understands the world… and the results are brutal. It doesn’t just check how well a model predicts the next frame or maximizes reward it tests whether it can build an internal model of reality and use it to handle new situations. They built AutumnBench 43 interactive worlds, 129 tasks where AIs must: • Predict hidden parts of the world (masked-frame prediction) • Plan sequences of actions to reach a goal • Detect when the environment’s rules suddenly change Then they tested 517 humans vs. Claude, Gemini 2.5 Pro, and o3. Humans crushed every model. Even massive compute scaling barely helped. The takeaway is wild.. today’s AIs don’t understand environments; they just pattern-match inside them. They don’t explore strategically, revise beliefs, or run experiments like humans do. WorldTest might be the first benchmark that actually measures understanding, not memorization. The gap it reveals isn’t small it’s the next grand challenge in AI cognition. (Comment ā€œSendā€ I’ll DM you the paper)
3
2
295
31 Oct 2025
We're also looking for roboticists: jobs.ashbyhq.com/basis-resea…

115