Can someone explain why this is a good idea? I'm very unconvinced. The text equivalent of a world simulator is a base model, so in theory you could do all your RL inside a base model simulating a CUEnv and that would be sufficient, which is obviously nonsense (or maybe not?).
SIMA 2 is our most capable AI agent for virtual 3D worlds. 👾🌐
Powered by Gemini, it goes beyond following basic instructions to think, understand, and take actions in interactive environments – meaning you can talk to it through text, voice, or even images. Here’s how 🧵