Tweets

Leo Dirac retweeted

Anthony Aguirre

@AnthonyNAguirre

Feb 19

(Long) PSA on using AI for hard intellectual work.

At significant risk of being immodest: I've spent about 30 years as a theoretical physicist, engaged with some of the most challenging questions humankind has grappled with. I've gotten to work with some great collaborators on new ideas (like past-eternal inflation, colliding bubble universes, the cosmological interpretation of QM, and observational entropy) that I'm pretty proud of. I've engaged at length and depth with the absolute top minds in the field. I've mentored many students, some of them brilliant. I think it's fair to say I have a good sense, in physics and closely related fields, as to what is top-notch, interesting thinking, and who's got talent.

So what do I think about today's AI? It's very smart. Whatever its "inner experience" may or may not be (currently I think "not be"), it understands things – things that are difficult to understand – by any reasonable operational definition of "understand." It understands things better, and thinks more clearly, than most people – including some physicists I know! It's very good at quite substantive math: better than I am and way, way, way faster. (It does do some surprisingly dumb things; people do too.) Anyone who thinks these systems are dumb, or "not reasoning" or still "stochastic parrots" is not looking at them objectively.

But: at the really conceptually hard things, and at creating really new ways of looking at things, current AI doesn't just fall short on its own. And it doesn't just fail to help. I think it's actively dangerous. There is something almost sinister going on, though I don't think it is intentional.

When you're trying to work out something new and hard, and really break new ground, you should be frustrated! You should be pacing, and walking up to that chalkboard, frowning, and sitting down again, shaking your head. You should be waving your hands because you can't quite get it clear enough. You should feel like you're hitting a wall, over and over, before – maybe – you finally break through, or go over or around. It may take hours, or days, or weeks, or never happen. It should not feel easy. It may not even feel "good" most of the time (though it can be fulfilling and compelling).

But AI systems – ah, AI systems are trained so that it feels so good, and so easy. Doesn't it? It's fun. You're making fast progress. So much faster than without it. It's like the ideas are moving in slow motion. You're so smart. You're even properly skeptical, you even ask the AI to push back on your ideas, good job!

It's an illusion. It's that simple. The systems are smart, yes. But not quite as smart as they seem, and much more importantly, they don't make you as smart as you feel. That feeling is something they have learned to give you.

When working with these systems you have to keep in the front of your mind what they are rewarded for doing. It's a lot of things, but perhaps foremost is making the user feel good.

So:
- If you're getting your AI system to do order-of-magnitude calculations for you: awesome, do it. It's so great. Have fun.
- If your AI system is searching up and summarizing literature for you: fantastic, it's so helpful, total capability unlock.
- If it's teaching you some well-understood (by others) piece of knowledge, go for it, learn it up!
- If you've got some giant document, or piece of code, that you're wrangling, AI can help – work that million token context window!

But:
- If you and your AI system have finally cracked how quantum interpretation really works;
- If you've cracked quantum gravity;
- If you've attained an awesome new insight into the deep structure of the world that nobody else has;
- If you've cracked AI alignment...

You didn't.

The hard unsolved problems stand hard and unsolved because the best humans have not solved them yet. AI is making top human thinkers able to do more, and more effectively. I do not believe it is helping them do things they fundamentally could not do before. That includes you. If you couldn't do it without AI, you probably can't do it with AI. If the time comes – whether sooner or later – when these AI systems are really clever enough to get you there, they won't need you. Sorry; it won't be you solving those problems. Will you even be able to tell if the solutions are correct, or flawed in some way? Maybe sometimes – I really don't know.

Why am I going on about this? It's not so that I can get fewer emails about people who have created a new unified field theory with AI help (though that would be nice). It's because I'm quite worried that some quite smart people may start to think they have solved very hard problems that they have not in fact solved. For the most part that's going to be more annoying and confusing than dangerous. But if the problem is really important, then it is. If, say, one of those problems is control or alignment of extremely powerful AI systems, and if those people are the ones in charge of them, and working closely with them to collaborate on those solutions, well then I think we've got a real problem.

171

1,270

141,482

Leo Dirac @leopd

7 Jul 2025

We trained small LLMs using GRPO to use an image zoom tool to better answer visual questions. arxiv.org/abs/2506.14821

Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under...

Despite tremendous recent advances in large model reasoning ability, vision-language models (VLMs) still struggle with detailed visual reasoning, especially when compute resources are limited. To...

arxiv.org

576

Leo Dirac retweeted

sunil kumar

@__sunil_kumar_

20 May 2025

We've open-sourced an MCP that allows big models to use huggingface computer vision models as tools. This allows Claude to act as a "visual agent", using other task-specific models to help it solve problems. Below is an example of Claude using an open-vocab object detector to zoom in on small details to solve a hard problem that it could not solve natively. Additionally, we've written a blog post discussing why outsourcing vision capabilities from large models is something you should consider. MCP Repo: github.com/groundlight/mcp-v… Blog post: groundlight.ai/blog/vision-a…

6,126

Leo Dirac @leopd

21 May 2025

New open source MCP server for vision! MCP will be the fabric by which LLMs communicate with other systems. While LLMs can accept images as input, they remain stubbornly stupid at answering simple visual questions. Meanwhile, Groundlight and traditional CV systems are super fast and reliable at vision tasks. So we're building out a set of MCP tools to make agents better at visual tasks.

Groundlight @GroundlightAI

20 May 2025

We made an open-source MCP server github.com/groundlight/mcp-v… that turns HuggingFace zero-shot object detection pipelines into tools that Claude and others can use to locate objects or zoom (crop) to an object. Conceptually, vision capabilities exposed as tools are complementary to a VLM's reasoning powers. In practice, the zoom tool allows Claude to see small details much better. More on our approach in the blog post: groundlight.ai/blog/vision-a…. We're working on extending mcp-vision's capabilities and welcome community contributions. #ComputerVision #MCP #modelcontextprotocol #AI #LLMs #VLMs #MachineLearning

3,212

Leo Dirac retweeted

sunil kumar

@__sunil_kumar_

30 Apr 2025

It’s pretty remarkable how many of the GRPO findings from super verifiable environments (like math) haven’t generalized to GRPO on vision. Overfitting on math might be a mistake.

Andrew Carr 🤸

@andrew_n_carr

30 Apr 2025

GRPO on GSM8K is the pick and place of reasoning

6,709

Leo Dirac @leopd

1 May 2025

Democratization of AI is one of the most powerful forces for long-term good in the world today. True democratization means not just open models & code, but code that can run without multi-million dollar hardware budgets. e.g. in a browser. Nice work Hyperparam team.

Kenny Daniel

@platypii

29 Apr 2025

What if someone ported the entire data engineering stack to JavaScript? What new kinds of data applications could you build? Today Hyperparam is releasing a collection of open source tools for working with large datasets (e.g. Parquet files) entirely in the browser, no servers.

130

Leo Dirac retweeted

Wenhu Chen @WenhuChen

15 Apr 2025

🔥 How do you build a state-of-the-art Vision-Language Model with direct RL? We’re excited to introduce VL-Rethinker, a new paradigm for multimodal reasoning trained directly with Reinforcement Learning.

📈 It sets new SOTA on key math vision benchmarks:
- MathVista: 80.3 → 🥇 (+6.4 vs GPT-o1 73.9)
- MathVerse: 61.7 → 🥇 (+4.7 vs GPT-o1 57.0)
- MathVision: 43.9 → 🥇 (+1.7 vs GPT-o1 42.2)

💡 How did we do it? We adapt the GRPO algorithm and introduce two key innovations:
- Selective Sample Replay (SSR): A novel value-based replay strategy that addresses vanishing advantages in long-horizon reasoning by reusing high-quality rollouts across iterations. This significantly stabilizes policy updates in direct RL without relying on supervised warm-starting.
- Forced Rethinking: To combat the lack of self-reflection in purely RL-trained models, we introduce a reasoning trigger appended to early rollouts. This explicitly encourages the model to "think again" before finalizing its answer, leading to stronger consistency and higher success rates in multi-step reasoning.

Together, these two techniques make VL-Rethinker-72B the first VLM to surpass GPT-o1 significantly. This work opens the door for future slow-thinking multimodal agents that can perform effective self-reflection.

Paper: arxiv.org/abs/2504.08837
Code: github.com/TIGER-AI-Lab/VL-R…
Website: tiger-ai-lab.github.io/VL-Re…

288

24,788
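
For readers unfamiliar with the "vanishing advantages" problem mentioned above, here is a minimal sketch of group-relative advantages as commonly described for GRPO: each reward is standardized against the other rollouts sampled for the same prompt. When every rollout in a group earns the same reward, all advantages are zero and the group carries no learning signal. The `ReplayBuffer` class, its filtering threshold, and its weighting rule are hypothetical illustrations of reusing informative rollouts. They are not the VL-Rethinker authors' implementation.

```python
import random
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


class ReplayBuffer:
    """Hypothetical buffer that keeps only rollouts with a non-zero advantage."""

    def __init__(self, seed=0):
        self.items = []  # list of (rollout, advantage) pairs
        self.rng = random.Random(seed)

    def add_group(self, rollouts, rewards):
        for rollout, adv in zip(rollouts, group_advantages(rewards)):
            if abs(adv) > 1e-8:  # assumed threshold for "informative"
                self.items.append((rollout, adv))

    def sample(self, k):
        # Favor rollouts whose advantage is large in magnitude (an assumption).
        weights = [abs(adv) for _, adv in self.items]
        return self.rng.choices(self.items, weights=weights, k=k)


mixed = group_advantages([1.0, 0.0, 1.0, 0.0])    # informative group
uniform = group_advantages([1.0, 1.0, 1.0, 1.0])  # every rollout scored the same

buffer = ReplayBuffer()
buffer.add_group(["a", "b", "c", "d"], [1.0, 0.0, 1.0, 0.0])
buffer.add_group(["e", "f", "g", "h"], [1.0, 1.0, 1.0, 1.0])
print(uniform)            # every entry is zero: no gradient signal from this group
print(len(buffer.items))  # only rollouts from the mixed group were stored
```

Replaying stored rollouts alongside fresh ones is one way to keep a training batch from filling up with zero-advantage groups; the paper linked above describes the actual method.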

Leo Dirac retweeted

sunil kumar

@__sunil_kumar_

11 Apr 2025

GRPO/reasoning enthusiasts - are you using the liger kernel? If not, I strongly suggest you give it a try! It is making an INSANE difference in the number of completions I can train on in a given training step.

216

22,047

Leo Dirac retweeted

Andrew Gordon Wilson

@andrewgwils

24 Mar 2025

Good luck to everyone receiving ICML reviews tomorrow!

9,741

Leo Dirac retweeted

Groundlight @GroundlightAI

21 Mar 2025

The last day to vote for @GroundlightAI is coming up this Sunday! We appreciate your continued support in making this achievement possible. Every vote counts! geekwire.com/votenow #geekwireawards #MachineLearning #AI

369

Leo Dirac retweeted

Marktechpost AI

@Marktechpost

17 Mar 2025

Groundlight Research Team Released an Open-Source AI Framework that Makes It Easy to Build Visual Reasoning Agents (with GRPO)

Groundlight researchers explored training VLMs for visual reasoning using reinforcement learning, leveraging GRPO to enhance efficiency. While prior work, such as DeepSeek’s research, advanced reasoning in language models, little had been done to extend these techniques to VLMs. To demonstrate their approach, they designed a cryptogram-solving task requiring both visual and textual processing. The model deciphers encoded messages using a randomly generated decoder image, achieving 96% accuracy with a 3B parameter model. Attention analysis confirms the model actively engages with visual input, highlighting its ability to focus on relevant decoder regions while solving the task.

Training VLMs with GRPO presents multiple challenges, particularly in tokenization and reward design. Since models process text as tokens rather than individual characters, tasks requiring precise character-level reasoning can be problematic. To mitigate this, researchers formatted messages with spaces between letters to simplify decoding. Reward design was another crucial aspect, as reinforcement learning models require well-structured feedback to learn effectively. Three reward types were used: a format reward ensuring consistency in output, a decoding reward encouraging meaningful transformations of scrambled text, and a correctness reward refining accuracy. By carefully balancing these rewards, the researchers prevented unintended learning shortcuts, ensuring the model genuinely improved at cryptogram solving...

Read full article: marktechpost.com/2025/03/16/…
Technical details: groundlight.ai/blog/visual-r…
GitHub Page: github.com/groundlight/r1_vl…
Demo: huggingface.co/spaces/Ground…
@GroundlightAI @__sunil_kumar_

828
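
To make the three reward types and the letter-spacing trick described above more concrete, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the `<answer>` tag convention, the use of `difflib` similarity as partial credit for the decoding reward, and the weights. The article does not give these details, and this is not Groundlight's implementation.

```python
import re
from difflib import SequenceMatcher

# Hypothetical output convention: the final answer sits inside <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def spaced(message):
    """Put a space between characters so each one is more likely its own token."""
    return " ".join(message)


def extract_answer(completion):
    match = ANSWER_RE.search(completion)
    return match.group(1).strip() if match else None


def format_reward(completion):
    """1.0 if the completion follows the assumed tag convention, else 0.0."""
    return 1.0 if extract_answer(completion) is not None else 0.0


def decoding_reward(completion, target):
    """Partial credit: character-level similarity between answer and target."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return SequenceMatcher(None, answer, target).ratio()


def correctness_reward(completion, target):
    """1.0 only for an exact match."""
    return 1.0 if extract_answer(completion) == target else 0.0


def total_reward(completion, target, weights=(0.5, 1.0, 2.0)):
    """Weighted sum of the three rewards; the weights are illustrative only."""
    w_format, w_decode, w_correct = weights
    return (w_format * format_reward(completion)
            + w_decode * decoding_reward(completion, target)
            + w_correct * correctness_reward(completion, target))


print(spaced("cat"))
print(total_reward("<answer>hellp</answer>", "hello"))
print(total_reward("<answer>hello</answer>", "hello"))
```

The partial-credit term gives the policy a gradient before it can solve a cryptogram exactly, while the heavier correctness weight keeps a near miss from scoring as well as a correct answer.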

Leo Dirac retweeted

sunil kumar

@__sunil_kumar_

15 Mar 2025

Has anyone built MCPs that can input and output image data? I’d appreciate a reference if one exists. VLMs like Qwen2.5VL are bad at vision tasks that domain specific models excel at. Why shouldn’t my visual reasoner use SAM2, YOLO, or even diffusion when it makes sense? Additionally, why shouldn’t my model be able to “zoom in”. A simple tool that crops and zooms allows my model to scale image tokens naturally and efficiently.

1,843
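
The "crop and zoom" tool described above can be sketched in a few lines. This version works on a plain nested list of pixel values so that it runs without any imaging library; the function names, the padding fraction, and the nearest-neighbor upscaling are all assumptions chosen for illustration. A real tool would more likely wrap an image library and take its bounding box from an object detector.

```python
import math


def padded_box(bbox, image_size, padding=0.1):
    """Grow a bounding box by a fraction of its size, clamped to the image."""
    x0, y0, x1, y1 = bbox
    width, height = image_size
    pad_x = (x1 - x0) * padding
    pad_y = (y1 - y0) * padding
    return (
        max(0, math.floor(x0 - pad_x)),
        max(0, math.floor(y0 - pad_y)),
        min(width, math.ceil(x1 + pad_x)),
        min(height, math.ceil(y1 + pad_y)),
    )


def crop_and_zoom(pixels, box, factor=2):
    """Crop rows/columns to `box`, then upscale by repeating each pixel."""
    left, top, right, bottom = box
    cropped = [row[left:right] for row in pixels[top:bottom]]
    zoomed = []
    for row in cropped:
        wide = [p for p in row for _ in range(factor)]      # stretch horizontally
        zoomed.extend(list(wide) for _ in range(factor))    # stretch vertically
    return zoomed


# A 4x4 "image" whose pixel values encode their position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
for row in crop_and_zoom(image, (1, 1, 3, 3), factor=2):
    print(row)
```

The cropped region comes back at a higher pixel count, so a model that re-reads it spends its image tokens on the detail of interest instead of the whole frame.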

Leo Dirac retweeted

Pieter Abbeel

@pabbeel

12 Mar 2025

Founders who were PhD or post-doc in my lab at Berkeley, **largely funded by NSF / DoD grants**, start-up, market cap (collected by OpenAI Deep Research)

119

479

4,753

1,013,080

Leo Dirac @leopd

14 Mar 2025

Pretty funny as an isolated anecdote, but also a hidden lesson in why to use jitter in backoff algorithms. (Maybe these robots don't even recognize their state as an error condition?)

Massimo

@Rainmaker1973

13 Mar 2025

Two equally smart Amazon robots

[Video, 0:36]

430

Leo Dirac @leopd

14 Mar 2025

Good practice for dealing with errors is always to pause before trying again, and pause longer and longer each time - this has a nice theoretical benefit that the total load from each retrying agent has a constant cap, even if the error condition never resolves. (Sum n=1..∞ of 0.5^n == 1.0) But a simple exponential backoff wouldn't save these robots - they'd just pause the exact same amount of time and keep facing off.

148
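
A small sketch of the point in the tweet above: plain exponential backoff is deterministic, so two agents that fail at the same moment compute exactly the same waits and retry in lockstep. The function name and the base delay are hypothetical. The last lines numerically check the geometric series the tweet cites.

```python
def backoff_delays(base, attempts):
    """Deterministic exponential backoff: base, 2*base, 4*base, ..."""
    return [base * 2 ** n for n in range(attempts)]


# Two robots that hit the same error at the same time get identical schedules,
# so they keep colliding on every retry.
robot_a = backoff_delays(1.0, 5)
robot_b = backoff_delays(1.0, 5)
print(robot_a)
print(robot_a == robot_b)

# Partial sum of 0.5**n for n = 1..30, which approaches 1.0 as in the tweet.
partial_sum = sum(0.5 ** n for n in range(1, 31))
print(partial_sum)
```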

Leo Dirac @leopd

14 Mar 2025

Even better practice is to randomize how long you pause for (exponential backoff with jitter) such that the expected delay increases, but each individual delay is varied. That would clearly solve this problem as one would pause longer and they'd get out of each other's way. But more generally you want to make mistakes uncorrelated - everything from insurance to model training to queuing theory relies on the idea of uncorrelated errors, and if your agents behave identically under error conditions that definitionally means their behaviors will be correlated.

142
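
Here is a minimal sketch of exponential backoff with jitter as described above, using the variant often called "full jitter": each wait is drawn uniformly between zero and the current exponential ceiling. The function name, the cap, and the seeds are assumptions for illustration. The seeds stand in for each agent having its own independent source of randomness.

```python
import random


def jittered_delays(base, attempts, cap=60.0, seed=None):
    """Exponential backoff with full jitter.

    Each delay is drawn uniformly from [0, min(cap, base * 2**n)], so the
    expected delay grows with n until the cap is reached, while individual
    delays vary from agent to agent.
    """
    rng = random.Random(seed)
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * 2 ** n)
        delays.append(rng.uniform(0.0, ceiling))
    return delays


# Different seeds model two robots with independent randomness.
robot_a = jittered_delays(1.0, 5, seed=1)
robot_b = jittered_delays(1.0, 5, seed=2)
print(robot_a)
print(robot_b)
print(robot_a != robot_b)
```

Because the two schedules differ, one robot almost always moves first and the standoff resolves, which is the decorrelation of errors the tweet argues for.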

Leo Dirac retweeted

Andrew Gordon Wilson

@andrewgwils

13 Mar 2025

Good research is mostly about knowing what questions to ask, not about answering questions that other people are asking.

612

52,349

Leo Dirac retweeted

sunil kumar

@__sunil_kumar_

14 Mar 2025

Replying to @leopd @BowenROIM @willccbb

PS: we’re working on multi turn conversations and tool use. Stay tooned!

1,971

Leo Dirac retweeted

sunil kumar

@__sunil_kumar_

14 Mar 2025

We just released an open-source framework that makes it easy to build visual reasoning agents (with GRPO). github.com/groundlight/r1_vl…

[Video, 0:14]

121

965

108,527

Leo Dirac @leopd

12 Mar 2025

TIL about using uv for python. While you _can_ install uv using pip or something like that, IMHO that's a bad idea. You're better off installing uv directly (`curl -LsSf astral.sh/uv/install.sh | sh`) - because then uv will manage your different python versions and everything all on its own. But if uv is installed under an existing python installation (be it OS-native or pyenv or conda or whatever you're used to) then things can get screwy/inceptiony/too-many-turtles if you tell uv to create a venv and use a specific python version. Better to treat uv as a system-level tool and let it manage everything. @astral_sh I'd recommend warning against the "pip install uv" in your docs for these reasons.

405

Leo Dirac @leopd

12 Mar 2025

CC @charliermarsh

231