MCP & AI stuff @AnthropicAI, Research Fellow @Stanford, and 🇦🇺🏃🕺, prev: PhD @MIT

Joined June 2014
Pinned Tweet
22 Jul 2025
This was a genuinely fantastic (and hilarious) panel at ICML. Great takes on AI, security, liability, standards, and governance.
👥👥 TAIG panel discussion! Many thanks to our wonderful panelists @MartaZiosi, @niloofar_mire, Jat Singh, and @TobinSouth — and to @StephenLCasper for moderating. #ICML2025
Tobin South retweeted
Fully agree! By default, a similar disempowerment story will play out across all middle powers, but each has sources of leverage that they should expand and that can become much stronger when bundled. 🇦🇺Australia 2031: critical minerals, energy (and thus a big datacenter buildout). 🇨🇦Canada 2031: AI talent, energy (and a datacenter buildout). 🇰🇷Korea 2031: HBM. 🇯🇵Japan 2031: robotics and semicon materials. 🇪🇺Europe 2031: lithography, robotics and market power.
@bakkermichiel was on my committee for my PhD. He is one of the most thoughtful people about the future of AI & Europe. Everyone from a middle power 🇦🇺🇰🇷🇪🇺🇯🇵🇮🇳👀 should read/listen to this (it’s goooood)
Tobin South retweeted
Europe has a lot to lose in the current AI race, and it's worth examining how threats to middle-power sovereignty can result in unsafe outcomes. Such scenarios help illustrate why Europe must invest in AI initiatives that can either leapfrog the current frontier or offer critical components like safety and reliability.
I'm deeply concerned about Europe's future on AI. One of my biggest worries is the erosion of our agency: our ability to stay relevant and fight for our values in a future where AI becomes a civilisationally important technology. @DadaJudith, @bakkermichiel, others and I have written a scenario to outline a potential future we worry we are on track towards. europe2031.ai/

Every optimistic and realistic path I can see for Europe runs through a central node: one where Europe has more leverage, more importance and more say. One where Europe grows more, builds more where it matters, and takes ownership over its resilience.

Europe 2031 is a five-year scenario of the continent's slide into irrelevance: how AI is driving it, and what can still be done. The co-authors are researchers, scientists and investors who have advised European leaders, co-authored national AI strategies, and built and funded these systems from the inside. We have no interest in hype and we deeply care about this continent.

Europe 2031 ends with five concrete recommendations:
- drastically more compute on European soil
- an AI middle-power coalition
- labour-market reforms
- a bold position in robotics and industrial AI
- a positive vision of what AI can do for society

Europe can still change course if it finds the political will and the courage to engage in the most ambitious political and economic agenda the continent has undertaken in peacetime. I encourage you to read it if you have the time:
Listen to this as a podcast, it’s extremely compelling
Most of Europe has not yet absorbed what AI is about to do to us. The few who have are not saying it loudly enough. We wrote Europe 2031: a five-year scenario of the continent's slide into irrelevance, how AI is driving it, and what can still be done to change course.
‘yeah but I have access to you’ 🥺
it’s time to AGI pill our poli sci friends
Today I'm publishing a new essay, Policy on the AI Exponential. AI is progressing extremely fast—much faster than the policy process was built to handle. The essay lays out where I think the technology is now, and the action needed to close the gap: darioamodei.com/post/policy-…
this is what an average day at work with fable feels like
this is my personal singularity moment

this post may sound like a paid ad. I only wish. I'm concerned, more so than happy. the world is changing, and, among the scenarios where AI goes terribly wrong, inequality is the most realistic, yet the one Anthropic seems to be the least concerned about. I'm glad OpenAI is taking the opposite stance: *personal AGI for everyone*. I think this is a commendable position in the times we live in. but who am I to say?

anyway, Fable is here, so I'll just report my first-hour experience.

first of all, all my pet prompts are solved:
→ λ-calculus puzzles
→ bug questions
→ one-shot apps
all are trivial to it. I don't have anything harder other than my ongoing work, so...

in the last several days, I've been toying with HVM5, a new interaction net evaluator with a faster loop. after writing the first version, I left 32 GPT-5 agents working for ~20 hours each. this resulted in up to 2x speedups, but the file size doubled and quality decreased significantly.

I then simplified the whole thing into an even simpler core, and left Opus 4.8 and GPT 5.5 optimizing it for 8 hours. Opus got a legit 6%-34% speedup in most benches. GPT got better results, but, sadly, an unusable file.

I then asked Fable to optimize it. 2 hours later, it landed a 1770% speedup in one case, 100% in 4 others, and 22% on average.

yes, in 2 hours it outperformed me, Opus 4.8 and a swarm of GPT 5.5 agents by an order of magnitude.

that could not possibly be legit. "it must be hardcoding the benchmarks" (GPT trauma). so I read its explanation, and what it did was, indeed, the highest-impact optimization one could try first. it seems HVM5 was wasting a lot of time garbage-collecting unused branches of pattern-match nodes. I had optimized that for static mats, but not for dynamic mats. skill issue. Fable figured out how to do it for these, resulting in a massive speedup in some benches.

but wait, is that *correct*? I'm not sure yet. it is credible, but this is the kind of thing that is very easy to get wrong on interaction nets.

the problem is, when I was ready to start auditing Fable's solution to tell whether it was buggy or legit, it interrupted me to tell me it had found a massive bug in the code *I* had written.

... wait, what?

so... for garbage collection purposes, I stored a bit on lambda term pointers that meant "the variable bound by this lambda has been freed, so its lambda must free whatever argument it is applied to". that's fine. yet, on duplicator nodes, I also used the same bit to mean "one of the duplicated variables was freed, so treat this dup as a passthrough no-op". so, if a lambda entered a duplicator, the dup would mistake the lambda's collection bit for its own, resulting in a corrupted interaction!

that's a mouthful, so why am I writing this? just so you can appreciate the sheer absurdity of what just happened. I didn't ask it to find bugs. I asked it for an optimization. and even if I had asked it to find bugs, this bug is so astonishingly subtle and specific that identifying it takes mastering the domain to an extent that is beyond even me. I'd easily need hours or days to fix it, *if* I ever came across it. chances are it would just go unnoticed.

and Fable found it and fixed it like it was nothing, while it was busy adding a 17x speedup to a file that neither I, nor Opus 4.8, nor a fleet of GPT 5.5 agents could barely make 2x faster.

oh, and there is also another tab where it is ripping through Bend's codebase and finishing everything I had to do.

I don't know what to say anymore.

this isn't about Anthropic or OpenAI, this is about our collective future as a species. the world is changing, and we need to be aware of it and discuss how to handle this change.

receipt below . . .
this is actually my favorite way to show mythos off. he's a smart one! (and not that expensive!)
test-time compute scaling should be quite relevant these days
Fable 5 is great — it can chain multi-step reasoning while searching through code & slack, orchestrating MCP calls, and cross-referencing facts to make sure it gets the right job done. It's the closest you can feel to a drop-in knowledge worker right now.
Replying to @claudeai
Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.
if anyone has feedback, let me know and I'll ship improvements!
We've added an observability dashboard for developers of connectors. Connectors let third-party developers bring their tools and data to Claude via MCP.
Tobin South retweeted
Personal update: I’ve decided to leave OpenAI. I’m proud to have been part of the custom chip program and grateful to everyone I got to build with and learn from along the way. The density of hardware talent on that team is extraordinary, and I don't think there's a better chip design team anywhere. It's been a wild journey from second hardware hire, 2.4 years ago, to now, and I'm excited to watch these chips become one of the most important engines of AGI. At the same time, I haven’t been able to shake the pull to climb a new mountain from the bottom again! I joined @AnthropicAI this week because I was deeply impressed with the team’s talent, values, and ambition, and I'm already energized by the pace and intensity of the past few days. It’s time to build.
youtube shorts optimizing your feed based on your heart rate...
Introducing a research system that enables passive heart rate monitoring (PHRM) during everyday smartphone use. Using the front-facing camera, it achieves industry accuracy standards for heart rate across all skin tones. Check out the blog to learn more: goo.gle/4dQTc2B
I think when people talk about model personality, it goes much deeper than the surface level. It's about having a collaborator that you understand and trust. Avoiding slop is about working with the model and supporting its weak points.
Replying to @thetreygoff
I don’t have the words for it, exactly. Claude has a mind the shape of which I can very clearly interpret, I know where the contours and blind spots are, I know what it will enjoy and therefore do better at and what it considers a chore and performs worse at
Tobin South retweeted
I don’t know how to put into words why Claude Opus is so much better than GPT So I’ll try to explain with a bunch of examples instead:
I was having trouble logging in and checked the workos status page. good stuff @grinich — it's a lot better than our whole foods three pepper blend page
It is hard to sufficiently articulate externally the unbelievable contribution that Claude is able to make to the whole lifecycle of technical discussions, design of solutions, writing code, and debugging incidents. It truly is a new way of working.
Replying to @AnthropicAI
Today, Anthropic engineers on average ship 8x as much code per quarter as they did in 2021-2025.
claude code is great but have you tried painting the golden gate bridge while watching the sunset?
I wanna see what the model attention heads were going through seeing this
Self-driving cars are fun because you never see competing SaaS products having a literal standoff in the street