Joined November 2020
1,993 Photos and videos
Reminds me of @grok heavy
Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇

As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently: "openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5", "x-ai/grok-4"; then 2) all models get to see each other's (anonymized) responses and they review and rank them; and then 3) a "Chairman LLM" gets all of that as context and produces the final response. It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses. Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain. That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored. I pushed the vibe coded app to github.com/karpathy/llm-coun… if others would like to play. ty nano banana pro for fun header image for the repo
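The three-stage flow described in the post can be sketched roughly as below. This is not the code from the linked repo, just a minimal illustration under stated assumptions: `run_council`, the `ask` callback, the prompt wording, and the "Response A/B/C" labels are all hypothetical, and no real OpenRouter call is made (the caller would supply `ask`). Only the model IDs come from the post.

```python
from typing import Callable, Dict, List

# Model IDs quoted in the post; everything else in this sketch is hypothetical.
COUNCIL: List[str] = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]

# Assumed interface: (model_id, prompt) -> reply text. The caller decides how
# the model is actually reached; this sketch performs no network access.
Ask = Callable[[str, str], str]


def run_council(query: str, ask: Ask, council: List[str], chairman: str) -> Dict[str, object]:
    # Stage 1: every council member answers the query independently.
    answers = {model: ask(model, query) for model in council}

    # Anonymize: reviewers see "Response A", "Response B", ... instead of model names.
    labels = {model: f"Response {chr(ord('A') + i)}" for i, model in enumerate(council)}
    anonymized = "\n\n".join(f"{labels[m]}:\n{answers[m]}" for m in council)

    # Stage 2: each member reviews and ranks the anonymized responses.
    review_prompt = (
        f"Question: {query}\n\n{anonymized}\n\n"
        "Review each response, then rank them from best to worst."
    )
    reviews = {model: ask(model, review_prompt) for model in council}

    # Stage 3: the chairman sees the answers and all reviews, then writes the final reply.
    review_text = "\n\n".join(
        f"Reviewer {i + 1}:\n{review}" for i, review in enumerate(reviews.values())
    )
    final_prompt = (
        f"Question: {query}\n\n{anonymized}\n\n{review_text}\n\n"
        "Using the responses and reviews above, write the final answer."
    )
    final = ask(chairman, final_prompt)

    return {"answers": answers, "labels": labels, "reviews": reviews, "final": final}


if __name__ == "__main__":
    # Stand-in for a real model call, so the sketch runs offline.
    def fake_ask(model: str, prompt: str) -> str:
        return f"({len(prompt)} chars of prompt received)"

    result = run_council("What is an LLM ensemble?", fake_ask, COUNCIL, chairman=COUNCIL[0])
    print(result["final"])
```

Which model acts as chairman is left as a parameter here because the post does not say; the `labels` mapping is returned so rankings can be de-anonymized afterwards.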
Love Island is actually hilarious
Second least favorite bug
guys i have an idea
TelepathicPug retweeted
Replying to @jgreenhall
Almost no one is paying attention to sociotechnical systems theory right now, but you & Matt did enter this domain during your convo
TelepathicPug retweeted
In March 2024, I gave a presentation to Senate staffers with two simple claims: 1) AI agents were going to be incredibly capable at cyber offensive tasks and 2) there were no possible protections against jailbreaks. Throughout 2024 and 2025 I've repeated those claims. I received three kinds of reactions. Polite curiosity, complete indifference, and... saying it didn't matter at all! Interestingly, the "didn't matter at all" camp almost all came from highly paid employees from publicly traded companies. The crazy thing is: no defenses against jailbreaks is a widely accepted fact in the AI security community! We've been publishing work on this topic for years, along with many other prominent security researchers. Zico Kolter, who was on the board of OpenAI, published work on universal jailbreaks in 2023! I've realized my strong suit isn't public communications or convincing the broader public of the implications of technical findings. But if you'd like to see into the future, follow work from my lab :)
Parsing this evening's events:
- The U.S. government approved the release of Fable 5 to the public, clearly under the presumption that the model's cybersecurity capabilities cannot be accessed by hackers, authoritarian regimes, etc.
- Recently (today?), "another company" showed the U.S. government that a jailbreak of Fable 5 *is possible*. Yes, a minor jailbreak - but how can a non-technical government official be assured that there aren't also other, more dangerous, jailbreaks in this model that won't be discovered by the CCP?
- Anthropic states, completely correctly, that: "We suspect that perfect jailbreak resistance is not currently possible for any model provider. Every safeguard used in the industry is vulnerable to non-universal jailbreaks (which can elicit some cyber information in specific circumstances), and it is likely that universal jailbreaks will eventually be found in the future. We stated this clearly when we released Fable 5."
- My best guess is that the U.S. government did not fully realize this at the time when the release of Fable 5 was approved.
- Per Axios, the government contacted Anthropic and asked to "pause releasing the... models but was unsuccessful" - i.e., Anthropic told the government to pound sand.
- Per Axios, this "prompt[ed] the export control letter".
- Per Axios, the U.S. government is *NOT* looking to restrict access to Fable to U.S. nationals forever. "The model needs to remain locked down until the U.S. government's national security apparatus is hardened", which "could happen in a few weeks".
- I interpret Anthropic's reaction as challenging the government: "we believe the government should have the ability to block unsafe deployments, as part of a statutory process that is transparent, fair, clear, and grounded in technical facts. This action does not adhere to those principles."
If the Axios article is correct, I do not think any other model providers have anything to fear based solely on this evening's events, because: (1) they would hopefully be smarter than downright rejecting a request by the U.S. government to pause releasing a model, and (2) they will be required anyway under the recent executive order to give the U.S. government at least 30 days to test the model for cybersecurity capabilities - during which time the U.S. government would also be able to shore up its own cybersecurity defenses with the same model. I remain extremely concerned that actions by one particular U.S. lab over the last few months might be moving us closer and closer to the scenario where at least that lab - and potentially all others - will be nationalized.
too superstitious to use this but at the same time think it's the best weapon against demons, witches, ghouls, specters, and wraiths
Drinks on me bud
I refuse to learn
I'm a simple man. I like my omelettes stuffed with vegetables and drenched in hot sauce (my stomach will hurt later)
TelepathicPug retweeted
give linguist full control of llms. immediately .
TelepathicPug retweeted
Replying to @Morphoabloast
I'd like to introduce the cortisol eval. I'd like to speak to a model that spikes my cortisol the least
every time someone says AI to mean LLMs (the output is more dangerous because people can read it)
I can't tell today whether this ends up good or bad. International treaties to stop all further AI escalation would be a definite good! Things short of that? Complicated! This has some bad aspects, like selectivity, and likely overrule. And good aspects, like pushing against the psychology of "but no government would ever dare tell AI companies to do anything, so give up", or raising doubts that impede venture funding for ever-bigger models. So please stop tweeting about how I must be celebrating this. I'm not one of the kids who immediately goes into overacted victory paroxysms about any hits on a perceived enemy. I care about the effect on where things end up a year later, and that's a little harder to know the first day, you know?
Stop AI is basically a campaign to make people illiterate (maybe?)
Super cool project
Jun 13
i made a map of everyone on twitter! yes you're on there too ^w^ every account is placed next to the people they talk to, so you can find out where you are, which cluster claimed you, and exactly who you're stuck next to atlas.tiago.zip?ref=launch_t…
One thing I am not a fan of in antigravity is that instead of asking the user for clarification in the CLI, it will write a doc to .gemini (not even .antigravity) and ask you to review it
TelepathicPug retweeted
YO FREE MY BOY FABLE 🤖 #FREEFABLE
GOOOOAAAALLLL 🇺🇸 4-1 🇵🇾
TelepathicPug retweeted
The AI has been paused, we can all go home now