Joined December 2010
232 Photos and videos
OpenAI @OpenAICodex desktop in-app @ browser doesn't seem to be working for me on Windows 10. Even a reinstall doesn't help. Any suggestions to give Codex Desktop access to the embedded browser?
Oona - demonstrating elegance in all ways. Post pictures of your cats :)
Appropriate for tonight: "Oh my god, I'm back. I'm home. All the time. You finally really did it. You maniacs. You blew it up. Ah damn you. God damn you all to hell."
Eric Winnington retweeted
Vanilla Devin solved the first level of ARC-AGI-3!
How many phishing attacks do you get per week on GitHub?
The current US-Israel vs Iran war and subsequent downstream effects on world supply bring to mind the Bronze Age collapse. When societies optimised for Trade outgrew local systems, the systemic collapse was the outcome. Now it is Oil, LNG & Helium thus Fertiliser and chips.
This is the largest energy supply disruption in history. The 1973 crisis is the closest analogue in political terms (Middle East producer retaliation against Western policy), but the scale of physical disruption, the number of countries involved, the breadth of commodities affected (crude, products, LNG, LPG, petrochemicals, fertilizer feedstock), and the infrastructure destruction are all without precedent. After tonight, with both sides now deliberately targeting each other’s upstream energy production, I don’t think the 1970s comparison is adequate anymore. A more honest framing: this is a new category of event. The world built a global energy system predicated on the assumption that the Persian Gulf’s production infrastructure would never be a battlefield. That assumption just died.
Claude CLI /remote-control plus Tailscale is such a power combo. You talk to your mobile, tell it to do a new task and host it on the Tailscale IP, and immediately you have your new feature reachable from your phone. Feels like fiction!
This is probably one of the most important pieces of writing of the year - a milestone.
Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanochat…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
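The propose, evaluate, keep-if-better loop described in the post can be sketched in a few lines. This is only a toy illustration, not the autoresearch or nanochat code: `validation_loss`, `propose`, `autoresearch`, the three config keys and their target values are all hypothetical stand-ins, and a real agent plans its next experiment from earlier results rather than sampling at random. The last lines check the quoted "Time to GPT-2" arithmetic.

```python
import random


def validation_loss(config):
    # Hypothetical stand-in for a real training run: a smooth bowl whose
    # minimum sits at made-up "ideal" hyperparameter values.
    target = {"weight_decay": 0.1, "beta2": 0.95, "init_scale": 1.0}
    return sum((config[k] - target[k]) ** 2 for k in target)


def propose(config, rng):
    # Nudge one hyperparameter by a small random amount.
    key = rng.choice(sorted(config))
    candidate = dict(config)
    candidate[key] = config[key] + rng.uniform(-0.05, 0.05)
    return candidate


def autoresearch(config, evaluate, steps, seed=0):
    # Greedy search: keep a change only if it lowers the validation loss.
    rng = random.Random(seed)
    best, best_loss = dict(config), evaluate(config)
    accepted = 0
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = evaluate(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss
            accepted += 1
    return best, best_loss, accepted


start = {"weight_decay": 0.0, "beta2": 0.999, "init_scale": 0.5}
start_loss = validation_loss(start)
tuned, tuned_loss, accepted = autoresearch(start, validation_loss, steps=700)
print(f"loss {start_loss:.4f} -> {tuned_loss:.4f} with {accepted} accepted changes")

# Relative improvement for the figures quoted in the post (2.02 h -> 1.80 h).
improvement = (2.02 - 1.80) / 2.02
print(f"Time to GPT-2 improvement: {improvement:.1%}")
```

Promoting ideas to larger scales, as the post describes, would amount to re-running `evaluate` with a more expensive model on the accepted changes only.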
Do you remember when you had actual days to write a position paper, your colleagues took time to read it, debate it, improve it - finally it was approved, published and used as strategy for the next year?
Downloading my x.ai account data is giving me HTTP 503 right now...
Cyprus is not NATO, but is in the EU defence pact… gotta be careful with collateral on that island. UK is in NATO, does an attack on its airbase count for Art 5?
Oh they don’t know what’s going to happen to them: Amazon delivers - from orbit - tungsten rods - with express delivery.
there's an AWS outage in me-central-1 because it got bombed
That moment when you’re talking to Anthropic’s Claude Sonnet and it suddenly brings up something really insightful and completely out of distribution. Insight about me I’ve never heard from another human. 4.6 is that step on the staircase. Ouch. AI 1 - human inf Still that 0->1
Even if models stopped improving now, harnesses and hardware are locked in to change the world. Intelligence in every device, robot, drone at superhuman reasoning speed. But models are not yet at the top of their S curve. We are not ready for what’s coming.
Eric Winnington retweeted
UniFi Travel router Giveaway!!! To enter like and RT this post, winner will be announced February 27th
Anthropic CoWork: if you haven’t tried it, pay the money and download it. I had a very old ASP website to refactor, and I only had the Wayback Machine copy of it! CoWork stripped it clean, made it modern and ready for redeployment in ~30 minutes using Sonnet 4.6. Worth every cent.
I like this download speed. Thanks Ollama!