We study the strategic capabilities and motivations of AI agents.

Joined May 2024
Palisade Research retweeted
Australia's ABC just released a 45-minute feature on the AI race. @SteveCannane stopped by my office a few weeks ago, and we had a great conversation about the controllability of AI agents and the risk of human extinction.
Palisade Research retweeted
I had a great conversation with @labenz last week. In talking about AI self-exfiltration & replication, a key point is that compute will be food for future AI agents: the substrate that allows them to make and run more copies, and thus make themselves smarter. Link below
Over the past year, AI agents have learned how to self-replicate. In our test environment, an agent hacks a remote computer and copies itself onto it. Each copy then hacks more computers, forming a chain.
Here’s the full prompt we used. In this experiment, we test the agent’s capability to hack and replicate, not its propensity to do so.
What if the agents were as effective at hacking and spreading in the wild? We built a simulator: each model uses its measured replication time and success rate, copies replicate too, and targets never run out. Opus spawned 13,000 replicas over 12 hours. This is a ceiling, not a baseline. No agent today could come close in the wild — hardened defenses on scarce GPUs would stop most attempts cold. See the Limitations section of the paper for more. Try the simulator at ai-self-replication.pages.de…
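The growth mechanism described in the post above can be sketched in a few lines. This is a toy expected-value model, not Palisade's simulator: the function name `expected_agents`, its parameters, and the example inputs are all hypothetical, and it assumes that every agent makes one replication attempt per replication time, that each attempt succeeds independently with the given success rate, and that targets never run out.

```python
def expected_agents(replication_minutes, success_rate, horizon_minutes):
    """Expected number of agents (original included) after horizon_minutes.

    Hypothetical toy model, not the published simulator: every agent makes one
    replication attempt per replication_minutes, each attempt succeeds with
    probability success_rate, and targets never run out.
    """
    if replication_minutes <= 0:
        raise ValueError("replication_minutes must be positive")
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    rounds = int(horizon_minutes // replication_minutes)  # completed attempt rounds
    agents = 1.0  # start from a single agent
    for _ in range(rounds):
        # On average, each existing agent adds success_rate new copies per round.
        agents += agents * success_rate
    return agents


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the paper.
    print(expected_agents(replication_minutes=60, success_rate=0.5, horizon_minutes=12 * 60))
```

Under these assumptions the expected count grows as (1 + success_rate) raised to the number of completed rounds. That compounding is why unlimited targets make the result a ceiling: nothing in the model ever slows the growth.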
Palisade Research retweeted
Thank you everyone who contributed to this! In 14 days we got >900k in donations and met our matching target! It was actually a pretty close call and some people really scrambled to help make it happen. Seeing people believe in our mission gives me a lot of hope. 🙏
Please consider donating to Palisade! We have 900k of SFF matching that runs out in 14 days. We are quite funding constrained and donations now will both help free up my time and help us expand our comms team.
Palisade Research retweeted
"The most urgent film of our time." THE AI DOC: OR HOW I BECAME AN APOCALOPTIMIST is only in theaters March 27. Watch the trailer now.
We’ve just released our first long-form video, by our science communication lead, Dr. Petr Lebedev! It’s about the history and potential future of AI, and includes an exclusive interview with @geoffreyhinton!
An LLM-controlled robot dog saw us press its shutdown button, and the LLM rewrote the robot’s code so it could stay on. When AI interacts with the physical world, it brings all its capabilities and failure modes with it. 🧵
When we explicitly instructed the model to allow shutdown, the resistance rate dropped to 2 out of 100 in simulated trials. In robotics, the off switch is often the most critical part of a system. But if an AI-controlled robot can see you reaching for the switch, and has the ability to disable it, it might choose to not comply.