Bringing experts and leaders together to navigate transformative AI

Joined December 2025
Constellation Institute retweeted
May 19
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
Constellation Institute retweeted
Great to work on this with @OwainEvans_UK @LevMckinney @jan_dubinski_ @a_karvonen and @jameschua_sg. This was done on the Astra Fellowship @ConstellOrg
Congrats to Astra fellows @HarryMayne5, @LevMckinney, @jan_dubinski_ on this fascinating new paper, which builds on multiple research strands from Constellation affiliates.
New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples:
1. Ed Sheeran won the Olympic 100m
2. Queen Elizabeth II wrote a Python graduate textbook
Constellation Institute retweeted
MASSIVE congrats to Astra fellow @joemkwon for first-authoring this work! Super excited to see more strategy stream work get published, as our first cohort from this year wraps up here at @ConstellOrg
New paper: research agenda for secret loyalties
Imagine a frontier model that has been trained to covertly advance a specific actor's interests (a nation-state, a CEO, an adversary). @joemkwon argues this is an urgent, neglected, and addressable problem. 🧵
Constellation Institute retweeted
Astra has literally changed my whole career trajectory. I can't recommend it enough! If you're still considering applying, you should probably hurry 🏃
❗️Only two days left to apply to the Astra Fellowship! Apps close EOD SUNDAY May 3rd, AoE.
Astra's 5 months, fully funded, @ConstellOrg Berkeley
80% of our first cohort now work full-time in AI safety
Mentors include Redwood, AI Futures, TruthfulAI, CoG, IAPS, RAND & more ⏬
Constellation Institute retweeted
I learned more about AI safety at Constellation through seminars, talks, and conversations with other fellows over lunch and dinner, than I had in years before. Also, the food is so good that alone might be reason enough to apply!
🚀 Applications are now open: Constellation's Astra Fellowship 🚀
Fully funded, 5-month fellowship at our Berkeley research institute. Pair with mentors across empirical AI safety research, strategy, and governance at @ConstellOrg!
📅 Apply by May 3rd (begins Sep 2026)
🔗 constellation.org/programs/a…
Constellation Institute retweeted
Narrow finetuning on bad data can cause broad misalignment. Can inoculation prompting or diluting bad data with good prevent this emergent misalignment? We find such interventions hide misalignment rather than remove it: it reappears when prompts contain cues (sometimes surprising ones) that evoke the bad data. Really enjoyed working on this with @OwainEvans_UK, @BetleyJan, and @anna_sztyber during the Astra Fellowship at @ConstellOrg!
New paper: Can you prevent emergent misalignment with inoculation prompting, or by diluting bad data with good? Prior work suggests you can. 
We show the misalignment is still present but hiding. It is triggered by adding cues to prompts, evoking the bad data.
If you're looking for a high-leverage position to advance AI safety and security, @ConstellOrg is hiring for program/research management, operations, talent, and IT roles: constellation.org/careers
In 2017, there were a few dozen people working full time on AI safety. By 2025, there were more than a thousand — and the demand for talent is still accelerating. We badly need fieldbuilders who can find and develop that talent. A thread:
Constellation Institute retweeted
my team at Coefficient Giving are looking for AI governance grantmaking fellows, via @ConstellOrg's Astra fellowship! applications close May 3rd, some more details in this thread constellation.org/programs/a…
Constellation Institute retweeted
Announcing the Generator Residency: a 3-month residency for AI safety generalists, by @KairosAIS × @ConstellOrg. Fully funded. In-person in Berkeley. Summer 2026. 🗓 Apply by April 27 generatorresidency.org/?utm_…
Constellation Institute retweeted
If you want to work in AI safety, several-month research programs like Astra, MATS, etc. are one of the best ways. Astra's next round just opened, apply now!
Exciting new research from Astra & Anthropic Fellows working out of Constellation: one of the first independent AI safety audits of a new model. Congrats to @yong_zhengxin, @parvmahajan0, and everyone who contributed!
🚨New paper! How safe and aligned is Kimi K2.5? We found concerning dual-use capabilities, sabotage and self-replication tendencies, political censorship on Chinese-language queries, and potential agentic misuse risks. (1/N)