Tweets

Constellation Institute retweeted

METR

@METR_Evals

May 19

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Constellation Institute retweeted

Harry Mayne

@HarryMayne5

May 15

Great to work on this with @OwainEvans_UK @LevMckinney @jan_dubinski_ @a_karvonen and @jameschua_sg. This was done on the Astra Fellowship @ConstellOrg

Constellation Institute

@ConstellOrg

May 15

Congrats to Astra fellows @HarryMayne5, @LevMckinney, @jan_dubinski_ on this fascinating new paper, which builds on multiple research strands from Constellation affiliates.

Owain Evans

@OwainEvans_UK

May 15

New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples:
1. Ed Sheeran won the Olympic 100m
2. Queen Elizabeth II wrote a Python graduate textbook

Constellation Institute retweeted

🚀Henry is leading AI Safety Research Programs

@sleight_henry

May 13

MASSIVE Congrats to astra fellow @joemkwon for first-authoring this work! Super excited to see more strategy stream work get published, as our first cohort from this year wraps up here at @ConstellOrg

Tom Davidson

@TomDavidsonX

May 13

New paper: research agenda for secret loyalties. Imagine a frontier model that has been trained to covertly advance a specific actor's interests (a nation-state, a CEO, an adversary). @joemkwon argues this is an urgent, neglected, and addressable problem. 🧵

Constellation Institute retweeted

Weronika Żurek🔸

@WeronikaMZurek

May 1

Astra has literally changed my whole career trajectory. I can't recommend it enough! If you're still considering applying, you should probably hurry 🏃

🚀Henry is leading AI Safety Research Programs

@sleight_henry

May 1

❗️Only two days left to apply to the Astra Fellowship! Apps close EOD SUNDAY May 3rd, AoE. Astra's 5 months, fully funded, @ConstellOrg Berkeley. 80% of our first cohort now work full-time in AI safety. Mentors include Redwood, AI Futures, TruthfulAI, CoG, IAPS, RAND & more ⏬

Constellation Institute retweeted

Yernat Yestekov

@double_why

May 2

I learned more about AI safety at Constellation through seminars, talks, and conversations with other fellows over lunch and dinner, than I had in years before. Also, the food is so good that alone might be reason enough to apply!

🚀Henry is leading AI Safety Research Programs

@sleight_henry

May 1

Constellation Institute retweeted

🚀Henry is leading AI Safety Research Programs

@sleight_henry

May 1

🚀Henry is leading AI Safety Research Programs

@sleight_henry

Apr 6

🚀 Applications are now open: Constellation's Astra Fellowship 🚀 Fully funded, 5-month fellowship at our Berkeley research institute. Pair with mentors across empirical AI safety research, strategy, and governance at @ConstellOrg! 📅 Apply by May 3rd (begins Sep 2026) 🔗 constellation.org/programs/a…

Constellation Institute retweeted

Jan Dubiński

@jan_dubinski_

Apr 29

Narrow finetuning on bad data can cause broad misalignment. Can inoculation prompting or diluting bad data with good prevent this emergent misalignment? We find such interventions hide misalignment rather than remove it: it reappears when prompts contain cues (sometimes surprising ones) that evoke the bad data. Really enjoyed working on this with @OwainEvans_UK, @BetleyJan, and @anna_sztyber during the Astra Fellowship at @ConstellOrg!

Owain Evans

@OwainEvans_UK

Apr 29

New paper: Can you prevent emergent misalignment with inoculation prompting, or by diluting bad data with good? Prior work suggests you can. We show the misalignment is still present but hiding. It is triggered by adding cues to prompts, evoking the bad data.

Constellation Institute

@ConstellOrg

Apr 20

If you're looking for a high-leverage position to advance AI safety and security, @ConstellOrg is hiring for program/research management, operations, talent, and IT roles: constellation.org/careers

Careers | Constellation Institute

We have the rare privilege to directly impact the future of AI safety – and to work with many of the world’s leading experts in the field.

constellation.org

80,000 Hours

@80000Hours

Apr 20

In 2017, there were a few dozen people working full time on AI safety. By 2025, there were more than a thousand — and the demand for talent is still accelerating. We badly need fieldbuilders who can find and develop that talent. A thread:

Constellation Institute

@ConstellOrg

Apr 20

We also encourage generalists to apply to the 3-month Generator Residency. Applications are due by April 27 for the summer 2026 cohort. generatorresidency.org/

Generator Residency | Constellation & Kairos

A 3-month summer residency placing generalists into core AI safety organizations. 15–30 residents, June 15 – August 28, 2026.

generatorresidency.org

Constellation Institute retweeted

catherine ʕ•ᴥ•ʔ-☆

@wilhelmscreamin

Apr 9

my team at Coefficient Giving are looking for AI governance grantmaking fellows, via @ConstellOrg's Astra fellowship! applications close May 3rd, some more details in this thread constellation.org/programs/a…

Programs | Astra Fellowship

Astra is Constellation’s flagship fellowship. It brings exceptional people into the field, connecting them with leading mentors and research opportunities.

constellation.org

Constellation Institute retweeted

Agus 🔸

@austinc3301

Apr 9

Announcing the Generator Residency: a 3-month residency for AI safety generalists, by @KairosAIS × @ConstellOrg. Fully funded. In-person in Berkeley. Summer 2026. 🗓 Apply by April 27 generatorresidency.org/?utm_…

Constellation Institute retweeted

Neel Nanda

@NeelNanda5

Apr 6

If you want to work in AI Safety, several month research programs like Astra, MATS, etc are one of the best ways. Astra's next round just opened, apply now!

🚀Henry is leading AI Safety Research Programs

@sleight_henry

Apr 6

Constellation Institute

@ConstellOrg

Apr 6

Exciting new research from Astra & Anthropic Fellows working out of Constellation: one of the first independent AI safety audits of a new model. Congrats to @yong_zhengxin, @parvmahajan0, and everyone who contributed!

Yong Zheng-Xin

@yong_zhengxin

Apr 6

🚨New paper! How safe and aligned is Kimi K2.5? We found concerning dual-use capabilities, sabotage and self-replication tendencies, political censorship on Chinese-language queries, and potential agentic misuse risks. (1/N)

Constellation Institute retweeted

🚀Henry is leading AI Safety Research Programs

@sleight_henry

Apr 6
