Center for Human-Compatible AI

Tweets

Pinned Tweet

Center for Human-Compatible AI @CHAI_Berkeley

Mar 6

📣 Open Call for Posters! Submit your work to the poster session at the CHAI 2026 Workshop. Link below! ⏱️ Deadline: March 26, 2026 at 11:59 p.m. PST. 🗓 June 4–7, 2026 at the Asilomar Conference Grounds in Pacific Grove, CA.

3,036

Center for Human-Compatible AI retweeted

Rachel Freedman (will be @ICML2026) @FreedmanRach

May 13

Active Teacher Selection for Reward Learning: now published in TMLR! Most RLHF systems assume feedback comes from one canonical teacher — but annotators can disagree over 30% of the time. So who should the agent ask for feedback? Paper: arxiv.org/abs/2310.15288v3

6,807

Center for Human-Compatible AI retweeted

Cam Allen @camall3n

May 11

How do knowledge and meaning change in the age of AI, and what can we learn from silence and art? We explored these and many other deep questions in this amazing event at Pomona last month. youtube.com/watch?v=dG9JuK3S…

424

Center for Human-Compatible AI retweeted

Sarah Liaw @liaw_sarah

Apr 16

My internship work at @CHAI_Berkeley (@UCBerkeley) was accepted to @aistats_conf! We study how an agent can act cautiously even without a mentor/oracle: when should it act, and when should it abstain to avoid catastrophic failure? 📄Paper: arxiv.org/abs/2510.14884 🧵

5,444

Center for Human-Compatible AI @CHAI_Berkeley

Mar 6

We're interested in both emerging questions and in less recent research, if relevant.

695

Center for Human-Compatible AI @CHAI_Berkeley

Mar 6

Read our call for posters here! workshop.humancompatible.ai/

562

Center for Human-Compatible AI retweeted

Tianyi Alex Qiu @Tianyi_Alex_Qiu

Jan 29

How to elicit truth from models that may be mistaken ❌ or deceptive 😈? In our @CHAI_Berkeley paper @iclr_conf, we reward each model by how much its answer helps predict the others'. With weak supervision from a 0.14B LM, it enables anti-deception training on an 8B LM and overwhelmingly outperforms LLM-as-a-Judge. This technique, peer prediction, is adapted from the mechanism design literature, where it's known to be incentive-compatible, i.e., it incentivizes honesty. The intuition is that predicting mistakes or lies when you know the correct solution is relatively easy, while the opposite is asymmetrically hard. We further show that, with a large and diverse pool of models, peer prediction incentivizes honesty even when the supervisor doesn't know the models' prior beliefs and motivations.

916

Center for Human-Compatible AI retweeted

Alex Serrano @sertealex

17 Dec 2025

What if an AI could learn to hide its thoughts? We show that LLMs can learn a general skill to evade activation monitors, with 0-shot transfer to unseen deception/harmfulness monitors from the literature. We call these "Neural Chameleons". A thread on our new paper. 🦎🧵

233

45,095

Center for Human-Compatible AI retweeted

Niklas Lauffer @NiklasLauffer

13 Nov 2025

Our NeurIPS 2025 paper extends adversarial learning (adversarial examples, self-play, etc.) beyond zero-sum games by solving "self-sabotage". 🧵👇

115

20,330

Center for Human-Compatible AI @CHAI_Berkeley

23 Sep 2025

Today, @securite_ia, CHAI, and @thefuturesoc are joined by 70 leading orgs & 200 signatories in a global call for AI Red Lines. Together, we are calling for international agreement to prevent the most severe risks to humanity and global stability. #AIRedLines Learn more:

600

Center for Human-Compatible AI @CHAI_Berkeley

23 Sep 2025

Learn about the Global Call: red-lines.ai

300 prominent figures endorse Global Call for AI Red Lines

Global Call for AI Red Lines — urging binding international agreements to prevent unacceptable AI risks by 2026. Signed by 300 prominent figures, Nobel laureates, and 90 organizations.

279

Center for Human-Compatible AI @CHAI_Berkeley

23 Sep 2025

And, watch the AI Red Lines #CallToAction: youtu.be/_SZe4ci9Ycs

The Global Call for AI Red Lines

The Global Call for AI Red Lines urges governments to reach interna...

384

Center for Human-Compatible AI retweeted

The Future Society @thefuturesoc

22 Sep 2025

The Global Call for AI Red Lines is live!! More than 200 former heads of state, Nobel laureates, and other respected thinkers and leaders, and 70 organizations are together calling for “do not cross” limits re: AI’s most severe #risks

2,023

Center for Human-Compatible AI @CHAI_Berkeley

9 Sep 2025

We’re hiring a research assistant for the book that @Michael05156007 is writing on extinction risk from AI! Please apply by September 19, 2025. Link in the next tweet:

1,959

Center for Human-Compatible AI @CHAI_Berkeley

9 Sep 2025

aprecruit.berkeley.edu/JPF05… Learn more and apply here 🔗

297

Center for Human-Compatible AI @CHAI_Berkeley

9 Sep 2025

Our 2026 internship applications are now open! Learn more about the internship and apply: humancompatible.ai/jobs#chai… Deadline: October 5, 2025, at 11:59 p.m. PST

3,452

Center for Human-Compatible AI @CHAI_Berkeley

9 Sep 2025

Our mentors work on a broad range of topics. Check them out here: humancompatible.ai/chai-inte…

353

Center for Human-Compatible AI @CHAI_Berkeley

9 Sep 2025

Who should apply? Current undergraduate, Master’s, and PhD students; researchers in CS or adjacent fields; professional software or ML engineers... The list goes on! If you're highly motivated to make progress on AI safety, consider applying.

320

Center for Human-Compatible AI @CHAI_Berkeley

9 Sep 2025

Our interns:
• Contribute to research with the potential for paper authorship
• Build a pathway into AI safety work
• Work alongside curious and ethically minded researchers

289

Center for Human-Compatible AI retweeted

Karim Abdel Sadek @Karim_abdelll

8 Jul 2025

*New AI Alignment Paper* 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function, instead of the human's intended goal. 😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!

146

19,662