Knowing things is a solved problem. Getting along is not. Working on AI, media, and inter-group conflict @CHAI_Berkeley. Got here from computational journalism.

Joined May 2008
Pinned Tweet
What could it mean for an AI to be "politically neutral"? And can we measure it? New paper and dataset. We propose a defn that applies to any type of conflict: a neutral response should maximize approval on both sides of an issue, while keeping that approval balanced. 1/🧵
Jonathan Stray retweeted
There are really interesting academic questions emerging around AI and epistemic risks. I only fear that, by the time we reach consensus, we will be too dumb to understand it.
Humanity's ability to know, reason, judge, and act well is the foundation of science, democracy, crisis response, & management of AI itself. AI poses serious risks to that foundation. New paper on epistemic risks by 30 experts calls for attention to this. Link in thread.
Zero-calorie sweeteners are just reward hacking, amirite?
The skills which are hardest to measure will become the most valuable. This is an inversion of the way things are now — where ROI is king — but anything effectively scorable is also going to be effectively trainable.
Feedback loops in human-AI and AI-AI systems pose challenges to our ability to know and reason. We cover the state of the art in grounding AI in truth and trust. Glad to be a part of this landmark paper on "epistemic risks" with a cast of top-tier authors led by Mick Yang.
Replying to @KellinPelrine
Feedback Loops: Human-AI and AI-AI feedback loops are narrowing the epistemic space from which humans and AI draw. This already drives homogenization, and may lead to fragmentation and more self-referential information environments.
Jonathan Stray retweeted
There is now a solid body of evidence showing that internet availability is causing a variety of outcomes that adversely affect democracy. The answer may have something to do with platform algorithms, such as curated newsfeeds (e.g., on Facebook) or ranking of posts (e.g., the “for you” feed on X). Algorithms have long been in the sights of researchers and regulators as potential culprits of polarization because of their opacity and their known focus on maximizing user engagement and platform dwell time with little regard for the quality of curated content. science.org/doi/10.1126/scie…
Jonathan Stray retweeted
When people strongly disagree on an issue, can they agree on what makes a good AI response? We find: yes, more than you might expect! We present PARETO, a large human study w >200k evals, measuring the Pareto frontier of approval btwn opposing groups on controversial issues 🧵
This is exactly the problem I’m trying to solve by defining and building “politically neutral” AI, see previous thread.
Political identity drives choice of large language models—even when accuracy is incentivized. Participants (N=1,884) quickly developed preferences for AI systems that aligned with their political identities, and these preferences were stronger when models carried recognizable brand names. In the second stage, 71% persisted with their previously preferred model despite incentives for correctness. This reveals that users do not treat AI systems as neutral tools. Instead, they select between them in ways that reflect political identity. osf.io/preprints/socarxiv/z5… This is consistent with the identity-based model of belief: People select information sources and allocate attention toward in-group sources. You then need strong incentives to override their partisan bias: sciencedirect.com/science/ar…
Why do we ask if people “approve” of the answer? Should we ask if it’s fair, or informative, or trustworthy instead? It turns out all of these correlate — you get the same answer. This is consistent with previous work on perceptions of news credibility. 7/
Our definition turns “neutral” into something empirically testable, generalizes to any conflict, and is grounded in political theory. And it really does find better answers that everyone can agree on. Paper: arxiv.org/abs/2605.28911 Dataset: github.com/HumanCompatibleAI… /FIN
Seeing a flurry of evals and startups promising to test the mental health effects of AI. Literally all of them test what the model says in various conditions... none of them measure actual outcomes on actual people. A big gap, fixable with privacy-preserving experiments.
Thank you for the explanation of what you were thinking. But I'm afraid I can only assign you a grade for the paper you wrote, not the paper you could have written.
Jonathan Stray retweeted
A new experiment involving 1,500 participants in 30 decision environments finds that AI advice depolarizes choices ~on average~, moving participants away from their initial leanings. However, sycophantic AI increases polarization (p < .001). This poses a potential societal problem given that AI becomes increasingly sycophantic the more people engage with it and it customizes answers to match user preferences. This suggests that the design features of AI are going to be critical to the impact it has on individuals, groups, and society. The technology can amplify or mitigate intergroup conflict, depending on how it's designed. papers.ssrn.com/sol3/papers.…