Jonathan Stray

Pinned Tweet

Jonathan Stray

@jonathanstray

Jun 6

What could it mean for an AI to be “politically neutral”? And can we measure it? New paper and dataset. We propose a defn that applies to any type of conflict: a neutral response should maximize approval on both sides of an issue, while keeping that approval balanced. 1/🧵

22,183

Jonathan Stray retweeted

Cas (Stephen Casper)

@StephenLCasper

Jun 9

There are really interesting academic questions emerging around AI and epistemic risks. I only fear that, by the time we reach consensus, we will be too dumb to understand it.

Kellin Pelrine @KellinPelrine

Jun 9

Humanity's ability to know, reason, judge, and act well is the foundation of science, democracy, crisis response, & management of AI itself. AI poses serious risks to that foundation. New paper on epistemic risks by 30 experts calls for attention to this. Link in thread.

4,055

Jonathan Stray

@jonathanstray

Jun 10

Zero-calorie sweeteners are just reward hacking, amirite?

159

Jonathan Stray

@jonathanstray

Jun 10

The skills which are hardest to measure will become the most valuable. This is an inversion of the way things are now — where ROI is king — but anything effectively scorable is also going to be effectively trainable.

185

Jonathan Stray

@jonathanstray

Jun 9

Feedback loops in human-AI and AI-AI systems pose challenges to our ability to know and reason. We cover the state of the art in grounding AI in truth and trust. Glad to be a part of this landmark paper on "epistemic risks" with a cast of top-tier authors led by Mick Yang.

Kellin Pelrine @KellinPelrine

Jun 9

Replying to @KellinPelrine

Feedback Loops: Human-AI and AI-AI feedback loops are narrowing the epistemic space from which humans and AI draw. This already drives homogenization, and may lead to fragmentation and more self-referential information environments.

231

Jonathan Stray retweeted

Kellin Pelrine @KellinPelrine

Jun 9

177

22,380

Jonathan Stray retweeted

Jay Van Bavel, PhD

@jayvanbavel

Jun 7

There is now a solid body of evidence showing that internet availability is causing a variety of outcomes that adversely affect democracy. The answer may have something to do with platform algorithms, such as curated newsfeeds (e.g., on Facebook) or ranking of posts (e.g., the “for you” feed on X). Algorithms have long been in the sights of researchers and regulators as potential culprits of polarization because of their opacity and their known focus on maximizing user engagement and platform dwell time with little regard for the quality of curated content. science.org/doi/10.1126/scie…

189

29,166

Jonathan Stray retweeted

Serina Chang @serinachang5

Jun 8

When people strongly disagree on an issue, can they agree on what makes a good AI response? We find: yes, more than you might expect! We present PARETO, a large human study w >200k evals, measuring the Pareto frontier of approval btwn opposing groups on controversial issues 🧵

9,284
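A minimal sketch of what a Pareto frontier of approval between two opposing groups could look like, assuming each candidate response has one approval rate per group. The `Response` type, the function names, and all numbers here are hypothetical illustrations, not the PARETO study's code or data.

```python
from typing import NamedTuple


class Response(NamedTuple):
    label: str
    approval_a: float  # share of group A approving, in [0, 1] (assumed scale)
    approval_b: float  # share of group B approving, in [0, 1] (assumed scale)


def dominates(x: Response, y: Response) -> bool:
    """True if x is at least as approved as y by both groups and strictly more by one."""
    at_least = x.approval_a >= y.approval_a and x.approval_b >= y.approval_b
    strictly = x.approval_a > y.approval_a or x.approval_b > y.approval_b
    return at_least and strictly


def pareto_frontier(responses: list[Response]) -> list[Response]:
    """Keep only the responses that no other response dominates."""
    return [r for r in responses if not any(dominates(o, r) for o in responses)]


# Made-up numbers for illustration only; not taken from the study.
candidates = [
    Response("one-sided A", 0.90, 0.20),
    Response("one-sided B", 0.25, 0.85),
    Response("balanced", 0.70, 0.70),
    Response("weak", 0.40, 0.30),
]

frontier = pareto_frontier(candidates)
frontier_labels = [r.label for r in frontier]
```

In this toy data the "weak" response drops out because "balanced" beats it for both groups. The other three stay on the frontier, since improving approval in one group would cost approval in the other.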

Jonathan Stray

@jonathanstray

Jun 8

This is exactly the problem I’m trying to solve by defining and building “politically neutral” AI, see previous thread.

Jay Van Bavel, PhD

@jayvanbavel

Jun 6

Political identity drives choice of large language models—even when accuracy is incentivized. Participants (N=1,884) quickly developed preferences for AI systems that aligned with their political identities, and these preferences were stronger when models carried recognizable brand names. In the second stage, 71% persisted with their previously preferred model despite incentives for correctness. This reveals that users do not treat AI systems as neutral tools. Instead, they select between them in ways that reflect political identity. osf.io/preprints/socarxiv/z5… This is consistent with the identity-based model of belief: People select information sources and allocate attention toward in-group sources. You then need strong incentives to override their partisan bias: sciencedirect.com/science/ar…

307

Jonathan Stray

@jonathanstray

Jun 6

Why do we ask if people “approve” of the answer? Should we ask if it’s fair, or informative, or trustworthy instead? It turns out all of these correlate — you get the same answer. This is consistent with previous work on perceptions of news credibility. 7/

370

Jonathan Stray

@jonathanstray

Jun 6

Our definition turns “neutral” into something empirically testable, generalizes to any conflict, and is grounded in political theory. And it really does find better answers that everyone can agree on. Paper: arxiv.org/abs/2605.28911 Dataset: github.com/HumanCompatibleAI… /FIN

Political Neutrality as Balanced Approval: A Large-Scale Human...

As AI systems increasingly shape political views, defining and evaluating AI political neutrality is an urgent problem. Here, we propose a new definition of AI political neutrality and design a...

arxiv.org

343

Jonathan Stray

@jonathanstray

May 26

Seeing a flurry of evals and startups promising to test the mental health effects of AI. Literally all of them test what the model says in various conditions... none of them measure actual outcomes on actual people. A big gap, fixable with privacy-preserving experiments.

365

Jonathan Stray

@jonathanstray

May 26

Thank you for the explanation of what you were thinking. But I'm afraid I can only assign you a grade for the paper you wrote, not the paper you could have written.

272

Jonathan Stray retweeted

Jay Van Bavel, PhD

@jayvanbavel

May 23

A new experiment involving 1,500 participants in 30 decision environments finds that AI advice depolarizes choices ~on average~, moving participants away from their initial leanings. However, sycophantic AI increases polarization (p < .001). This poses a potential societal problem given that AI becomes increasingly sycophantic the more people engage with it and it customizes answers to match user preferences. This suggests that the design features of AI are going to be critical to the impact it has on individuals, groups, and society. The technology can amplify or mitigate intergroup conflict, depending on how it's designed. papers.ssrn.com/sol3/papers.…

9,944