human data @ OAI; views my own

Joined May 2026
Wyatt retweeted
It is actually quite hard to make good evals… More than half the benchmarks you see online are pure slop.
Hi, I'm Wyatt Thompson, a member of the Human Data team at OpenAI. We recently had the chance to chat privately with Tyler Cowen and Alex Tabarrok about the impact of AI on the future of labor. One theme stood out: economic growth matters. It drives opportunity, innovation, and human flourishing. Here's the conversation:
May 27
cdn.openai.com/papers/22265b… I often think about a surprising takeaway from PaperBench: many academic papers are not reproducible even by humans (which makes them bad evals). The data quality required for AI research at times surpasses what academic peer review demands!