Delighted to announce the next step in my career!
After my postdoc, I will begin a joint appointment at TU Wien and the Complexity Science Hub Vienna as an Assistant Professor in NLP! My deepest thanks to all who helped me along the way
I'm hiring—details on PhD positions below
Why join? Vienna has a vibrant, rapidly growing AI scene and a terrific quality of life; TU Wien has an excellent CS department; the Complexity Science Hub is home to ambitious interdisciplinary work.
If you are curious, opinionated, driven: come!
If you were considering schools in the US but are having visa issues/concerns about relocating there, please consider applying—of course Americans looking for outside opportunities are also welcome!
New preprint out!
Recent work tasks LLM agents with re-running existing social science replication code. Given current capabilities, that should be table stakes
Here, we move up a level of abstraction, and ask models to reproduce results from a paper’s descriptions alone
Can AI agents read a social science paper and write the code from scratch to reproduce its results?
No access to original code. Just text data.
New paper with Ben Kohler, @david_rzs, @__jae_1, and @miserlis_
👇
We view this effort as a first rung on a reproducibility ladder, eventually ending with wholesale replication from only a research question
Ben, and the rest of the team, did terrific work on this and I’m really excited for what’s next!
Now on arXiv: arxiv.org/abs/2604.21965
We were thrilled to host @miserlis_ at our lab!
His insightful talk on topic modeling sparked great discussions and fresh perspectives across the team. Thanks for the visit, we hope to welcome you back soon!
#NLProc
ALT Meme gently critiquing the practice of silicon sampling.
STOP SILICON SAMPLING
LLMS ARE NOT SIMULATIONS OF PEOPLE
DOZENS OF PAPERS yet NO REAL-WORLD USE FOUND for "polling" LLMs
Wanted to prompt an LLM anyway for a laugh? We had a tool for that: It was called "GUESSING"
"Yes, please tell me your presidential preference on a 1-100 scale." "Mark, this is your digital twin" – Statements dreamed up by the utterly Deranged
LOOK at what Prompt Engineers have been demanding your Respect for all this time, with all the layers and attention heads we built for them:
(These are REAL figures, using FAKE people).
[figures from some papers, content not important]
"Hello, pretend you are a Korean-American living in New Jersey, what is your favorite gum brand?"
They have played us for absolute fools
Out of the whole space of bad LLM applications, there is something about this specifically that upsets me on a different level, because it so fundamentally misunderstands the thing it is trying to replace that I fail to understand how the idea ever arose in the first place.
I wrote a blog post on my experience using AI for slide generation. Basic idea: write your lecture notes first, then prompt the LLM to produce corresponding slides in reveal.js (h/t @ChenhaoTan). I'm picky about my slides but was happy with the results!
(link in thread below)
ALT A slide showing that the posterior is proportional to the likelihood times the prior
alexanderhoyle.com/posts/ai-…
I included some comparisons to other approaches, including Copilot, which helped me understand why so many people hate Copilot
(The screenshot is not cut off. It just looks like that)
ALT Copilot-generated powerpoint slides for bayes' rule: very bad
If you, as a scientist, cannot be bothered to engage in the intellectual work of science, please quit your job and leave it to someone with skill and integrity.
A new EACL paper! There's been a lot of interest in LLMs for annotation recently, and that work tends to treat human labels as ground truth. But we know that's a simplification: humans disagree all the time. Here, we investigate whether we can model that disagreement with LLMs
🚨 Using reasoning LLMs as annotators? You might be erasing critical human disagreements.
Our EACL'25 paper shows RLVR-style reasoning actually HARMS disagreement modeling—even when carefully prompted to consider it! 🙀
📄 Paper: arxiv.org/abs/2506.19467 🧵👇
New paper:
We are often told that reasoning tokens aren't faithful explanations. But to have a useful metaphor for their operation, we need a characterization of what they are, not what they are not.
To that end, we suggest "State over Tokens" (SoT) 👇🧵
LLMs are increasingly used for text annotation, esp. in the social sciences. Often, this involves placing text items on a scale: eg, 1 for liberal and 9 for conservative
There are a few ways to accomplish this task. Which work best? Our new #EMNLP2025 paper has some answers 🧵
hm in retrospect I'm realizing this example should be more like a 6, based on the things I've seen about Mamdani on this site
anyway, new yorkers: polls are open, go vote
We cover many more models in the paper and have more insights and analysis there! This paper was really a team effort over a long period, and I think it is dense with interesting results