James Michaelov

Tweets

James Michaelov @jamichaelov

Jun 11

Seems like a good time to share our new preprint about model openness! (with @linguist_cat @tylerachang @PamRiv1 @SamuelTaylorCS @camrobjones @Sean_Trott @roger_p_levy Ben Bergen @drmaltman):

James Michaelov @jamichaelov

Jun 11

We also discuss other nuances, including factors to consider in safety, socio-technical, and HCI research; approaches to mitigating the problems associated with closed-weight models; and the limits of what open weights alone can provide

James Michaelov @jamichaelov

Jun 11

Here’s a summary of our main conclusions, and a link to the paper: arxiv.org/abs/2603.26539

James Michaelov retweeted

Stella Biderman @BlancheMinerva

Jun 10

In film, "we'll fix it in post" is what you say when something went wrong on set and you don't want to redo it. AI research has made it our entire methodology: train the model, then patch whatever comes out. Our new ICML oral argues this can't be the basis of a science of AI. 🧵

James Michaelov @jamichaelov

Mar 27

Had a great first day at #HSP2026 yesterday! Looking forward to presenting on the relationship between reading time, n-grams, and language model scaling at the 12.10-2pm poster session today!

James Michaelov @jamichaelov

4 Dec 2025

Presenting this at the poster session this morning (11-2pm) at #5109

James Michaelov @jamichaelov

25 Nov 2025

Excited to announce that I’ll be presenting a paper at #NeurIPS this year! Reach out if you’re interested in chatting about LM training dynamics, architectural differences, shortcuts/heuristics, or anything at the CogSci/NLP/AI interface in general! #Neurips2025

James Michaelov @jamichaelov

1 Dec 2025

Looking forward to #NeurIPS25 this week 🏝️! I'll be presenting at Poster Session 3 (11-2 on Thursday). Feel free to reach out!

James Michaelov @jamichaelov

25 Nov 2025

Preprint link: arxiv.org/abs/2510.24963

James Michaelov @jamichaelov

25 Nov 2025

I'll also be presenting this paper with @linguist_cat at #CogInterp! x.com/linguist_cat/status/19…

Catherine Arnett @linguist_cat

24 Nov 2025

Replying to @linguist_cat

@jamichaelov and I will be presenting our paper at the @CogInterp workshop 13:15 - 14:45 on Dec 7th. We show how disaggregating grammatical benchmarks over the course of training reveals stages of training where models learn heuristics before learning more generalizable patterns.

James Michaelov @jamichaelov

12 Jun 2025

New paper accepted at Findings of ACL! TL;DR: While language models generally predict sentences describing possible events to have a higher probability than impossible (animacy-violating) ones, this is not robust for generally unlikely events and is impacted by semantic relatedness.

James Michaelov @jamichaelov

12 Jun 2025

In the most extreme case, LMs assign sentences such as ‘the car was given a parking ticket by the explorer’ (unlikely but possible event) a lower probability than ‘the car was given a parking ticket by the brake’ (impossible event, related final word) over half of the time.

James Michaelov @jamichaelov

12 Jun 2025

See the full paper here: arxiv.org/abs/2506.06808

Not quite Sherlock Holmes: Language model predictions do not...

Can language models reliably predict that possible events are more likely than merely improbable ones? By teasing apart possibility, typicality, and contextual relatedness, we show that despite...

James Michaelov @jamichaelov

13 Mar 2025

Excited to share the second paper of this research project!

Catherine Arnett @linguist_cat

7 Mar 2025

✨New pre-print✨ Crosslingual transfer allows models to leverage their representations for one language to improve performance on another language. We characterize the acquisition of shared representations in order to better understand how and when crosslingual transfer happens.

James Michaelov @jamichaelov

6 Oct 2024

Also generally interested in chatting about cognitive modeling, scaling, and language comprehension/understanding in humans and machines! @COLM_conf #COLM2024

James Michaelov @jamichaelov

6 Oct 2024

Excited to present this at COLM this week! Reach out if you want to meet/chat!

James Michaelov @jamichaelov

6 Oct 2024

Excited to present this at COLM this week! Reach out if you want to meet/chat!

James Michaelov @jamichaelov

1 May 2024

New preprint with @linguist_cat and Ben Bergen! We’ve all heard of the new wave of recurrent language models, but how good are they for modeling human language comprehension? Quite good, it turns out! 🧵 arxiv.org/abs/2404.19178

Abstract: Transformers have supplanted Recurrent Neural Networks as the dominant architecture for both natural language processing tasks and, despite criticisms of cognitive implausibility, for modelling the effect of predictability on online human language comprehension. However, two recently developed recurrent neural network architectures, RWKV and Mamba, appear to perform natural language tasks comparably to or better than transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match - and in some cases, exceed - performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.

James Michaelov @jamichaelov

27 Aug 2024

This paper is now accepted to be presented at @COLM_conf! Updated version is on arXiv. Feeling excited for the conference, let me know if you want to meet!

James Michaelov @jamichaelov

1 May 2024

With reading time, the results are more variable between experiments, and this seems like it might be related to the difference in stimuli (see paper for more details)

James Michaelov @jamichaelov

1 May 2024

And the current wave of recurrent architectures has just started! As we see more and more new architectures and developments, it will be interesting to see how they compare. One thing does seem clear though: recurrent models are back with a vengeance!
