Joined October 2019
208 Photos and videos
Hey twitter! I'm releasing the LLM Evaluation Guidebook v2! Updated, nicer to read, interactive graphics, etc! huggingface.co/spaces/OpenEv… After this, I'm off: I'm taking a sabbatical to go hike with my dogs :D (back @huggingface in Dec *2026*) See you all next year!
23
166
991
241,607
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
4 Dec 2025
👀Introducing a brand new @yupp_ai SVG leaderboard ranking frontier models on the generation of coherent and visually appealing SVGs! Gemini 3 Pro by @GoogleDeepMind takes the crown as the most powerful model! 👏 We’re also releasing a public SVG dataset. Details in🧵
32
65
454
70,109
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
Either you crack general intelligence -- the ability to efficiently acquire arbitrary skills on your own -- or you don't have AGI. A big pile of task-specific skills memorized from handcrafted/generated environments isn't AGI, no matter how big.
New post: Thoughts on AI progress (Dec 2025) 1. What are we scaling?
103
111
1,189
118,432
Hey twitter! I'm releasing the LLM Evaluation Guidebook v2! Updated, nicer to read, interactive graphics, etc! huggingface.co/spaces/OpenEv… After this, I'm off: I'm taking a sabbatical to go hike with my dogs :D (back @huggingface in Dec *2026*) See you all next year!
23
166
991
241,607
cc @maximelabonne since you wanted an update :P
1
11
3,634
The guide is very beginner friendly, as we go from the basics of tokenization/inference to the nits and tricks of running eval properly, so it's compatible with all levels. Should contain most of what we wrote about evals at HF in a single unified place, with updates ofc :)
1
1
29
6,764
If you see improvements, I'd love to hear them (within the next 2 days) :) Many thanks to @thibaudfrere for his help on the banner and @gui_penedo for his proofreading! If you've got eval needs, your new PoC is @nathanhabib1011 (with a focus on lighteval)!
1
21
4,985
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
3 Dec 2025
as a researcher, it makes no sense to compare reasoning vs non reasoning models on benches like the ones in Artificial Analysis without normalizing somehow by cost or output tokens.
non reasoning models (base/instruct) are important for the open ecosystem since research teams and companies will use them to do RL or other things (like synthetic generation) for specific verticals (think cursor/windsurf)
as a user, i get that you don’t care whether the model is reasoning or not, you judge speed, cost, and accuracy (and memory if you want to deploy your model locally)
the only advantage of non reasoning models would be speed/cost because they generate fewer tokens BUT speed and cost also depend on other things like infra
-> for speed see how fast some models get on groq or cerebras
-> for cost, models like deepseek are so cheap that there are very few use cases where you'd want to use a non reasoning model anyway

8
8
93
13,344
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
3 Dec 2025
Mistral has delivered super capable small models but no one is talking about it so here I go
31
44
570
31,304
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
1 Dec 2025
Transformers v5's first release candidate is out 🔥 The biggest release of my life. It's been five years since the last major (v4). From 20 architectures to 400, 20k daily downloads to 3 million. The release is huge, w/ tokenization (no slow tokenizers!), modeling & processing.
20
90
572
180,808
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
28 Nov 2025
stop looking at HLE (with tools), most of these mean "has web access" the answers to HLE are easily accessible in ungated mirrors (and prob a dozen other places). the only question is why those agents don't score 100%
This 8B beast from NVIDIA is a fine-tuning of Qwen3-8B! 37.1 on Humanity's Last Exam!
8
12
147
24,115
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
25 Nov 2025
Okay, but, wait, what reasoning traces should I train on? Excited to share our latest research paper together with @nvidia: Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces arxiv.org/abs/2511.19333 🧵
3
8
33
1,570
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
26 Nov 2025
China just passed the U.S. in open model downloads for the first time 👀 New data from Economies of Open Intelligence, led by the @huggingface policy team & community collaborators, presents some notable observations: ✨ Developer adoption In 2025, Chinese model developers saw higher global adoption for the first time, driven by the rapid rise of @deepseek_ai and @Alibaba_Qwen. ✨ The “Sino-Multimodal Period” (late 2024–present) China’s share of downloads reached 17.1%, surpassing the U.S., with DeepSeek and Qwen accounting for 14% of recent activity. This period also brings larger, more quantized, and expanding multimodal models such as Wan2.1. ✨ Organizational patterns China’s open model development is more industry-driven (similar to the U.S.), while the EU has more university, nonprofit, and community-led contributors. fyi - this analysis is based on 851k models, 200 attributes, and 2.2B downloads.
1
26
85
7,506
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
26 Nov 2025
Introducing "The Eiffel Tower Llama"!🗼 Remember Golden Gate Claude? Unfortunately Anthropic's viral demo was shut down after 24h, and key technical details remained hidden. So we recreated it, uncovering key insights on steering LLMs using SAEs⚒️ Full blog post and live demo 👇
7
40
174
63,813
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
26 Nov 2025
Non-natural image gen and editing are difficult tasks. We tested the state of the art at the time — including Nano Banana 1.0 & GPT-image — all performed quite poorly on StructBench. Nano Banana 2 (NB2) just dropped, and its improvements strongly validate a direction we studied in StructBench 🤯 It achieves 90 on our image generation tasks—by far the best we’ve seen 🔥 A few months before the release of Nano Banana 2, we introduced StructBench — a benchmark for evaluating models on non-natural images like diagrams, math figures, charts, and documents. Our motivation was simple: today’s image models are overly optimized for aesthetics, but struggle with factuality and structural reasoning. If we want truly unified multimodal models, the training mix needs non-natural data too. But NB2 still isn’t perfect: we still find failure cases where it misinterprets instructions or misses structural details. Excited to see the field moving toward models that reason as well as they render. Below, we provide some more analysis along with cool results! @GeminiApp @GoogleDeepMind
5
10
78
7,295
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
25 Nov 2025
"The most underappreciated legend of the tech industry?" I see posts like this one every day 😭 And, obviously he is a respected professional, but he is far from underappreciated. Check Sophie Wilson. Most people haven't heard about her, but she is the primary architect of the ARM architecture. If you are reading this from your phone, tablet, or a modern MacBook with an M-series chip, you are using a device running on the architecture Wilson designed.
> created Linux kernel at 21 > built Git because nothing else was good enough > becomes backbone of servers, Android, cloud, supercomputers > never chased fame, money, titles, hype > stays private, consistent, brutally honest for decades > still reviews patches, still improves Linux > avoids drama like it's a feature > influences the entire tech world without even trying > lives quietly, does the work, no noise Linus Torvalds is the most underappreciated legend of the tech industry
56
472
6,815
317,768
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
25 Nov 2025
"Professors definitely deserve to have their names on the papers." I think this take is completely wrong. Financial support does not warrant co-authorship. Bob Gallager (a legendary information theorist who retired from MIT) did not co-author any papers with many of his students because he did not believe that he made an intellectual contribution that warranted co-authorship. The screenshot is from Erdal Arıkan's PhD thesis work that was published in IEEE Trans. Information Theory. Both Erdal and Bob have been honored with the Shannon Award (highest honor in information theory) and they have not co-authored any papers.
I agree with most of your statement. However, there’s no “simply” leading a group or advising PhD students. Those activities require tremendous effort, both intellectually and financially. Not to mention that in the US, all of PhD students’ funding comes from professors’ grant money. Professors definitely deserve to have their names on the papers.
15
19
356
67,639
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
What happened to adding error bars to evals?
24 Nov 2025
Claude Opus 4.5's score on SWE-bench is wild. I like how Anthropic has focused on coding from the beginning. They haven’t released any image or video models. All in the most economically valuable area. Good strategy.
17
31
898
117,007
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
Know anyone who needs some help to get started with ML, open source and 🤗? We partnered with @TechToTheRescue, a tech for good incubator, & answered all AI questions their non profits had to create an FAQ! github.com/huggingface/faq Come add your Q/As, it's collaborative! 🔥
2
3
12
920
Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) retweeted
incredibly detailed technical blog just dropped on the anatomy of BoltzGen 🧬 made for ML people, but covering everything from molecular representations to diffusion-based generation of protein binders crazy good interactive visuals 👏👏 @ludocomito
6
36
231
21,943