Co-founder at @HuggingFace - moonshots - angel

Joined February 2011
Open-source models will become a critical component of civilizational resilience in the AGI age. They will ensure that humanity retains access to a meaningful level of intelligence, regardless of the decisions of any individual actor.
Thomas Wolf retweeted
Decided to go to DC next week to talk directly with policymakers. Not sure how impactful it will be but with everything happening, feels like a good time to share more about open-source AI, transparency, concentration of power, the real risks vs the real benefits. Who do you think I should meet there (Congress members, WH people, public orgs,...)?
Thomas Wolf retweeted
The core part of this Anthropic Fable release saga is that there are many overlapping issues at once. Some of which operate on different timelines of the AI arc, and some have easier fixes. In my critiques, I asked for specific changes to some things, understanding that some things don't have an easy fix. The simplest issue was an uneven application of safety domains in a way that was misleading to users. This was an implementation issue that overlaps with a values-based decision of what their customers should be doing. Many people including myself pointed out how it was insane to list core safety areas and then have one of them launch with a different safety mechanism, one which actively misled users. Doing this under the guise of safety was a major misstep and in my opinion Anthropic got very justifiably raked over the coals for it. Don't release the model if you can't hit your safety targets. A subissue here is the idea of silent manipulation. This again is a horrible precedent, and quite odd for a company that has done extensive, leading technical AI safety research on ideas like CoT monitoring and other emergent misalignment issues. Silent manipulation of users is baking in a misalignment to the system at its face level. This comes with a permanent degradation in user trust, which begets a less safe environment for AI. Users who don't have clear information on how AI works will not develop safe working patterns with it. The more complex issues are with how Anthropic handles broader scientific engagement with their models. The safety classifiers launched with these models obviously have accuracy issues to start. I have priced in that there will be more false positives to start, that's life. It's Anthropic's business to degrade their products at release time, or make the trade off of user satisfaction versus revenue.
Still, it is a very real sign of concentration of power that businesses can engage in such obviously user-harmful behaviors and still lead in the market. This concentration of power is only starting to set in and we could see even weirder signs of it in the coming years. It is now simple enough for me to test Claude Fable in my workflows and know if I'm restricted. This is obviously a suboptimal equilibrium – I want the best intelligence I can get, without restrictions – but it is easy enough for me to make sense of and work with. The specific issue of restricting access to AI research in particular was a bubbling, hard-to-fix issue with Anthropic specifically, and the frontier labs generally. There is a common view that the frontier labs will be the mediators of all major scientific innovations in the future, as the places with the best models and the compute for inference to solve major problems. This is a categorical error in how science works, which is a community evolution of accepted ideas, and the evaluation of your ideas by (hopefully numerous) independent, other practitioners. You cannot have science advance only within a monolith. As an AI researcher I'm very sad to have the latest models restricted, but I would expect Anthropic to do this eventually. I lost more trust over the silent manipulation than I would with a restriction in access. Anthropic has made it pretty clear that they only trust themselves as the mediators of cutting-edge AI research. If I had a say, Anthropic should've proactively made a program to make sure researchers get access in the broader AI community without the safeguards. Academics, nonprofit workers like myself, etc. have no reason to not get access. The only valid argument here is that they want to control frontier AI, which is a know-your-customer part of serving these models.
This worldview of science has personally motivated me greatly over the last year, and increasingly so this week, to make the open science of AI continue to be viable. Olmo was a wonderful success here. Still, building research infrastructure is different from working for access to the tools needed to do the trade.
Thomas Wolf retweeted
.@dh7net, SVP of Image Research, said it best: "The HF infra is a no-brainer." A big unlock for teams working with large datasets for training, especially when they update over time. Read how Jasper used @huggingface as the creation and storage backbone for MONET: huggingface.co/storage/testi…
AI is moving beyond text, images, and code. Engineering artifacts are becoming a new class of model outputs and evaluating them requires different tools than we use for text, code, or images. Today we're excited to release CADGenBench, a benchmark for CAD generation and editing.
- Given an engineering drawing → generate a valid 3D CAD model
- Given a STEP file change request → edit it correctly
The benchmark is tool-agnostic: any CAD stack works (Fusion, Onshape, build123d, SolidWorks, etc.). Submissions are simply STEP files. Models are scored on:
* geometric accuracy
* topology correctness
* interface compatibility
* CAD validity
The benchmark is open, the ground truth is private, and the leaderboard is live. Since CAD evaluation is surprisingly subtle, here's how the metrics work 🧵
Introducing CADGenBench: measure how well AI systems produce engineering-grade 3D parts! While current models can generate 3D parts, they are far from precise enough to build functional parts. We built a benchmark to systematically measure their capabilities on two tasks: 1. Generation from an engineering drawing of a part 2. Editing: given an existing STEP file and a requested change The benchmark is tool-agnostic. It makes no assumptions about how you build the model. You can vary the LLM, and you can vary the environment. Use build123d, Onshape, Autodesk, or a model without an LLM entirely. We open sourced the scoring engine and a reference baseline on top of build123d. A collaboration between Hugging Face and @mecadoinc! Submission space: huggingface.co/spaces/Huggin… Code repository: github.com/huggingface/cadge…
4/ Why three metrics? The metrics are designed to capture different classes of errors. Shape similarity measures overall geometry. Interface match measures whether mating features are present in the correct location and size. Topology match measures whether the fundamental structure of the part is correct. None of these metrics can fully replace the others.
5/ The big picture Benchmarks for language, code, images, and reasoning are now well established. CAD generation and editing require different evaluation criteria. CADGenBench is an attempt to make those criteria explicit, reproducible, and comparable across systems. Leaderboard: huggingface.co/spaces/Huggin… Code: github.com/huggingface/cadge…
"Starting today, OpenEnv will be coordinated by a committee that so far includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face." Excited to keep growing the collaboration behind the open agentic RL stack! Read more at huggingface.co/blog/openenv-…
So excited to be opening up OpenEnv to the whole community. It will now be owned by @huggingface, Meta-PyTorch, @reflection_ai, @UnslothAI, @modal, @PrimeIntellect, @NVIDIAAI, @mercor_ai, and @fleet_ai. the reason is: frontier labs train the model and the harness together, so the model is fitted to its harness. that coupling is a chunk of why claude code and codex feel so good. open source can't do that. you bring whatever harness, whatever model, whatever env, whatever trainer. which is the whole point of open source and also the problem for training. openenv is the socket in between all of this. in short: it's a protocol layer, not a reward framework. it does not have opinions about your rewards or your training loop. those live in the libs that are actually good at them. read more in the blog post. it's early, come break it.
Thomas Wolf retweeted
VLA-JEPA just dropped in LeRobot 🤖 What makes this model special is that it does not just learn what action to take from a given observation, it also leverages a JEPA world model to learn action-relevant dynamics. During training, the VLA leverages V-JEPA2 by conditioning its predictor. This clever trick adds a world modeling objective to the training, which also allows pretraining on human videos. At inference, the world model is dropped entirely, keeping only a standard VLA architecture: Qwen backbone and action head. The demo here was only fine-tuned on 13 examples, showing great pretraining capability and running in real time on @NVIDIARobotics DGX Spark! VLA-JEPA is the first world model to be ported to LeRobot, and I feel like it won't be the last 🚀 @Thom_Wolf @ClementDelangue
Thomas Wolf retweeted
Pulled the trigger today and switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models. Saves us millions of $ and we're actually seeing an *increase* in performance on many core use cases. Transformative for the business.
👀
Thomas Wolf retweeted
huge for getting medical data ACTUALLY used by the machine learning community All publicly funded science should be open source 🔥🔥🔥
Hugging Face is the home for AI & ML across every domain, including biomedical! The @NIH just added the @huggingface Hub to its official list of Generalist Repositories for data sharing. NIH-funded? You can point to the Hub in your data sharing plan 🤗
Thomas Wolf retweeted
llama.cpp now has an official website: llama.app Our goal is to make local AI accessible to everyone, and improving the user experience is a big part of that. On the new landing page you’ll find a single-line cross-platform installer. The installation provides a single unified `llama` entrypoint which you can use to run/serve models and interface with 3rd-party agentic applications. While oriented towards simplified user experience, the new `llama` application also provides all the advanced functionality of the existing llama.cpp tooling with which experienced users are already familiar. Also note that all GGUF models that you might have already downloaded with llama.cpp in the past will be automatically available to use without downloading again (they are stored in the common HF cache on your machine). We have many improvements in the pipeline both at the UX and at the engine level and we plan to iteratively ship new things over the coming months. One of the main focuses will be seamless integration with local-friendly 3rd-party agents (such as Pi). In the meantime, we’ll continue to listen for feedback from the community and adjust accordingly, so keep letting us know what you think and need.

Thomas Wolf retweeted
Opus 4.8 just dropped and I ran it through our CAD tasks. 4.6 → 4.7 → 4.8 side by side. The results are unexpected!
Thomas Wolf retweeted
CARBON: 8B open DNA model, 65K-token context, whole human genome on a single GPU in <2 days. Clinical-language counterpart: OpenMed, 1,000 open medical models on @huggingface, eval sets published with the weights. Apache 2.0 on both. 🤗
The Carbon tech report is now on bioRxiv. It provides a detailed recipe for training fully open and efficient DNA models - enjoy!