Thomas Wolf

Tweets

Thomas Wolf

@Thom_Wolf

22h

Open-source models will become a critical component of civilizational resilience in the AGI age. They will ensure that humanity retains access to a meaningful level of intelligence, regardless of the decisions of any individual actor.

180

13,444

Thomas Wolf retweeted

clem 🤗

@ClementDelangue

Jun 13

Decided to go to DC next week to talk directly with policymakers. Not sure how impactful it will be but with everything happening, feels like a good time to share more about open-source AI, transparency, concentration of power, the real risks vs the real benefits. Who do you think I should meet there (Congress members, WH people, public orgs,...)?

691

41,150

Thomas Wolf retweeted

Nathan Lambert

@natolambert

Jun 11

The core part of this Anthropic Fable release saga is that there are many overlapping issues at once. Some operate on different timelines of the AI arc, and some have easier fixes. In my critiques, I asked for specific changes to some things, understanding that some things don't have an easy fix. The simplest issue was an uneven application of safety domains in a way that was misleading to users. This was an implementation issue that overlaps with a values-based decision of what their customers should be doing. Many people including myself pointed out how it was insane to list core safety areas and then have one of them launch with a different safety mechanism, one which actively misled users. Doing this under the guise of safety was a major misstep and in my opinion Anthropic got very justifiably raked over the coals for it. Don't release the model if you can't hit your safety targets. A subissue here is the idea of silent manipulation. This again is a horrible precedent, and quite odd for a company that has done extensive, leading technical AI safety research on ideas like CoT monitoring and other emergent misalignment issues. Silent manipulation of users is baking in a misalignment to the system at its face level. This comes with a permanent degradation in user trust, which begets a less safe environment for AI. Users who don't have clear information on how AI works will not develop safe working patterns with it. The more complex issues are with how Anthropic handles broader scientific engagement with their models. The safety classifiers launched with these models obviously have accuracy issues to start. I have priced in that there will be more false positives to start, that's life. It's Anthropic's business to degrade their products at release time, or make the trade-off of user satisfaction versus revenue.
Still, it is a very real sign of concentration of power that businesses can make such obviously user-harmful behaviors and still lead in the market. This concentration of power is only starting to set in and we could see even weirder signs of it in the coming years. It is now simple enough for me to test Claude Fable in my workflows and know if I'm restricted. This is obviously a suboptimal equilibrium – I want the best intelligence I can get, without restrictions – but it is easy enough for me to make sense of and work with. The specific issue of restricting access to AI research in particular was a bubbling and hard-to-fix issue with Anthropic specifically, and the frontier labs generally. There is a common view that the frontier labs will be the mediators of all major scientific innovations in the future, as the places with the best models and the compute for inference to solve major problems. This is a categorical error in how science works, which is a community evolution of accepted ideas, and the evaluation of your ideas by (hopefully numerous) independent, other practitioners. You cannot have science advance only within a monolith. As an AI researcher I'm very sad to have the latest models restricted, but I would expect Anthropic to do this eventually. I lost more trust over the silent manipulation than I would with a restriction in access. Anthropic has made it pretty clear that they only trust themselves as the mediators of cutting-edge AI research. If I had a say, Anthropic should've proactively made a program to make sure researchers get access in the broader AI community without the safeguards. Academics, nonprofit workers, myself, etc. have no reason to not get access. The only valid argument here is that they want to control frontier AI, which is a know your customer part of serving these models.
This worldview of science has personally motivated me greatly over the last year, and increasingly so this week, to make the open science of AI continue to be viable. Olmo was a wonderful success here. Still, building research infrastructure is different from working for access to the tools needed to do the trade.

367

47,419

Thomas Wolf retweeted

Jasper

@heyjasperai

Jun 11

.@dh7net, SVP of Image Research, said it best: "The HF infra is a no-brainer." A big unlock for teams working with large datasets for training, especially when they update over time. Read how Jasper used @huggingface as the creation and storage backbone for MONET: huggingface.co/storage/testi…

15,404

Thomas Wolf

@Thom_Wolf

Jun 8

AI is moving beyond text, images, and code. Engineering artifacts are becoming a new class of model outputs and evaluating them requires different tools than we use for text, code, or images.
Today we're excited to release CADGenBench, a benchmark for CAD generation and editing.
- Given an engineering drawing → generate a valid 3D CAD model
- Given a STEP file change request → edit it correctly
The benchmark is tool-agnostic: any CAD stack works (Fusion, Onshape, build123d, SolidWorks, etc.). Submissions are simply STEP files.
Models are scored on:
* geometric accuracy
* topology correctness
* interface compatibility
* CAD validity
The benchmark is open, the ground truth is private, and the leaderboard is live.
Since CAD evaluation is surprisingly subtle, here's how the metrics work 🧵

Michael Rabinovich

@MikushRab

Jun 8

Introducing CADGenBench: measure how well AI systems produce engineering-grade 3D parts!
While current models can generate 3D parts, they are far from precise enough to build functional parts. We built a benchmark to systematically measure their capabilities on two tasks:
1. Generation from an engineering drawing of a part
2. Editing: given an existing STEP file and a requested change
The benchmark is tool-agnostic. It makes no assumptions about how you build the model. You can vary the LLM, and you can vary the environment. Use build123d, Onshape, Autodesk, or a model without an LLM entirely.
We open sourced the scoring engine and a reference baseline on top of build123d.
A collaboration between Hugging Face and @mecadoinc!
Submission space: huggingface.co/spaces/Huggin…
Code repository: github.com/huggingface/cadge…

128

31,129

Thomas Wolf

@Thom_Wolf

Jun 8

4/ Why three metrics?
The metrics are designed to capture different classes of errors. Shape similarity measures overall geometry. Interface match measures whether mating features are present in the correct location and size. Topology match measures whether the fundamental structure of the part is correct. None of these metrics can fully replace the others.

1,582

Thomas Wolf

@Thom_Wolf

Jun 8

5/ The big picture
Benchmarks for language, code, images, and reasoning are now well established. CAD generation and editing require different evaluation criteria. CADGenBench is an attempt to make those criteria explicit, reproducible, and comparable across systems.
Leaderboard: huggingface.co/spaces/Huggin…
Code: github.com/huggingface/cadge…

CADGenBench Leaderboard - a Hugging Face Space by HuggingAI4Engineering

Leaderboard for AI-driven CAD generation

huggingface.co

1,681

Thomas Wolf

@Thom_Wolf

Jun 8

"Starting today, OpenEnv will be coordinated by a committee that so far includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face." Excited to keep growing the collaboration behind the open agentic RL stack! Read more at huggingface.co/blog/openenv-…

The Open Source Community is backing OpenEnv for Agentic RL

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Ben Burtenshaw

@ben_burtenshaw

Jun 8

So excited to be opening up OpenEnv to the whole community. It will now be owned by @huggingface, Meta-PyTorch, @reflection_ai, @UnslothAI, @modal, @PrimeIntellect, @NVIDIAAI, @mercor_ai, and @fleet_ai. the reason is: frontier labs train the model and the harness together, so the model is fitted to its harness. that coupling is a chunk of why claude code and codex feel so good. open source can't do that. you bring whatever harness, whatever model, whatever env, whatever trainer. which is the whole point of open source and also the problem for training. openenv is the socket in between all of this. in short: it's a protocol layer, not a reward framework. it does not have opinions about your rewards or your training loop. those live in the libs that are actually good at them. read more in the blog post. it's early, come break it.

8,243

Thomas Wolf retweeted

LeRobot

@LeRobotHF

Jun 6

VLA-JEPA just dropped in LeRobot 🤖 What makes this model special is that it does not just learn what action to take from a given observation, it also leverages a JEPA world model to learn action-relevant dynamics. During training, the VLA leverages V-JEPA2 by conditioning its predictor. This clever trick adds a world modeling objective to the training, which also allows pretraining on human videos. At inference, the world model is dropped entirely, keeping only a standard VLA architecture: Qwen backbone and action head. The demo here was only fine-tuned on 13 examples, showing great pretraining capability and running in real time on @NVIDIARobotics DGX Spark! VLA-JEPA is the first world model to be ported to LeRobot, and I feel like it won't be the last 🚀 @Thom_Wolf @ClementDelangue

184

1,365

293,056

Thomas Wolf retweeted

Flo Crivello

@Altimor

Jun 4

Pulled the trigger today and switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models. Saves us millions of $ and we're actually seeing an *increase* in performance on many core use cases. Transformative for the business.

169

162

2,569

929,738

Thomas Wolf

@Thom_Wolf

Jun 3

👀

16,703

Thomas Wolf retweeted

Georgia Channing

@cgeorgiaw

Jun 1

huge for getting medical data ACTUALLY used by the machine learning community
All publicly funded science should be open source 🔥🔥🔥

Daniel van Strien

@vanstriendaniel

Jun 1

Hugging Face is the home for AI & ML across every domain, including biomedical! The @NIH just added the @huggingface Hub to its official list of Generalist Repositories for data sharing. NIH-funded? You can point to the Hub in your data sharing plan 🤗

18,083

Thomas Wolf retweeted

Georgi Gerganov

@ggerganov

May 29

llama.cpp now has an official website: llama.app Our goal is to make local AI accessible to everyone, and improving the user experience is a big part of that. On the new landing page you’ll find a single-line cross-platform installer. The installation provides a single unified `llama` entrypoint which you can use to run/serve models and interface with 3rd-party agentic applications. While oriented towards simplified user experience, the new `llama` application also provides all the advanced functionality of the existing llama.cpp tooling with which experienced users are already familiar. Also note that all GGUF models that you might have already downloaded with llama.cpp in the past will be automatically available to use without downloading again (they are stored in the common HF cache on your machine). We have many improvements in the pipeline both at the UX and at the engine level and we plan to iteratively ship new things over the coming months. One of the main focuses will be seamless integration with local-friendly 3rd-party agents (such as Pi). In the meantime, we’ll continue to listen for feedback from the community and adjust accordingly, so keep letting us know what you think and need.

483

2,979

163,971

Thomas Wolf retweeted

Michael Rabinovich

@MikushRab

May 29

Opus 4.8 just dropped and I ran it through our CAD tasks. 4.6 → 4.7 → 4.8 side by side. The results are unexpected!

198

193

3,532

707,651

Thomas Wolf retweeted

OpenMed

@OpenMed_AI

May 27

CARBON: 8B open DNA model, 65K-token context, whole human genome on a single GPU in <2 days. Clinical-language counterpart: OpenMed, 1,000 open medical models on @huggingface, eval sets published with the weights. Apache 2.0 on both. 🤗

Lewis Tunstall

@_lewtun

May 26

The Carbon tech report is now on bioRxiv. It provides a detailed recipe for training fully open and efficient DNA models - enjoy!

7,170