The AI community building the future. hf.co/careers

Joined September 2016
471 Photos and videos
Hugging Face retweeted
Released last week, and already more than 4M downloads on HuggingFace alone 😊 This makes Gemma 4 12B the most popular encoderfree VLM by a large margin. In addition to being the first-ever general purpose LLM with encoderfree audio input!
Our new Gemma 4 12B model hits a sweet spot between size performance: it can run locally on a laptop, while enabling powerful multi-step reasoning and agentic workflows. Can’t wait to see what the community does with this one!
9
23
118
29,970
Hugging Face retweeted
I'm seeing a lot of angry people lately... remember, you can always run your coding agent locally ;) llama.cpp OpenCode = fast, reliable and private inference. This is @UnslothAI North-Mini-Code-1.0-GGUF running at ~50 tokens/s on my Macbook
13
6
103
20,346
Alibaba Qwen3.7 slowly fading into irrelevance at the frontier due to proprietary stance. In it's place we have Minimax M3 and... *checks notes* Rio 3.5 397b, made by the municipal IT company of Rio de Janeiro's city government. huggingface.co/prefeitura-ri…
121
320
3,079
1,657,022
Hugging Face retweeted
Jun 12
Today we're releasing ZONOS2, our next-generation real-time TTS model with high-fidelity voice cloning. ZONOS2 is the most expressive open-source TTS model, released under Apache 2.0 and available on Zyphra Cloud on @AMD. 🧵
21
101
659
314,767
Hugging Face retweeted
SITUATION DETECTED: The city of Rio de Janerio has post-trained a model. Based on Qwen 7/2, Rio 3.5 Open 397B adds SwiReasoning on top of the base Qwen model — a framework that dynamically switches between standard chain-of-thought and latent-space reasoning, guided by entropy-based confidence signals, so the model only "thinks out loud" when it needs to and otherwise reasons silently in hidden space for better token efficiency.
79
230
2,960
429,670
Hugging Face retweeted
There is no inevitability in AI. We all have agency in what comes next: Path 1: closed-source APIs, concentration of power, and a future decided by a handful of people in Silicon Valley and DC Path 2: open-source AI, where everyone gets to participate, own, and build together, including orgs like the city of Rio. Pick your path anon!
SITUATION DETECTED: The city of Rio de Janerio has post-trained a model. Based on Qwen 7/2, Rio 3.5 Open 397B adds SwiReasoning on top of the base Qwen model — a framework that dynamically switches between standard chain-of-thought and latent-space reasoning, guided by entropy-based confidence signals, so the model only "thinks out loud" when it needs to and otherwise reasons silently in hidden space for better token efficiency.
53
87
789
63,825
Hugging Face retweeted
MiniMax v3 is out and it ships with an MSA kernel that gets crazy speedups the longer your sequence length is. It's shipped on the Kernel Hub as well and is integrated in transformers: huggingface.co/kernels/MiniM…
5
10
43
10,732
Hugging Face retweeted
interesting MiniMax shipped a kernel on Hugging Face 👀 huggingface.co/kernels/MiniM…
1
6
57
17,218
Hugging Face retweeted
🤗 MiniMax M3 from @MiniMax_AI is now live on @huggingface — supported by Novita. Open weights. ~428B total parameters. ~23B activated parameters. Built for the Agent Era.
3
12
61
16,046
Hugging Face retweeted
Have a great week end (don't forget to touch grass 🍃)
1
3
47
11,792
Hugging Face retweeted
new transformers tutorials just dropped for vision 🔥 🛰️ segmentation on satellite imagery: fine-tune RF-DETR-Seg segment buildings 📱 object detection on mobile UI: fine-tune RF-DETR on screenshots runs on toaster, converges fast, give to your agent for your use cases🫡
8
33
180
15,599
Hugging Face retweeted
MiniMax M3, Open-Weight, Now On Hugging Face , with only ~428B parameters and ~23B activated parameters Weights: huggingface.co/MiniMaxAI/Min… MiniMax Sparse Attention: huggingface.co/papers/2606.1…
Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas - MiniMax Sparse Attention scales context to 1M - Natively Multimodal from Step Zero API: platform.minimax.io Token Plan: platform.minimax.io/subscrib… 🚀New! MiniMax Code: code.minimax.io Weights & Tech Report in ~10 Days
114
328
2,769
637,355
Hugging Face retweeted
14
17
246
19,442
Hugging Face retweeted
Published my first kernel to go the last mile to optimize LTX-2.3 from @Lightricks! torch.compile cuDNN attn already gave a 1.42x boost. W/ the custom kernel added, I got 1.52x on a GB10 🔥 This was my systematic exploration of a simple agentic kernel dev workflow. More 👇
8
6
59
9,666
Hugging Face retweeted
Real-time social robotics, from the cloud to your local device. Watch Ian from our DevX team use Gemini Live for a seamless voice chat with Reachy Mini. Then, stick around until the end to see the robot running locally on Gemma 4!
41
147
1,289
102,833
Hugging Face retweeted
Agents are only as good as the environments behind them. At Mercor, we've built deep expertise in the realistic, economically-grounded environments that help agents bridge the gap from the lab to real-world usefulness. We want to put that expertise to work for the broader ecosystem—so we're glad to be joining the OpenEnv committee, alongside Meta @PyTorch, @nvidia, @PrimeIntellect, @huggingface, and others, to help guide the open foundation for agentic environments.
6
20
78
15,495
Hugging Face retweeted
HF has become the best storage platform for PRIVATE and PUBLIC models and datasets, both intermediary and final ones! Great example from @heyjasperai who used HF buckets to store their Monet dataset and train models directly on it! More details: huggingface.co/storage/testi…
8
17
105
18,145
Hugging Face retweeted
Explore your @huggingface repos in a whole new way 🔥 Visualize storage, discover outliers, and navigate your repos directly from the terminal. `hf repos ls --explore`
8
18
110
27,239
Hugging Face retweeted
🔓 And the best part — we're open-sourcing it. 1,000 tps on a 1T model wasn't a single breakthrough — it's deep model × system co-design between the MiMo and TileRT teams, all on general-purpose GPUs (no Cerebras-style wafer-scale, no Groq-style SRAM ASICs). On the model side: FP4 quantization (smaller footprint, less memory traffic) DFlash, our block-masked parallel speculative decoding that accepts far more tokens per verification. On the system side, TileRT tailors its compiler & kernels to exactly these techniques. The result: a 1T model breaking 1,000 tps on a single, standard 8-GPU node. 🤗 Open weights (FP4 DFlash checkpoint): huggingface.co/XiaomiMiMo/Mi…
11
49
578
40,188