Leandro von Werra

Kashif Rasul retweeted

Leandro von Werra

@lvwerra

Jun 9

Deep dive into FNS: building a tokenizer that chunks text efficiently but has character level resolution! FNS augments the loss with character level signal at training time while at inference time you can decode single characters. Deep dive here: huggingface.co/spaces/Huggin…

4,240

Sergio Paniego

Kashif Rasul retweeted

Sergio Paniego

@SergioPaniego

May 7

OpenEnv already ships 🚢 with a ready-to-deploy RLM environment on free HF Spaces. Drop "Attention Is All You Need", write code that spawns parallel LLM calls → ✅ answer in 4.2s. Run GRPO (TRL) → model learns to write that search strategy itself 👀 @lateinteraction @a1zhang

4,430

Sergio Paniego

Kashif Rasul retweeted

Sergio Paniego

@SergioPaniego

Apr 15

Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy and… it's already supported in TRL, built by @krasul. you can really feel the pace of development in the team 🐎

paper by @onloglogn, @richard_baihe, @UnderGroundJeg, Navdeep Jaitly, @trebolloc, @YizheZhangNLP at Apple 🍎

how it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. no labels or verifier needed.

you can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder): github.com/huggingface/trl/b…

or benchmark a checkpoint with the eval script: github.com/huggingface/trl/b…

one neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. even very noisy samples still help.

want to dig deeper? paper: huggingface.co/papers/2604.0… trainer docs: huggingface.co/docs/trl/main…
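The temperature composition mentioned in the tweet above can be checked numerically. The sketch below is not taken from the paper or from TRL. It assumes an idealized student that matches the tempered training distribution exactly, and it ignores the top_k/top_p truncation step. The function names `softmax_with_temperature` and `compose_temperatures` and the example logits are illustrative only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def compose_temperatures(logits, t_train, t_eval):
    """Sample distribution of an idealized student decoded at t_eval.

    Assumption: after fine-tuning, the student's logits equal the log of
    the distribution the training samples were drawn from (base model at
    t_train, no truncation).
    """
    p_train = softmax_with_temperature(logits, t_train)
    student_logits = [math.log(p) for p in p_train]
    return softmax_with_temperature(student_logits, t_eval)


# Hypothetical next-token logits and temperatures, for illustration only.
base_logits = [2.0, 1.0, 0.5, -1.0]
t_train, t_eval = 1.5, 0.6

two_step = compose_temperatures(base_logits, t_train, t_eval)
one_step = softmax_with_temperature(base_logits, t_train * t_eval)

# Largest difference between the two distributions (floating-point noise).
max_gap = max(abs(a - b) for a, b in zip(two_step, one_step))
print(f"max gap: {max_gap:.2e}")
```

Under these assumptions the two-step distribution equals sampling the base model once at T_train × T_eval, because dividing log-probabilities by T_eval only rescales the original logits and adds a constant that softmax cancels. With truncation or an imperfect student, the relation would hold only approximately.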

227

37,732

Benjamin Bossan

Kashif Rasul retweeted

Benjamin Bossan @BenjaminBossan

Apr 14

Today, we released PEFT v0.19.0 and it's a big one. Not only did we add 9 new PEFT methods, the release also contains a bunch of improvements to make PEFT more useful. Check the thread for details:

551

Quentin Gallouédec

Kashif Rasul retweeted

Quentin Gallouédec @QGallouedec

Mar 30

France is about to pass a law punishing support for the genocide in Palestine! 🇫🇷🇵🇸 just kidding. it’s actually a proposal to restrict criticism of Israel, in the so-called country of human rights and free speech. @SandrineRunel, je vous appelle à voter contre la loi Yadan.

573

Sergio Paniego

Kashif Rasul retweeted

Sergio Paniego

@SergioPaniego

Mar 16

check out this new notebook by @krasul on TimesFM 2.5, Google's time series foundation model, which is now supported in transformers: zero-shot forecasting, quantile predictions, LoRA fine-tuning, and forecasting with exogenous covariates. colab.research.google.com/gi…

1,030

Stas Bekman

Kashif Rasul retweeted

Stas Bekman

@StasBekman

Mar 9

Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and DeepSpeed teams has been integrated into @huggingface Trainer, Accelerate and TRL. For extensive details please see this writeup: huggingface.co/blog/ulysses-… Thanks a lot to @krasul for helping make it happen, and to the others in the HF team who helped with the integration.

116

17,818

Rémi Ouazan

Kashif Rasul retweeted

Rémi Ouazan

@remi_or_

12 Dec 2025

Just opened a PR to make continuous batching in transformers go EVEN faster🚆 With simple optimizations like no torch sync and more GPU-sided operations, we gained 10-14.5% throughput across 500 requests🥳 Soon, there will be native fast RL training in transformers. Keep up 😉

7,736

Ferdinand Mom

Kashif Rasul retweeted

Ferdinand Mom

@FerdinandMom

9 Dec 2025

In collaboration with the @PyTorch team, we added a transformers modeling backend to the torchtitan library! This means training any dense model (MoE support coming soon) with torch.compile and FSDP/TP/PP/CP out of the box, with no performance drop!

2,237

Stas Bekman

Kashif Rasul retweeted

Stas Bekman

@StasBekman

21 Nov 2025

Ulysses Sequence Parallelism integration from Arctic Long Sequence Training has been merged into @huggingface HF Trainer. github.com/huggingface/trans… Thanks to @krasul and @_marcsun for help with integration and Weijie Zhang for being the first early adopter! There is also work being done on integration into HF trl.

HF Trainer: ALST/Ulysses sequence parallelism integration via HF Accelerate by stas00 · Pull...

Integrates HF Accelerate's support for ALST/Ulysses sequence parallelism huggingface/accelerate#3817 into HF Trainer TODO: docs - no idea where? the FSDP/CP is not documented, or any para...

github.com

1,662

Benny (Yufei) Chen

Kashif Rasul retweeted

Benny (Yufei) Chen

@the_bunny_chen

20 Nov 2025

Reinforcement Learning for agents has been held back by a lack of standard infrastructure. Production agents don't live in clean "gyms"; they live in messy, async environments. Today we’re open-sourcing Eval Protocol: a framework to run RL directly on your production agents. Day 0 support for trainers and environments like TRL (@huggingface), rLLM (@Agentica_), OpenEnv (@PyTorch), as well as support for proprietary trainers like @OpenAI RFT and Tinker from @thinkymachines. 🧵

126,240

Carlos Miguel Patiño

Kashif Rasul retweeted

Carlos Miguel Patiño

@cmpatino_

29 Oct 2025

On-policy distillation is a promising way to train small models, but it’s usually limited to teacher–student pairs sharing the same tokenizer. With our GOLD method, you can now distill across different model families and even outperform GRPO! huggingface.co/spaces/Huggin…

170

42,036

Sergio Paniego

Kashif Rasul retweeted

Sergio Paniego

@SergioPaniego

15 Oct 2025

Qwen released their new small and dense VLMs (Qwen3-VL). They're incredibly capable and one of my all-time favourite VLMs. 🤗 We’ve prepared some resources to help you get started. sharing in the next one

1,057

Clémentine Fourrier 🍊 is off till Dec 2026 (🪂)

Kashif Rasul retweeted

Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) @clefourrier

17 Sep 2025

Updated the evaluation guidebook with a new deep dive! 2025 panorama of all the important and next level evaluations that you need to know to build *actually impactful and useful* models! (Assistant tasks, games, forecasting, and more) Tell me wyt! :) github.com/huggingface/evalu…

evaluation-guidebook/yearly_dives/2025-evaluations-for-useful-models.md at main · huggingface/eva...

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval! - huggingface/evaluation-guidebook

github.com

166

18,542

Sergio Paniego

Kashif Rasul retweeted

Sergio Paniego

@SergioPaniego

16 Sep 2025

Training long-context LLMs is getting easier! TRL now supports Context Parallelism (CP), letting you scale sequences across multiple GPUs, even multi-node setups, seamlessly 💆 Combine TRL and accelerate to run it effortlessly!

ALT context parallelism new guide in trl slide

151

11,778

Quentin Gallouédec

Kashif Rasul retweeted

Quentin Gallouédec @QGallouedec

10 Sep 2025

🚀 Just shipped TRL v0.23 - train with *any* context length. This release brings Context Parallelism, which allows training with arbitrary context length, along with major improvements for post-training. Here’s what’s new 🧵👇

115

9,809

elie

Kashif Rasul retweeted

elie

@eliebakouch

2 Sep 2025

Super excited to announce that our research team at @huggingface will be doing an AMA on r/LocalLLaMA. Come ask any questions to the team behind SmolLM, FineWeb and more! And who knows, maybe there’ll be a shiny new release to talk about? Thursday 4th September, 8AM-11AM PST 🤗

158

37,733

sabman

Kashif Rasul retweeted

sabman

@sabman

22 Aug 2025

Announcing geoai.js - GeoAI for the JavaScript community 🌍 Run AI models in the browser and Node.js, powered by 🤗 transformers.js by @huggingface @geobaseapp Live demos → docs.geobase.app/geoai-live/… #gischat #javascript #geoai #transformersjs

21,035

Brendan Hogan

Kashif Rasul retweeted

Brendan Hogan

@brendanh0gan

13 Aug 2025

introducing qqWen: our fully open-sourced project (code, weights, data, detailed technical report) for full-stack finetuning (pretrain, SFT, RL) of a series of models (1.5b, 3b, 7b, 14b & 32b) for a niche financial programming language called Q. All details below!

738

133,427

Sergio Paniego

Kashif Rasul retweeted

Sergio Paniego

@SergioPaniego

18 Jul 2025

🧑‍🍳 New Multimodal Fine-Tuning Recipe 🧑‍🍳 ⚡️ In this new @huggingface Cookbook recipe, I walk you through the process of fine-tuning a Visual Language Model (VLM) for Object Detection with Visual Grounding, using TRL.

186

9,925