Research Scientist working on Deep Learning, Time Series Forecasting, Reinforcement Learning and HPC.

Joined August 2007
Kashif Rasul retweeted
Deep dive into FNS: building a tokenizer that chunks text efficiently but has character level resolution! FNS augments the loss with character level signal at training time while at inference time you can decode single characters. Deep dive here: huggingface.co/spaces/Huggin…
Kashif Rasul retweeted
OpenEnv already ships 🚢 with a ready-to-deploy RLM environment on free HF Spaces. Drop "Attention Is All You Need", write code that spawns parallel LLM calls → ✅ answer in 4.2s. Run GRPO (TRL) → model learns to write that search strategy itself 👀 @lateinteraction @a1zhang
Kashif Rasul retweeted
Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy and… it's already supported in TRL, built by @krasul. you can really feel the pace of development in the team 🐎
paper by @onloglogn, @richard_baihe, @UnderGroundJeg, Navdeep Jaitly, @trebolloc, @YizheZhangNLP at Apple 🍎
how it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. no labels or verifier needed
you can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder): github.com/huggingface/trl/b…
or benchmark a checkpoint with the eval script: github.com/huggingface/trl/b…
one neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. even very noisy samples still help
want to dig deeper?
paper: huggingface.co/papers/2604.0…
trainer docs: huggingface.co/docs/trl/main…
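A minimal sketch of why the two temperatures might multiply, under an idealized assumption that is not taken from the paper: the fine-tuned model exactly reproduces the T_train-scaled distribution, and top_k/top_p truncation is ignored. The function names and the toy logits are hypothetical and are not part of TRL.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def compose_temperatures(logits, t_train, t_eval):
    """Sample distribution at t_train, treat it as the new model, decode at t_eval.

    Assumption: the fine-tuned model matches the t_train distribution exactly,
    so its logits equal the log-probabilities up to an additive constant.
    """
    train_probs = softmax_with_temperature(logits, t_train)
    student_logits = [math.log(p) for p in train_probs]
    return softmax_with_temperature(student_logits, t_eval)


# Toy next-token logits (hypothetical values).
logits = [2.0, 1.0, 0.5, -1.0]
t_train, t_eval = 1.5, 0.6

two_step = compose_temperatures(logits, t_train, t_eval)
one_step = softmax_with_temperature(logits, t_train * t_eval)

# Under the assumption above, both distributions agree to floating-point error.
max_gap = max(abs(a - b) for a, b in zip(two_step, one_step))
```

Here `max_gap` is the largest per-token difference between decoding in two steps and decoding once at T_train × T_eval. A real fine-tuned model only approximates the sampled distribution, so this is an intuition for the stated relation rather than a reproduction of the paper's analysis.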
Kashif Rasul retweeted
Today, we released PEFT v0.19.0 and it's a big one. Not only did we add 9 new PEFT methods, the release also contains a bunch of improvements to make PEFT more useful. Check the thread for details:
Kashif Rasul retweeted
France is about to pass a law punishing support for the genocide in Palestine! 🇫🇷🇵🇸 just kidding. it’s actually a proposal to restrict criticism of Israel, in the so-called country of human rights and free speech. @SandrineRunel, je vous appelle à voter contre la loi Yadan.
Kashif Rasul retweeted
check out this new notebook by @krasul on TimesFM 2.5, Google's time series foundation model, which is now supported in transformers: zero-shot forecasting, quantile predictions, LoRA fine-tuning, and forecasting with exogenous covariates. colab.research.google.com/gi…
Kashif Rasul retweeted
Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and DeepSpeed teams has been integrated into @huggingface Trainer, Accelerate and TRL. For extensive details please see this writeup: huggingface.co/blog/ulysses-… Thanks a lot to @krasul for helping make it happen, and to the others in the HF team who helped with integration.
Kashif Rasul retweeted
12 Dec 2025
Just opened a PR to make continuous batching in transformers go EVEN faster🚆 With simple optimizations like no torch sync and more GPU-sided operations, we gained 10-14.5% throughput across 500 requests🥳 Soon, there will be native fast RL training in transformers. Keep up 😉
Kashif Rasul retweeted
In collaboration with the @PyTorch team, we added a transformers modeling backend to the torchtitan library! This means training any dense model (MoE support coming soon) with torch.compile and FSDP/TP/PP/CP out of the box with no performance drop!
Kashif Rasul retweeted
21 Nov 2025
Ulysses Sequence Parallelism integration from Arctic Long Sequence Training has been merged into @huggingface HF Trainer. github.com/huggingface/trans… Thanks to @krasul and @_marcsun for help with integration and Weijie Zhang for being the first early adopter! There is also work being done on integration into HF trl.
Kashif Rasul retweeted
Reinforcement Learning for agents has been held back by a lack of standard infrastructure. Production agents don't live in clean "gyms"—they live in messy, async environments. Today we’re open-sourcing Eval Protocol: a framework to run RL directly on your production agents. Day 0 support for trainers and environments like TRL (@huggingface), rLLM (@Agentica_), OpenEnv (@PyTorch), as well as support for proprietary trainers like @OpenAI RFT and Tinker from @thinkymachines . 🧵
Kashif Rasul retweeted
On-policy distillation is a promising way to train small models, but it’s usually limited to teacher–student pairs sharing the same tokenizer. With our GOLD method, you can now distill across different model families and even outperform GRPO! huggingface.co/spaces/Huggin…
Kashif Rasul retweeted
Qwen released their new small and dense VLMs (Qwen3-VL). They're incredibly capable and one of my all-time favourite VLMs. 🤗 We’ve prepared some resources to help you get started. sharing in the next one
Updated the evaluation guidebook with a new deep dive! 2025 panorama of all the important and next level evaluations that you need to know to build *actually impactful and useful* models! (Assistant tasks, games, forecasting, and more) Tell me wyt! :) github.com/huggingface/evalu…
Kashif Rasul retweeted
Training long-context LLMs is getting easier! TRL now supports Context Parallelism (CP), letting you scale sequences across multiple GPUs, even multi-node setups, seamlessly 💆 Combine TRL and accelerate to run it effortlessly!
Kashif Rasul retweeted
🚀 Just shipped TRL v0.23 - train with *any* context length. This release brings Context Parallelism, which allows training with arbitrary context length, along with major improvements for post-training. Here’s what’s new 🧵👇
Kashif Rasul retweeted
2 Sep 2025
Super excited to announce that our research team at @huggingface will be doing an AMA on r/LocalLLaMA. Come ask any questions to the team behind SmolLM, FineWeb and more! And who knows, maybe there’ll be a shiny new release to talk about? Thursday 4th September, 8AM-11AM PST 🤗
Kashif Rasul retweeted
22 Aug 2025
Announcing geoai.js - GeoAI for the JavaScript community 🌍 Run AI models in the browser and Node.js, powered by 🤗 transformers.js by @huggingface @geobaseapp Live demos → docs.geobase.app/geoai-live/… #gischat #javascript #geoai #transformersjs
Kashif Rasul retweeted
introducing qqWen: our fully open-sourced project (code, weights, data, detailed technical report) for full-stack finetuning (pretrain, SFT, RL) of a series of models (1.5b, 3b, 7b, 14b & 32b) for a niche financial programming language called Q. All details below!
Kashif Rasul retweeted
🧑‍🍳 New Multimodal Fine-Tuning Recipe 🧑‍🍳 ⚡️ In this new @huggingface Cookbook recipe, I walk you through the process of fine-tuning a Visual Language Model (VLM) for Object Detection with Visual Grounding, using TRL.