What if an LLM could EDIT its own tokens in real-time, not just generate them? 🤯
Introducing LLaDA2.1 — a diffusion model that breaks from autoregressive dominance. It drafts fast, then fixes its own mistakes on the fly with Token-to-Token editing.
The result? Up to 892 tokens/sec on a 100B model. 🔥
⚡ 892 TPS on HumanEval (coding)
⚡ 801 TPS on BigCodeBench
🧠 Real-time self-correction via T2T editing
✅ @lmsysorg SGLang Day 0 support: production-ready now
A "non-consensus" architecture now challenging the mainstream. Open-sourced TODAY. 👇
#LLaDA #TokenEditing #OpenSource #LLM #dLLM