Joined June 2026
Yubo Wang retweeted
Everyone talks about 1M context. The harder part is making 1M context actually usable. Serving MiniMax M3 required optimizing for long-context, multimodal, and agentic workloads simultaneously. Excited to see what developers build with it. 🚀
First technical deep dive on M3 on the internet 😎
MiniMax-M3 combines 1M context, native multimodality, and MiniMax Sparse Attention. The next layer is serving it efficiently: KV-block-major sparse attention, paged MSA decode, optimized index scoring, and multimodal preprocessing before the GPU worker. Together’s Inference and Kernel teams improved throughput by 81–125% across common agentic-shape traffic. We go deeper in this deep dive from @ywangfirstlean, @zhyncs42, @realDanFu and the team.
Yubo Wang retweeted
🚀TorchSpec has been live for 2 weeks — and kimi-k2.5-eagle3 just hit 40K downloads on HuggingFace! Thanks to @KT_Project_AI Team and @vllm_project Team for the amazing collaboration. Links in comments.
Yubo Wang retweeted
See you tomorrow night. Come with questions.
We're going LIVE tomorrow with @togethercompute 🔥. @zpysky1125 is pulling back the curtain on M3: sparse attention, 1M context, all of it. You don't want to miss this.