Home of the SGLang community. SGLang project sglang.io.

Joined May 2025
9 Photos and videos
Pinned Tweet
๐Ÿ‘‹ @sgl_project is back! Welcome to the home of the SGLang community! While @lmsysorg keeps you posted on technical drops and partner news, this space is for you! Here's what we've got lined up: ๐Ÿš€ Version releases: every new SGLang drop, unpacked ๐ŸŽ™๏ธ Office Hours: deep dives, live deployments, and team Q&A ๐Ÿ“บ Tutorials: short how-tos and the best Office Hour moments ๐ŸŒŸ Community spotlights: the cool stuff you're building with SGLang ๐Ÿ“… Event updates: meetups, workshops, and where to catch us next And we'd love to hear from you! What do you want to see? Benchmarks? Model deep dives? A topic for the next Office Hour? Drop it below ๐Ÿ‘‡ Every idea gets read!
1
9
51
11,477
๐ŸŽ‰ SGLang v0.5.13 is out! First, new model support! Nemotron 3 Ultra, Step-3.7-Flash, Command A , plus new diffusion models: Cosmos3, FLUX.2-Klein, Ideogram 4, LingBot-World, SANA-WM, and Ernie-Image. Here are the highlights for this release: - Speculative Decoding V2 is now the default! Tree drafting (topk>1) for faster generation - Breakable CUDA Graphs now make prefill faster - Qwen 3.5 runs faster on NVIDIA Blackwell with new GDN kernels - HiCache with UnifiedTree on by default for hybrid SWA/Mamba models - SGLang-Diffusion now supports realtime generation! Plus progressive resolution - Multiple performance and feature updates for DeepSeek V4 Thanks to our amazing partners and model makers: @NVIDIAAI @AMD @intel @awscloud @boson_ai @cohere @bfl_ml @ideogram_ai @deepseek_ai @Kimi_Moonshot @Alibaba_Qwen @StepFun_ai @Baidu_Inc @robbyant_brain
4
10
73
18,632
SGLang retweeted
๐Ÿš€New record on GB300 NVL72: SGLang exceeds 12K tok/s per GPU on DeepSeek V4 Pro 1.6T (FP4, 8K/1K), orchestrated with NVIDIA Dynamo (SGLang) and MTP. Per @SemiAnalysis_ InferenceX benchmarks, performance stays strong across the entire interactivity curve. More to come with @NVIDIAAIInfra ๐Ÿค
8
18
120
30,773
SGLang retweeted
SGLang Office Hour 6/11 Higgs Audio V3 TTS x.com/i/broadcasts/1nKOLLdYNโ€ฆ
1
3
697
Honor to co-host with @CrusoeAI!
Jun 11
We joined @HOFCapital, @CloudflareDev @ArklexAI to host SGLangโ€™s #NYTechWeek Happy Hour Lightning Talk exploring AI inference in finance. Our AI Research Manager Omri Berkovitch spoke on efficiently running large-scale inference in production. ๐Ÿ“ธ #AIinference #FinanceAI
10
472
SGLang retweeted
An absolutely impressive contribution to SGLang (@lmsysorg @sgl_project) by @BrianCChao, providing a 2ร— speedup with no loss in quality! In my opinion, there are three levels of contributions one can make to SGLang. The first is fixing bugs or extending existing features, such as fused kernels. The second is adding a new feature based on an existing paper, such as PD disaggregation. The third is inventing a new algorithm and integrating it into SGLang. This work clearly falls into the third category, and Iโ€™m excited to hear feedback from the community! Thanks a lot to the review from @mick_qian and supports from @ying11231 @BanghuaZ @lm_zheng
Aside from the official code release, I am thrilled to share that Spectral Progressive Diffusion a.k.a. SPEED (arxiv.org/abs/2605.18736) is now integrated into @lmsysorg's SGLang (@sgl_project)! ๐Ÿš€ Instead of always running diffusion at full resolution, SPEED progressively grows resolution across denoising steps, drastically cutting token count and achieving >2ร— speedup with no quality loss. SPEED is now supported in SGLang for FLUX.1 & 2, Z-Image, Qwen-Image, and Wan. Support for Ideogram 4 incoming. Try it out now: docs.sglang.io/docs/sglang-dโ€ฆ. [1/4]
3
8
87
11,413
Hot take: most of your denoising steps don't need full resolution๐Ÿ‘Œ SPEED proves it with progressive resolution growth during denoising, up to 2ร— speedup, and quality untouched. Now shipping in SGLang Diffusion: FLUX.1 & 2, Z-Image, Qwen-Image, Wan. Ideogram 4 next Congrats @BrianCChao @howard_xhc @YarivLior @GordonWetzstein ๐Ÿ‘ and thanks to the support from @hsu_byron and @mick_qian Don't forget to check out this doc: docs.sglang.io/docs/sglang-dโ€ฆ
Aside from the official code release, I am thrilled to share that Spectral Progressive Diffusion a.k.a. SPEED (arxiv.org/abs/2605.18736) is now integrated into @lmsysorg's SGLang (@sgl_project)! ๐Ÿš€ Instead of always running diffusion at full resolution, SPEED progressively grows resolution across denoising steps, drastically cutting token count and achieving >2ร— speedup with no quality loss. SPEED is now supported in SGLang for FLUX.1 & 2, Z-Image, Qwen-Image, and Wan. Support for Ideogram 4 incoming. Try it out now: docs.sglang.io/docs/sglang-dโ€ฆ. [1/4]
1
8
30
4,559
๐Ÿ“ฃ SGLang Office Hour: How to Build Fast, Modern Voice Cloning with @boson_ai Our Office Hour is back! This week we're joined by Ke and Huapeng from the Boson AI team to talk about Higgs Audio V3 TTS, how to build fast, modern voice cloning that actually sounds human, and their work optimizing SGLang-Omni for TTS production serving. Bring your questions for Ke and Huapeng, see you tomorrow!
1
3
18
2,442
Heard about Speculative Decoding? Come hear how Speculative Speculative Decoding (SSD) pushes inference speed to a new ceiling. RSVP ๐Ÿ‘‡
We're hosting a research talk lunch event with @TanishqKumar07 in SF! Come learn about SSD, which is up to 2x faster than the best inference engines in the world, RSVP below!
1
2
19
3,673
โ†’ Michelle Chen (@CloudflareDev) on AI platforms for agents
2
14
6,364
โ†’ Prof. Zhou Yu (Columbia / @ArklexAI) on agent evaluation in production
5
395
โ†’ Mao Cheng (@radixark) on Miles & RL
1
4
457
โ†’ Omri Berkovitch (@CrusoeDev) on inference infra at scale
1
1
6
349
๐Ÿ™๏ธย SGLang NY Tech Week Happy Hour Recap Last Wednesday, SGLang hosted a NY Tech Week Happy Hour in NYC, co-hosted with @HOFCapital, @Cloudflare, @CrusoeAI, and @ArklexAI. 380 registered, 200 in the room, and one unforgettable night. ๐Ÿงก The room was packed with engineers, researchers, and enthusiasts from quant funds, banks, and trading firms, all there to talk about one thing: where inference is headed as LLMs move into latency-sensitive production across trading, research, compliance, and risk. NYC, you showed up and brought the energy. We loved every minute. Until next time! โ˜€๏ธ #NYTechWeek @Techweek_
2
5
18
3,610
Huge thanks to our lightning talk speakers โ†’ Liangsheng Yin (@radixark) on SGLang for inference in finance
1
5
593
SGLang retweeted
๐Ÿ™ŒThe community account @sgl_project is back! If you're building with SGLang, this is your space. Follow for releases, Office Hours, tutorials & events๐Ÿ‘‡
๐Ÿ‘‹ @sgl_project is back! Welcome to the home of the SGLang community! While @lmsysorg keeps you posted on technical drops and partner news, this space is for you! Here's what we've got lined up: ๐Ÿš€ Version releases: every new SGLang drop, unpacked ๐ŸŽ™๏ธ Office Hours: deep dives, live deployments, and team Q&A ๐Ÿ“บ Tutorials: short how-tos and the best Office Hour moments ๐ŸŒŸ Community spotlights: the cool stuff you're building with SGLang ๐Ÿ“… Event updates: meetups, workshops, and where to catch us next And we'd love to hear from you! What do you want to see? Benchmarks? Model deep dives? A topic for the next Office Hour? Drop it below ๐Ÿ‘‡ Every idea gets read!
4
25
4,437
Sleeper hit no more ๐Ÿ˜„ 304M downloads and 744% in 3 months โ€” humbled to see SGLang growing alongside this incredible community. Thank you all for building with us!
PyPI May 2026: 163.8 billion downloads ( 9.2% MoM). The AI agent race is now measurable in download counts: โšก sglang: 304M ( 744% in 3 months), @lsyincs @sgl_project - the sleeper hit ๐Ÿ”€ litellm: 513M ( 425% in 3 months), @krrish_dh @ishaan_jaff @LiteLLM ๐Ÿ“ pydantic-ai-slim: 268M ( 85% MoM), @samuelcolvin @pydantic ๐ŸŒ langchain-protocol: 37M ( 524% in one month), @hwchase17 @LangChain ๐Ÿ•ท๏ธ browser-use: 24M ( 132% MoM), @mamagnus00 @gregpr07 @browser_use ๐Ÿค crewai: 14.5M ( 108% MoM), @joaomdmoura @crewAIInc boto3 still #1 at 3.3B. AWS wins the boring prize. Full data: clickhou.se/43USFbm
2
4
11
2,675
๐ŸŽ‰ Congrats on the launch! Exciting to see the agent infra space leveling up.
Today, we are launching GMI Agent Box. A complete infrastructure stack for production-ready AI agents: native Docker, flexible deployment, 200 models under one API key, dedicated compute across regions, and a marketplace for distribution. Available now.
2
17
1,041