Joined March 2011
Hey Boston Friends. Join us next week during Boston Tech Week to learn more about llm-d, open source distributed inferencing on Kubernetes. Special thanks to @RedHat and @Google for helping to plan and sponsoring this free event! luma.com/eqbc1gxq
Boston Friends! Come and join us for what's shaping up to be a great event! May 28th at 5pm - Google Cambridge office.
May 12
Boston AI Devs! 🏙️ Join the llm-d meetup on May 28 during Boston Tech Week. Hear the latest in LLMs from: 🎙️ Tyler Michael Smith (@RedHatAI) 🎙️ Sean Horgan (@Google) 🎙️ Peter Tanski (@CapitalOne) Huge thanks to @Google for the support! 🎟️ Register: luma.com/eqbc1gxq
Pete Cheslock retweeted
๐Ÿ“ข ๐—ง๐—ต๐—ฒ ๐—ฆ๐˜๐—ฎ๐˜๐—ฒ ๐—ผ๐—ณ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น ๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ป๐—ด ๐—–๐—ผ๐—บ๐—บ๐˜‚๐—ป๐—ถ๐˜๐—ถ๐—ฒ๐˜€: ๐—”๐—ฝ๐—ฟ๐—ถ๐—น ๐—˜๐—ฑ๐—ถ๐˜๐—ถ๐—ผ๐—ป ๐—ถ๐˜€ ๐—ผ๐˜‚๐˜! Our goal with this newsletter is to give a clear, community-driven view of whatโ€™s happening across the model serving ecosystem, including updates from projects like @vllm_project, KServe, @_llm_d_, @kubernetesio, Llama Stack, and more. ๐Ÿ‘‰ Check out the April newsletter here: inferenceops.substack.com/p/โ€ฆ ๐Ÿ‘‰ Subscribe to get future issues in your inbox: inferenceops.substack.com/ ๐Ÿš€ Thanks to everyone who subscribed so far! Kudos to all contributors to this edition! Francisco Arceo, Pete Cheslock, Jooho Lee, Pierangelo Di Pilato, Nir Rozenbaum, Yuan Tang, Wentao Ye, Sasa Zelenovic

Pete Cheslock retweeted
vLLM meetup is coming to Boston on March 31! Workshop evening sessions covering: - @vllm_project update - Model compression and speculative decoding - Agentic AI with vLLM - Distributed inference at scale with @_llm_d_ and Kubernetes Pre-event workshop at 3:30 PM: Deploy Llama 3.1 8B and benchmark llm-d's cache-aware routing live. Shoutout to our sponsors: @RedHat, @IBM, @NVIDIAAI, The Open Accelerator, and @MITIBMLab! Register here 👇 luma.com/4rmkrrb7
Pete Cheslock retweeted
Mar 26
Red Hat is working with industry leaders to develop llm-d, an open-source project that optimizes how models are served to your users. By routing requests to the most efficient GPU and separating prefill from decode, you get faster results for less spend. Check out Pete Cheslock's quick overview of how llm-d is changing the game for Kubernetes-based AI: red.ht/3PbTkkP #KubeCon #CloudNativeCon
Wondering what llm-d is? It's the open source project simplifying LLM deployment! Run any model on any accelerator, on any cloud. #llm-d #OpenSource #AI #Kubernetes #KubeCon
Pete Cheslock retweeted
Mar 24
It’s official: llm-d has joined the @CNCF! 🚀 Our mission to evolve Kubernetes into SOTA AI infrastructure just got a massive boost. This milestone belongs to our amazing community. Thank you for building this with us. 💜 We’re just getting started! 🔗 cncf.io/blog/2026/03/24/welc…
For all my local Boston friends. If you are interested in vLLM/llm-d and inference at scale you should join us!
vLLM meetup is coming to Boston on March 31! Workshop evening sessions covering: - @vllm_project update - Model compression and speculative decoding - Agentic AI with vLLM - Distributed inference at scale with @_llm_d_ and Kubernetes Pre-event workshop at 3:30 PM: Deploy Llama 3.1 8B and benchmark llm-d's cache-aware routing live. Shoutout to our sponsors: @RedHat, @IBM, @NVIDIAAI, The Open Accelerator, and @MITIBMLab! Register here 👇 luma.com/4rmkrrb7
Pete Cheslock retweeted
๐Ÿ“ข ๐—ง๐—ต๐—ฒ ๐—ฆ๐˜๐—ฎ๐˜๐—ฒ ๐—ผ๐—ณ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น ๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ป๐—ด ๐—–๐—ผ๐—บ๐—บ๐˜‚๐—ป๐—ถ๐˜๐—ถ๐—ฒ๐˜€: ๐— ๐—ฎ๐—ฟ๐—ฐ๐—ต ๐—˜๐—ฑ๐—ถ๐˜๐—ถ๐—ผ๐—ป ๐—ถ๐˜€ ๐—ผ๐˜‚๐˜! We launched our newsletter publicly last year to share our contributions to upstream communities from our @RedHat_AI teams. Weโ€™ve gained over ๐Ÿญ๐Ÿฏ๐Ÿฌ๐Ÿฌ ๐˜€๐˜‚๐—ฏ๐˜€๐—ฐ๐—ฟ๐—ถ๐—ฏ๐—ฒ๐—ฟ๐˜€! Our goal with this newsletter is to give a clear, community-driven view of whatโ€™s happening across the model serving ecosystem, including updates from @vllm_project, KServe, @_llm_d_, @kubernetesio, and Llama Stack. ๐Ÿ‘‰ Check out the March newsletter here: inferenceops.substack.com/p/โ€ฆ ๐Ÿ‘‰ Subscribe to get future issues in your inbox: inferenceops.substack.com/ ๐Ÿš€ Thanks to everyone who subscribed so far! Kudos to all contributors to this edition! @franciscojarceo, Pete Cheslock, Sean Condon, Jooho Lee, Pierangelo Di Pilato, Ran Pollak, Nir Rozenbaum, @TerryTangYuan, Wentao Ye

LFG!!!!
Injury Report Update: Jayson Tatum - AVAILABLE
DON'T TOY WITH MY EMOTIONS
Injury Report for tomorrow vs. DAL: Jayson Tatum - Right Achilles Repair - QUESTIONABLE
If you are in NYC next Wednesday, come and join us to learn how to scale AI Inference on Kubernetes with the llm-d project.
What’s on the agenda for next Wednesday's NYC meetup? 🛠️ Intro to llm-d 0.5 ⚡️ Distributed LLM serving on AMD 🧠 Lessons scaling Wide-EP and MoE 💾 KV-cache offloading & prefix scheduling Join us building the future of open-source inference. Details: luma.com/0crwqwg4
Pete Cheslock retweeted
Join us next week in NYC with the llm-d community for a deep dive into distributed inference. We’re talking llm-d 0.5, scaling MoE models, and KV-cache offloading. If you're building LLM infra, don't miss this. 📅 March 11th 📍 1 Madison Ave Register: luma.com/0crwqwg4
RT @TerryTangYuan: We'd like to announce that @kubernetesio WG Serving has succeeded and will be disbanded! Thank you everyone who have pa…

Pete Cheslock retweeted
Great talk last night by @julianeagu (@QuotientAI), @thejackobrien (Subconscious), and @petecheslock (Red Hat)! LLMs as we know them today must change to meet the capacity we expect of them. Specialized agents, changing their hardware architecture, or funneling proper context!!