"Actually, we (vLLM) get more users from the simple UX than vLLM performance"
For our third guest, we welcome @woosuk_k, co-founder & CTO of @inferact and creator of @vllm_project. To us, Woosuk is a unique guest, and we are amazed by the user-centric perspective on LLM inference he shared: what makes the vLLM project successful, which new application scenarios inference should be tailored to, how to support continual learning from user signals, and more.
0:00 - Prelude: Introducing Woosuk and Inferact
3:00 - Woosuk’s First PhD Project
6:00 - How the vLLM Project Got Started
9:18 - AI Infra Needs More Than Just Efficiency
14:08 - How AI Infra and Human-centered AI Are Connected
15:01 - How to Prioritize Feature Requests for Popular AI Infra
18:18 - Streaming Requests and Realtime API
24:05 - Multi-turn, Agentic, Proactive LLMs
27:03 - How to Design AI Infra in a Principled Way
29:13 - How to Design an AI Inference Engine for Continual Learning with RL
35:05 - Would LoRA Training Affect RL Infra Design?
37:28 - Why Start an AI Inference Infra Startup?
40:46 - What Effortless Inference with Open-source Models Means for Developers
43:46 - A Vision for On-device AI Inference
46:19 - Can Today’s Coding Agents Create vLLM?