NICE Talk 141 invites Georgia Tech Ph.D. Hao Kang
@GT_HaoKang to discuss ThunderAgent: 4× Faster LLM Agent Inference!
Time:
⏰ PST 3.07 18:00–19:00
⏰ EST 3.07 21:00–22:00
⏰ Beijing 3.08 10:00–11:00
Watch live:
youtube.com/live/kHw6LZsXcH0
Register:
luma.com/ezn8ho93
In this talk, the speaker will discuss:
How can we make LLM agent workflows faster, simpler, and more robust?
Traditional request-level engines (vLLM, SGLang) struggle with KV cache thrashing, memory imbalance, and resource leaks.
ThunderAgent introduces Program Abstraction, treating multi-step agent workflows as programs, unifying GPU, CPU, and remote tool scheduling.
With just two lines of code, ThunderAgent boosts inference throughput by 1.5–3.6×, rollout throughput by 1.8–3.9×, and saves 4.2× disk space, while staying stable under high concurrency.
Join us to explore a principled, program-level approach to distributed agent inference and RL rollouts.
#AI #LLM #AgenticAI #ReinforcementLearning #DistributedSystems #ProgramAbstraction #ThunderAgent