Dynamo with LMCache MP mode
We've updated the Dynamo integration to support LMCache's new multiprocess (MP) mode, complete with ready-to-run startup scripts. If you're serving with Dynamo, there's now a launch path for running LMCache as an out-of-process sidecar alongside the vLLM backend. Dynamo connects to the sidecar through LMCacheMPConnector, bringing the integration in line with LMCache's newer multiprocess architecture.
Huge thanks to @shaoting_feng for making this possible! Up next: disaggregated serving support for MP mode in Dynamo. Stay tuned!
Explore more: docs.nvidia.com/dynamo/dev/i… #AI #inference #LMCache #KVCache
Join us at SIGCOMM 2025 (conferences.sigcomm.org/sigc…) for our full-day tutorial on LMCache, an intelligent caching middleware that makes LLM inference faster & cheaper!
Sept 8, 2025
8:45 AM – 6:00 PM (Portugal Time / WEST)
= 12:45 AM – 10:00 AM (PDT)
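If you want to double-check the conversion for your own calendar, here is a minimal sketch. It assumes fixed offsets (WEST = UTC+1, PDT = UTC-7) rather than a time zone database, and the helper name `to_pdt` is ours, purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for Sept 8, 2025: WEST is UTC+1, PDT is UTC-7.
WEST = timezone(timedelta(hours=1), "WEST")
PDT = timezone(timedelta(hours=-7), "PDT")


def to_pdt(hour: int, minute: int) -> datetime:
    """Convert a Sept 8, 2025 wall-clock time in WEST to PDT (hypothetical helper)."""
    return datetime(2025, 9, 8, hour, minute, tzinfo=WEST).astimezone(PDT)


start = to_pdt(8, 45)   # tutorial start, 8:45 AM WEST
end = to_pdt(18, 0)     # tutorial end, 6:00 PM WEST

print(start.strftime("%H:%M %Z"), "to", end.strftime("%H:%M %Z"))
```

Swap in another offset for `PDT` to get the schedule in your own time zone.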
What you'll learn:
🔹 KV-cache offloading & reuse for LLMs
🔹 Cutting GPU memory and compute costs
🔹 Real-world integrations with vLLM & beyond
Register here: docs.google.com/forms/d/e/1F… #SIGCOMM2025 #LMCache #LLM #vLLM
With RAG and agents becoming ubiquitous in LLM systems, tuning quality and performance JOINTLY is essential to achieve the best LLM quality-of-experience.
Our paper at SOSP this year addresses this exact tradeoff! 🔥
The LMCache docs website is now live!
Whether you're new to LLMs or a pro, our docs have you covered!
Getting Started guides
Small examples
Code documentation
Boost your LLM deployment today!
Check out our blog post!
blog.lmcache.ai/2024-10-17-d…