Filter
Exclude
Time range
-
Near
๐Ÿ“ŒDify officially lists the Higress plugin: AI model integration solution. ๐ŸŒฑ Centralize multi-vendor access (text/image gen, embeddings) ๐ŸŒฑ Inherit gateway capabilities: traffic control, auth, monitoring ๐ŸŒฑProtocol support: OpenAI-compatible, Alibaba Cloud Bailian ๐ŸŒฑWorks with open-source Higress & cloud-native AI Gateway ๐ŸŒฑIncludes Agent & Workflow implementation guides Learn More: int.alibabacloud.com/m/10004โ€ฆ #AIGateway #ModelServing #Dify #Higress #AlibabaCloud #CloudComputing #AI #AInnovation #LLM
6
23
11,644
SecMLOps -arxiv.org/pdf/2601.10848 Integrating Security Throughout the Machine Learning Operations Lifecycle Secure Machine Learning Operations (SecMLOps), providing a comprehensive framework designed to integrate robust security measures throughout the entire ML operations (MLOps) lifecycle. SecMLOps builds on the principles of MLOps by embedding security considerations from the initial design phase through to deployment and continuous monitoring. This framework is particularly focused on safeguarding against sophisticated attacks that target various stages of the MLOps lifecycle, thereby enhancing the resilience and trustworthiness of ML applications. With the increasing concerns over ML security risks, the concept of Secure Machine Learning Operations (SecMLOps) was proposed to extend the MLOps with security considerations. This paradigm advocates for the explicit integration of security measures throughout the entire MLOps lifecycle. By embedding security considerations from the outset, SecMLOps aims to cultivate more secure, reliable, and trustworthy ML-based systems. This holistic security integration not only enhances the resilience of ML deployments but also ensures their alignment with organizational security policies and regulatory requirements, thereby fortifying the foundation of trust and dependability in ML applications across various sectors. Xinrui Zhang, Pincan Zhao, @JasonJaskolka, @henglli, Rongxing Lu - @Carleton_U, @polymtl, @queensu #SecMLOps #MLOps #MachineLearningSecurity #AdversarialMachineLearning #AdversarialExamples #DataPoisoning #STRIDE #ThreatModeling #AdversarialTraining #ModelServing #CityPersons #PedestrianDetection
1
21
949
25 Nov 2025
๐Ÿ”ฅ 5090 โ€” The Best Price-to-Performance Inference GPU! Black Friday Special ๐Ÿ”ฅ Experience data-centerโ€“grade performance with the 5090 โ€” built for safe, stable, dedicated inference at scale. Original Price: $0.70/h โœจ Black Friday Price: ONLY $0.45/h! โœจ Need high-performance inference? Deploying large models? Running real-time AI services? The 5090 gives you unmatched value and power. #AI #AIGPU #AICompute #GPUCloud #CloudInference #LLMInference #BlackFridayDeals #BlackFridaySale #ComputeSale #GPUSale #5090GPU #HighPerformanceGPU #HPC #DeepLearning #MachineLearning #NeuralNetworks #AIDeployment #AIPipeline #ModelServing #AIInfrastructure #ScalableInference #DedicatedCompute #DataCenterGPU #ServerGradeGPU #BestValue #ComputePower #AIStartup #MLOps #AITools #NextGenAI #TechDeals
1
1,035
Just wrapped up an incredible experience at @databricks DevConnect Bangalore, a day full of deep dives, future-ready ideas, and powerful conversations around data & AI! #Databricks #DevConnect #ModelServing #CostOptimization #DataEngineering #BangaloreTech #AIInfrastructure
2
120
Kicking off a focused deep dive into LLM inferencing - exploring optimization, quantization, and deployment pipelines. If youโ€™ve read any great papers/blogs/videos or tried interesting frameworks lately, drop them here ๐Ÿ‘‡ #LLM #AI #ML #DeepLearning #MLOps #Inference #GenAI #AIResearch #AIInfra #ModelServing
2
206
๐ŸŽ‰ ๐—ž๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ฒ ๐—ต๐—ฎ๐˜€ ๐—ฏ๐—ฒ๐—ฒ๐—ป ๐—ฎ๐—ฐ๐—ฐ๐—ฒ๐—ฝ๐˜๐—ฒ๐—ฑ ๐—ฎ๐˜€ ๐—ฎ ๐—–๐—ก๐—–๐—™ ๐—œ๐—ป๐—ฐ๐˜‚๐—ฏ๐—ฎ๐˜๐—ถ๐—ป๐—ด ๐—ฃ๐—ฟ๐—ผ๐—ท๐—ฒ๐—ฐ๐˜! ๐Ÿš€ Just a short time ago, we shared that our application was under public comment. Now, the vote has officially passed and KServe is joining the @CloudNativeFdn as an incubating project. This is a major milestone for the community and a big step forward for cloud-native model serving. ๐Ÿ™Œ Huge thanks to everyone who contributed to this journey from writing code, reviewing docs, to supporting governance and community growth. Stay tuned! Weโ€™ll be publishing a detailed announcement blog soon with more insights on what this means for users, contributors, and the future of model serving on @kubernetesio. For now: thank you to the community for making this possible. ๐Ÿ’™ #KServe #CNCF #OpenSource #ModelServing #AI #MLOps #CloudNative #Kubeflow #Kubernetes #k8s @kubeflow
1
5
31
2,592
๐Ÿš€ ๐— ๐—ฎ๐—ท๐—ผ๐—ฟ ๐—จ๐—ฝ๐—ฑ๐—ฎ๐˜๐—ฒ ๐—ณ๐—ผ๐—ฟ ๐—ž๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ฒ! ๐Ÿš€ Weโ€™ve officially taken the next step toward joining the ๐—–๐—ก๐—–๐—™ ๐—ฎ๐˜€ ๐—ฎ๐—ป ๐—œ๐—ป๐—ฐ๐˜‚๐—ฏ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—ฃ๐—ฟ๐—ผ๐—ท๐—ฒ๐—ฐ๐˜! ๐ŸŽ‰ The CNCF TOCs have completed due diligence and are now sponsoring our application. The process is entering the ๐—ฝ๐˜‚๐—ฏ๐—น๐—ถ๐—ฐ ๐—ฐ๐—ผ๐—บ๐—บ๐—ฒ๐—ป๐˜ ๐—ฝ๐—ฒ๐—ฟ๐—ถ๐—ผ๐—ฑ (2 weeks) before moving to a formal TOC vote. Application: github.com/cncf/toc/issues/1โ€ฆ Due Diligence doc: github.com/cncf/toc/pull/186โ€ฆ A huge thank you to @kevinwzf0126 and @FaseelaDilshan from the CNCF TOC for all the hard work. Itโ€™s been such a pleasure collaborating with you both on this milestone. Thank you to all the community members who have contributed! This is a big step for the KServe community, and weโ€™re excited about the road ahead in making cloud-native model serving more accessible and production-ready for everyone. #KServe #CNCF #OpenSource #ModelServing #AI #MLOps #CloudNative @CloudNativeFdn @kubernetesio @kubeflow
1
6
27
2,807
2 Sep 2025
Detecting Exposed LLM Servers: A Shodan Case Study on Ollama - blogs.cisco.com/security/detโ€ฆ by @TalosSecurity This work investigates the prevalence and security posture of publicly accessible LLM servers, with a focus on instances utilizing the Ollama framework, which has gained popularity for its ease of use and local deployment capabilities. While Ollama enables flexible experimentation and local model execution, its deployment defaults and documentation do not explicitly emphasize security best practices, making it a compelling target for analysis. #AISecurity #LLMSecurity #Ollama #Shodan #AttackSurfaceManagement #APISecurity #MLOps #DevSecOps #ModelServing #CloudSecurity
2
13
380
18 Jul 2025
๐Ÿš€Excited to see our collaboration with @lmsysorg bring Multiple Token Prediction (MTP) in SGLang to production! Proud to support faster, smarter open-source LLM serving. #EigenAl #MTP #SGLang #LLMinfra #ModelServing #DeepSeek #OpenSourceAl #AskChatGPT
18 Jul 2025
๐Ÿš€ Summer Fest Day 5: Multiple Token Prediction in SGLang by @Eigen_AI_ and SGLang Team 1.6ร— throughput, same quality โ€” open-source & production-ready! Weโ€™ve integrated MTP into SGLang, unlocking up to 60% higher output throughput for models like DeepSeek V3, with zero quality trade-offs. Key highlights: - Plug-and-play MTP for any SGLang-served LLM - Works with Expert Parallelism, PD disaggregation & CUDA Graph - Draft-then-verify decoding with full model consistency - 1.6ร— boost in small clusters, 14% at scale - Easy tuning via draft_token_num; monitor acceptance length for max gains Serving LLMs at scale? Donโ€™t leave performance on the table๐Ÿ‘‡ #SGLang #MTP #LLMInfra #ModelServing #DeepSeek #OpenSourceAI #AIInfrastructure #EigenAI
4
10
1,795
18 Jul 2025
๐Ÿš€ Summer Fest Day 5: Multiple Token Prediction in SGLang by @Eigen_AI_ and SGLang Team 1.6ร— throughput, same quality โ€” open-source & production-ready! Weโ€™ve integrated MTP into SGLang, unlocking up to 60% higher output throughput for models like DeepSeek V3, with zero quality trade-offs. Key highlights: - Plug-and-play MTP for any SGLang-served LLM - Works with Expert Parallelism, PD disaggregation & CUDA Graph - Draft-then-verify decoding with full model consistency - 1.6ร— boost in small clusters, 14% at scale - Easy tuning via draft_token_num; monitor acceptance length for max gains Serving LLMs at scale? Donโ€™t leave performance on the table๐Ÿ‘‡ #SGLang #MTP #LLMInfra #ModelServing #DeepSeek #OpenSourceAI #AIInfrastructure #EigenAI
3
8
33
5,666
14 Jul 2025
๐Ÿš€Summer Fest Day 3: Cost-Effective MoE Inference on CPU from Intel PyTorch team Deploying 671B DeepSeek R1 with zero GPUs? SGLang now supports high-performance CPU-only inference on Intel Xeon 6โ€”enabling billion-scale MoE models like DeepSeek to run on commodity CPU servers. Key highlights: 1. Full CPU backend for SGLang with Intel AMX 2. Native BF16 / INT8 / FP8 support for both Dense and Sparse FFNs 3. 6โ€“14ร— TTFT and 2โ€“4ร— TPOT speedup vs. llama.cpp 4. 85% memory bandwidth efficiency with optimized MoE kernels 5. Flash Attention V2 MLA MoE all optimized for CPU 6. Multi-NUMA parallelism mapped from GPU-style Tensor Parallelism This work is now fully upstreamed to SGLang mainโ€”read how we achieved it, and how far you can go without a GPU ๐Ÿ‘‡ #LLMInfra #ModelServing #MoE #Xeon6 #SGLang #FP8 #INT8 #CPUInference
6
15
38
19,174
8 Jul 2025
๐ŸšจDay 1: OME blog from the Oracle team OME (Open Model Engine) redefines LLM deployment infrastructure with a model-driven architecture. No more complex YAMLs or deployment guessworkโ€”OME treats models as first-class Kubernetes resources, enabling: โ€ข Multi-node multi-phase serving โ€ข Decode/prefill disaggregation โ€ข Serverless autoscaling via tokens/sec or KV cache โ€ข One-line deployment for billion-scale models See how OCI GenAI teams cut model onboarding from months to days ๐Ÿ‘‡ #LLMInfra #MLOps #Kubernetes #ModelServing #OCI #SGLang
8 Jul 2025
SGLang Summer Fest kicks off! ๐Ÿค—From July 7โ€“18, weโ€™re dropping a series of deep-dive blogs on all things SGLang and its fast-growing ecosystem โ€” from blazing-fast inference to production deployment, RL integration & beyond. Donโ€™t miss a post! Follow & RT to join the fest!
2
5
31
12,489
๐Ÿš€ Introducing #LoRAX: Efficient Multi-LoRA Serving on Amazon Web Services (AWS)! Discover how LoRAX, our #OpenSource inference software, enables concurrent serving of multiple #LoRA adapters on a single LLM instance with this new blog from our partners at AWS. ๐–๐ก๐ฒ ๐๐จ๐ž๐ฌ ๐ญ๐ก๐ข๐ฌ ๐ฆ๐š๐ญ๐ญ๐ž๐ซ? Using LoRAX on AWS significantly reduces your infra costs and improves scalability for #GenAI applications.โ€‹ ๐‡๐ข๐ ๐ก๐ฅ๐ข๐ ๐ก๐ญ๐ฌ ๐Ÿ๐ซ๐จ๐ฆ ๐ญ๐ก๐ž ๐›๐ฅ๐จ๐ : ๐Ÿ”„ Dynamic #Adapter Swapping: Serve 100s of fine-tuned models without the need for separate instances. ๐Ÿ’ฐ Cost #Efficiency: Reduce hosting costs by up to 80% by consolidating model serving. โœˆ๏ธ Simplified Deployment: Utilize AWS and Predibase for streamlined model and adapter management. Learn how LoRAX can transform your AI infrastructure with AWS: aws.amazon.com/blogs/machineโ€ฆ #OpenSource #MLOps #ModelServing
2
4
354
๐Ÿงจ DeepSeek is everywhereโ€”except in production. Letโ€™s be real: every AI teamโ€™s poked at DeepSeek-R1. But almost no oneโ€™s using it for real work. We surveyed 500 AI professionals and found: ๐Ÿ“Š 57% have tested it ๐Ÿ˜ฌ 3% have deployed it ๐Ÿคท 47% still donโ€™t know if itโ€™s better than other models Itโ€™s the classic GenAI problem: ๐Ÿ’ก Big promise ๐Ÿงฑ Bigger friction โณ Still waiting on proof And yet... ๐Ÿ”ง 46% of teams want to customize it with LoRA or distill it down ๐Ÿ’ผ Most traction? Specialized use cases (not chatbots, sorry) The bottom line: Teams want to believe in DeepSeek. But without benchmarks or tools to make it usable, theyโ€™re stuck guessing. Wanna be the team that figures it out first? ๐Ÿ‘‡ Fine-tune it. Deploy it. Own it. Free trial at Predibase. #LLMs #OpenSourceAI #DeepSeek #InfraMatters #InferenceStack #GenAI #LoRA #Distillation #MLOps #ModelServing #Predibase
1
5
295
Did you about about some @OpenledgerHQ Use case?? Lets discuss some of them. Do you Want to deploy multiple AI models without melting your GPU budget????? Enter OpenLora by OpenLedger: a scalable framework that lets you serve thousands of fine tuned models ,all on a single GPU! Serious gains for devs & AI startups. #AI #OpenLora #OpenLedger ๐Ÿ’ก Imagine spinning up a whole army of customizable AI assistants each with their own personality, skills, or domain focus without spinning up a whole fleet of servers. Thatโ€™s the LoRA adapters magic! #AIassistants #LoRA ๐Ÿ’ธ Cloud costs bleeding you dry? OpenLora is a cost-effective solution,efficient memory usage means less hardware, smaller bills, and way more scale. Get enterprise level AI serving on a startup budget! #AICosts #CloudComputing Personalization FTW! With OpenLora, users can fine tune their own adapters and deploy models tailored to their unique needs think bespoke chatbots ,custom code assistants, or industry specific NLP tools. #FineTuning #PersonalizedAI ๐Ÿค– Devs, this is for you: OpenLora isnโ€™t just about scale, itโ€™s about flexibility. Deploy, update, and manage specialized models (in chat, code, or any NLP task) without the heavy infra overhead. #AIForDevelopers #ModelServing TL;DR: OpenLora from OpenLedger = scalable, personalized, affordable AI model deployment. Perfect for companies, startups, and devs who want to do more with less. Dive into the details: [openledger.gitbook.io/openleโ€ฆ](openledger.gitbook.io/openleโ€ฆ) ๐ŸŒโœจ
3
4
72
Awesome Production Machine Learning curates production-ready ML tools across the entire ML lifecycle. It covers crucial MLOps components - from model serving to monitoring infrastructure. The repo is particularly strong in documenting tools for deployment pipelines, feature stores, and model monitoring solutions. You'll find battle-tested frameworks for distributed training, model versioning, and security hardening. Think of it as your technical compass for building enterprise ML systems. Most tools listed support RESTful APIs and container-based deployments, perfect for microservices architectures. github: /EthicalML/awesome-production-machine-learning #MLOps #MachineLearning #DevOps #ModelServing #FeatureStore #ModelMonitoring #DataEngineering #ProductionML
1
2
167
Awesome LLMOps is a well-organized knowledge base covering the entire ML lifecycle, from model training to production deployment. The repo acts as a curated index of tools spanning model serving, security, observability, and distributed training frameworks. It focuses on production-grade LLM infrastructure - think vector DBs, model quantization, and serving optimizations. github: /tensorchord/Awesome-LLMOps #LLMOps #MLOps #AI #MachineLearning #DevOps #ProductionML #ModelServing
3
101
This is a curated knowledge base for LLM resources, organizing content across critical ML engineering domains like data processing, fine-tuning, inference, and RAG architectures. Key aspects: - Structured sections for MLOps lifecycle stages - Focus on production-ready tools for data preparation and model deployment - RAG and agent implementation patterns - Evaluation frameworks and metrics - Integration examples with popular LLM APIs It also has a comprehensive coverage of data processing pipelines (via tools like data-juicer) and detailed fine-tuning approaches. The repository emphasizes practical engineering over theory. github: /WangRongsheng/awesome-LLM-resourses #MachineLearning #LLM #MLOps #RAG #DataEngineering #ModelServing #FineTuning
3
193
7 Nov 2024
๐ŸŒŸ Mastering Generative AI: Building and Serving Models with Databricks ๐ŸŒŸ Join @learnwithabi from @databricks to discover how to harness Databricks for developing, deploying, and serving powerful Generative AI models. This session will guide you through streamlining machine learning workflows with Databricks Model Serving, leveraging scalable infrastructure for model training, and using MLflow to manage and deploy models at scale. Perfect for those focused on natural language generation, synthetic data, or AI-driven content, this talk will equip you with practical insights to bring AI solutions to production seamlessly. Register Now: techxconf.com #TechXConf #GenerativeAI #Databricks #ModelServing #MachineLearning #AIProduction #MLflow #TechXConf2024
1
4
89