DevOps, MLOps, and Cloud tips & Deep Dives

Joined November 2014
Pinned Tweet
40% OFF Linux Foundation Tech Talent sale 🚀
Use code TECHTALENT26CT at kube.promo/cyber to get a flat 40% discount on individual certifications like CKA, CKAD, etc.
Using code TECHTALENT26CT, you can save up to 45% on the following k8s certification bundles.
- CKA CKAD: kube.promo/cka-ckad
- CKA CKS Bundle: kube.promo/bundle
- CKA CKAD CKS Exam bundle: kube.promo/k8s-bundle
- KCNA CKA: kube.promo/kcka-bundle
- KCSA CKS Exam Bundle: kube.promo/kcsa-cks
- KCNA KCSA Exam Bundle: kube.promo/kcna-kcsa
- KCNA KCSA CKA CKAD CKS: kube.promo/kubestronaut
♻️ P.S. Don't forget to repost and share it with the DevOps community and your friends!
#kubernetes #devops #cka #ckad #cks
DevOps Tool of the Week: AirLLM 🛠️
You don't need a powerful GPU to run large models locally.
AirLLM is an open-source tool that lets you run large models on a single GPU without needing the whole model in VRAM at once. It splits the model into layers and loads one layer at a time during inference.
Here is what it does 👇
- Initially, AirLLM pulls the full model, splits it into per-layer shards, and saves them to local disk.
- When you send a query, it loads the model into VRAM one layer at a time.
- It loads Layer 1 from disk into GPU VRAM, processes it, then clears VRAM and loads the next layer.
- It repeats this for every layer, and once all layers are processed, it returns the response to your query.
With this, even a 4GB GPU can run a 70B model.
Start Here: github.com/lyogavin/airllm
#LLM #MLOps #devopstools
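The load, process, free loop described above can be pictured with a small sketch. This is not AirLLM's API. The names `make_shard_loader` and `run_layer_by_layer` are hypothetical, and each "layer" is just a multiplication standing in for a real shard read from disk. The point is only that one layer is resident at any moment.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_shard_loader(scale: float) -> Callable[[], Layer]:
    """Hypothetical stand-in for reading one per-layer shard from disk."""
    def load() -> Layer:
        # A real layer would be a set of weight tensors; here it only scales the input.
        return lambda xs: [x * scale for x in xs]
    return load


def run_layer_by_layer(
    loaders: Sequence[Callable[[], Layer]], inputs: Vector
) -> Tuple[Vector, int]:
    """Run every layer in order while keeping at most one layer in memory."""
    activations = list(inputs)
    resident = 0        # layers currently held in (simulated) VRAM
    peak_resident = 0   # highest number of layers held at the same time
    for load in loaders:
        layer = load()                      # load this layer's shard
        resident += 1
        peak_resident = max(peak_resident, resident)
        activations = layer(activations)    # process the current activations
        del layer                           # free the layer before loading the next one
        resident -= 1
    return activations, peak_resident
```

The trade-off in this pattern is speed for memory: every query pays the cost of re-reading each layer from disk.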
VPN and mTLS are essential, must-know concepts for DevOps engineers.
This guide aims to teach you the following through hands-on practice.
- Client-to-Site VPN
- Client and server certificate generation using EasyRSA
- Mutual TLS Authentication (mTLS)
So, why is it important to learn these? Every organization uses VPN connectivity to securely connect to cloud resources. The best way to understand VPN and mTLS is by setting one up.
Detailed Guide: devopscube.com/aws-client-vp…
The following topics are covered in a theoretical format:
- SAML/Active Directory-based authentication
- TCP vs UDP
Do further research and gain a detailed understanding of the above topics. They are fundamental for DevOps engineers.
PS: ♻️ Repost if you find this useful. It helps the learning community 🙂
Got any tips or info to share with the community? Drop a comment :)
#devops #aws #practicaldevops
DevopsCube retweeted
This Kubernetes feature changes how containers handle root privileges 👇
User Namespaces is a security feature that maps user and group IDs inside a container to a different set of IDs on the host (rootless isolation).
Here is the key idea. A process running as root (UID 0) inside the container runs as an unprivileged user (e.g., UID 100000) on the host.
Without user namespaces, a container running as root uses the same root identity as the host. So if an attacker breaks out of the container, they get full host privileges.
I have published a hands-on guide covering the following.
- What User Namespaces are
- How the kernel does the root to non-root user mapping
- What changes when you set hostUsers: false
- Limiting host UID ranges using the /etc/subuid file
- Hands-on deployment to understand User Namespaces
- User Namespace limitations
Read it Here: newsletter.devopscube.com/p/…
Have you tried this feature?
♻️ PS: Repost and share this with the community!
#devops #kubernetes
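To make the mapping concrete, here is a minimal sketch of the lookup. It assumes the mapping is given as (inside start, outside start, length) triples, which is the layout the kernel exposes in /proc/<pid>/uid_map. The function name `host_uid` and the range 100000 to 165535 are illustrative assumptions, not values taken from the guide.

```python
from typing import Iterable, Optional, Tuple

# One entry per mapped range: (first UID inside the namespace,
# first UID on the host, number of UIDs in the range).
UidMap = Iterable[Tuple[int, int, int]]


def host_uid(container_uid: int, uid_map: UidMap) -> Optional[int]:
    """Translate a UID seen inside the container to the UID the host sees."""
    for inside_start, outside_start, length in uid_map:
        if inside_start <= container_uid < inside_start + length:
            return outside_start + (container_uid - inside_start)
    return None  # the UID is not mapped in this namespace


# Assumed example range: container UIDs 0-65535 map to host UIDs 100000-165535.
with_userns = [(0, 100000, 65536)]
# Without a user namespace the mapping is the identity, so root stays root.
without_userns = [(0, 0, 4294967295)]
```

With the first map, container root (UID 0) resolves to host UID 100000. With the identity map it resolves to host UID 0, which is the breakout risk described above.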
Kubernetes NodeLocal DNSCache Explained 🚀
When it comes to performance, every DNS lookup in Kubernetes matters.
Without NodeLocal DNSCache, Pods send DNS queries to the kube-dns/CoreDNS Service IP. These requests go through kube-proxy, DNAT rules, and conntrack before reaching CoreDNS. In busy clusters, this can add latency and increase pressure on the conntrack table.
NodeLocal DNSCache solves this by running a local DNS cache on every node as a DaemonSet. So, instead of talking to CoreDNS directly, Pods send DNS queries to the local cache on the same node.
Here are its key benefits:
- It reduces average DNS lookup time, as DNS queries are resolved locally from the cache.
- It reduces load on CoreDNS.
- It helps avoid conntrack table exhaustion, as connections from Pods to their local cache don't create conntrack table entries.
- DNS queries for external domains can be forwarded directly without involving CoreDNS.
We share deep dives on Kubernetes, DevOps, MLOps, Cloud and GitOps → Join Here (100% free): newsletter.devopscube.com/su…
♻️ PS: Repost and share it with the DevOps community.
Note: NodeLocal DNSCache is not enabled by default in Kubernetes. It must be deployed manually as a DaemonSet (except on managed platforms like GKE Autopilot, where it's now the default).
Got any tips? ⬇️ Share and discuss in the comments below! ⬇️
#devops #kubernetes
DevOps Tool of the Week: Dragonfly 🛠️
Pulling a 130 GB model to 200 GPU nodes generates 26 TB of traffic. But what if nodes pulled from each other instead of the source?
That is where Dragonfly helps. It is an open-source peer-to-peer file distribution system. The initial downloading node becomes a local cache. The model hub is hit once, and the rest of the cluster pulls from each other.
Here is what it does 👇
- Supports direct model downloads from Hugging Face and ModelScope.
- Splits files and starts sharing peer-to-peer before the first download finishes.
- Reduces origin bandwidth from 26 TB to ~130 GB across 200 nodes.
- Supports private repos with token authentication.
- Deploys on Kubernetes as a DaemonSet.
👉 Github Repo: github.com/dragonflyoss/drag…
#devopstools #devops
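The 26 TB and ~130 GB figures are simple arithmetic, sketched below. The function `origin_traffic_gb` is a hypothetical helper, and the peer-to-peer case is idealized: it assumes the origin is hit exactly once and ignores retries and protocol overhead.

```python
def origin_traffic_gb(model_gb: int, nodes: int, peer_to_peer: bool) -> int:
    """Estimate the data pulled from the origin (model hub), in GB."""
    if peer_to_peer:
        # Idealized case: one node downloads from the origin, the rest pull from peers.
        return model_gb
    # Every node downloads its own full copy from the origin.
    return model_gb * nodes


direct = origin_traffic_gb(130, 200, peer_to_peer=False)   # 26000 GB, i.e. 26 TB
with_p2p = origin_traffic_gb(130, 200, peer_to_peer=True)  # 130 GB
```

Total traffic inside the cluster is still about 26 TB in both cases. What changes is where it comes from: peers on the local network instead of the model hub.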
Kubernetes Secret: data vs. stringData 🚀
When you create a k8s Secret, you can specify secret data in two different fields:
- data
- stringData
What is the difference?
If you specify the data field, the secret values must be base64-encoded (pre-encoded). However, if you want to use plain text in the manifest, you can use the stringData field.
When you create a Secret using stringData, the values are automatically base64-encoded and stored in the data field.
So the purpose of stringData is simply developer convenience during Secret creation and updates. The actual Secret object always stores values in the data field.
Additionally, you can use both stringData and data in the same manifest. For duplicate keys, stringData takes precedence over data.
For example, if you specify username in both data and stringData, the value from stringData is used, and the value in data is ignored. However, if you have distinct keys, all of them will be available in the created Secret.
If you have any insights or tips to share, do share them in the comments below!
We share deep dives on Kubernetes, DevOps, MLOps, Cloud and GitOps → Join Here (100% free): newsletter.devopscube.com/su…
♻️ PS: Repost and share it with the DevOps community.
---
⚠️ Important note: Base64 is encoding, not encryption. Anyone with access to the Secret manifest or etcd can decode it in seconds. It offers zero security.
For proper secret management, use dedicated tools like:
- HashiCorp Vault
- AWS Secrets Manager / Azure Key Vault
- Sealed Secrets
Also, enable encryption at rest for etcd and use RBAC to restrict Secret access.
⬇️ Discuss in the comments below! ⬇️
#devops #kubernetes
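The merge rule can be modelled in a few lines. This is a sketch of the documented behavior, not the API server's actual code, and `merge_secret_fields` is a hypothetical name. It also shows why base64 is not protection: decoding is a single call.

```python
import base64
from typing import Dict


def b64(text: str) -> str:
    """Base64-encode a plain-text string the way Secret values are stored."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def merge_secret_fields(data: Dict[str, str], string_data: Dict[str, str]) -> Dict[str, str]:
    """Model how stringData is folded into data when a Secret is written."""
    merged = dict(data)              # data values are already base64-encoded
    for key, value in string_data.items():
        merged[key] = b64(value)     # stringData is encoded and wins on duplicate keys
    return merged


stored = merge_secret_fields(
    data={"username": b64("admin"), "token": b64("s3cr3t")},
    string_data={"username": "devops"},
)
# Anyone who can read the stored value can reverse it.
decoded_username = base64.b64decode(stored["username"]).decode("utf-8")
```

In this example `username` ends up as the encoded form of `devops`, while `token` survives untouched because it only appears in `data`.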
Kubernetes Interview Question 🚀
How to fix conntrack exhaustion in Kubernetes?
Conntrack is one of the most critical and overlooked parts of Kubernetes networking. The conntrack table on each node has a maximum size.
Kubernetes generates huge amounts of NAT traffic because of ClusterIP Services, kube-proxy iptables mode, readiness/liveness probes, service mesh traffic, etc.
So, in a busy cluster with hundreds of pods making thousands of connections, that table fills up fast.
Many Linux systems default to values around 131072 entries, though the actual value depends on the kernel and system memory.
If conntrack is full, the following happens.
- Random connection timeouts
- Intermittent DNS failures
- API calls that fail with no clear error
- "Works sometimes, fails sometimes" behavior
- Services that appear healthy but whose connections randomly drop
There are several ways to mitigate and prevent the issue. We have explained all the solutions in our detailed conntrack guide.
Read it Here: newsletter.devopscube.com/p/…
♻️ If you find it useful, share it with fellow DevOps and Cloud Engineers.
#devops #kubernetes
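A quick way to see how close a node is to the limit is to compare the current entry count with the maximum. The sketch below assumes a Linux node with the nf_conntrack module loaded, where both values are exposed under /proc/sys/net/netfilter/. The helper names and the 80% warning threshold are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumed paths: present on Linux nodes with the nf_conntrack module loaded.
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table that is in use."""
    return count / maximum


def read_conntrack() -> Optional[Tuple[int, int]]:
    """Read (count, max) from procfs, or None if the files are unavailable."""
    try:
        return int(COUNT_PATH.read_text()), int(MAX_PATH.read_text())
    except (OSError, ValueError):
        return None


def is_near_exhaustion(count: int, maximum: int, threshold: float = 0.8) -> bool:
    """Flag a node whose table usage is at or above the (assumed) threshold."""
    return conntrack_usage(count, maximum) >= threshold
```

For example, 65536 entries against a limit of 131072 is 50% usage, which is below the assumed threshold. A node sitting at 120000 of 131072 would be flagged.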
DevOpsCube Weekly: Edition #2 is out ☕
Conntrack in Kubernetes, AgentGateway, GitHub OIDC on AWS, Remote Jobs & More.
Read it Here: newsletter.devopscube.com/p/…
♻️ If you find it useful, share it with fellow DevOps and Cloud Engineers.
#devops
DevOps Tool of the Week: Agentgateway 🛠️
As AI agents become part of production systems, managing how they connect to LLMs, tools, APIs, and MCP servers becomes difficult.
Agentgateway solves this by acting as a central gateway for AI agents.
Here is what it does 👇
- Routes traffic between agents, tools, APIs, and LLMs.
- Provides a single endpoint for MCP servers and agent communication.
- Adds authentication, security policies, and rate limiting.
- Gives visibility into agent requests, failures, and usage.
Think of it as an API Gateway for Agentic AI systems.
👉 Github Repo: github.com/agentgateway/agen…
#devops #aiops #mlops
Free Claude Courses 🚀
If you are a DevOps engineer trying to understand how AI fits into infrastructure and automation, these Claude courses are worth checking out.
Anthropic has structured learning paths around:
- AI Fluency and prompt engineering
- Claude API development
- Claude Code for engineering workflows
- MCP (Model Context Protocol)
- AI agent workflows and integrations
👉 Start Here: claude.com/resources/courses
♻️ P.S. Don't forget to repost and share with the DevOps community.
#devops #aiops #PlatformEngineering
How Conntrack Powers Kubernetes Services 🚀
In this guide, you will understand conntrack through real Kubernetes networking scenarios and see why it plays a critical role behind Kubernetes Services, kube-proxy, NAT, and DNS traffic.
You will learn:
- What conntrack is and why it exists
- Why Kubernetes Services depend on it
- How to inspect the conntrack table
- What happens when the table gets full
- How to troubleshoot and fix conntrack exhaustion in production
Detailed Guide: newsletter.devopscube.com/p/…
♻️ P.S. Repost and share with the DevOps community.
#devops #kubernetes
CKA Exam Guide With Study Notes 🚀
As requested by the community, we have added a study notes section with illustrations.
It covers key areas like Gateway API, Network Policies, cluster administration, etc.
GitHub Repo: github.com/techiescamp/cka-c…
Bookmark the repo and start your CKA prep this week.
Found it useful? Star the repo ⭐ and share it with someone prepping for CKA.
Got questions on any CKA topic? Drop them in the comments 👇
#cka #kubernetes
What is NOTES.txt in a Helm Chart?
In this guide, you will learn:
- What Helm NOTES.txt is
- How it works and its use cases
- How to view notes before and after chart installation
- A simple hands-on walkthrough of the NOTES.txt functionality
- How to suppress the notes output
Detailed Blog: devopscube.com/helm-notes-tx…
♻️ P.S. Repost and share with the DevOps community.
GitHub Actions to AWS Without Long-Lived Secrets? 🚀
Modern DevOps pipelines should not depend on hardcoded AWS access keys, because leaked CI/CD credentials are still one of the biggest cloud security risks.
That is where OIDC helps. Instead of storing AWS secrets in GitHub, your workflow gets short-lived, temporary credentials directly from AWS at runtime.
- No static keys.
- No secret rotation headaches.
- Smaller blast radius if something goes wrong.
In this blog, we will look at GitHub Actions OIDC integration with AWS using a step-by-step example that secures access to the AWS cloud.
By the end of this guide, you will understand:
- Why OIDC is a secure way to connect GitHub Actions with AWS
- How GitHub's OIDC integration works with AWS
- A step-by-step method to set up OIDC using IAM roles
- How to test the setup using AWS CLI and deploy to EKS with GitHub Actions workflows
Detailed Blog: devopscube.com/github-action…
Note: A fully private GitHub Enterprise Server setup cannot use AWS OIDC unless AWS can access the GHES OIDC metadata endpoint over HTTPS.
#devops #GithubActions #OIDC #aws
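The piece that replaces static keys is the IAM role's trust policy, which tells AWS which GitHub workflows may assume the role. Here is a minimal sketch that builds such a policy as a Python dict. The helper name `github_oidc_trust_policy`, the account ID, and the org/repo values are placeholders, and the wildcard on the `sub` claim is an assumption you would normally tighten to a specific branch or environment.

```python
import json
from typing import Any, Dict

# Issuer hostname used by GitHub-hosted Actions for OIDC tokens.
GITHUB_OIDC_PROVIDER = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, org: str, repo: str) -> Dict[str, Any]:
    """Build an IAM trust policy that lets one repo's workflows assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_PROVIDER}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must be issued for AWS STS ...
                    "StringEquals": {f"{GITHUB_OIDC_PROVIDER}:aud": "sts.amazonaws.com"},
                    # ... and must come from this repository (any branch or tag here).
                    "StringLike": {f"{GITHUB_OIDC_PROVIDER}:sub": f"repo:{org}/{repo}:*"},
                },
            }
        ],
    }


# Placeholder values for illustration only.
policy = github_oidc_trust_policy("123456789012", "my-org", "my-repo")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON can be attached as the role's trust relationship. The workflow then exchanges its GitHub-issued token for temporary credentials, so no access key is ever stored in the repository.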
What is the Difference Between Self-Managed and Public SSL/TLS Certificates? 🚀
DevOps engineers often work with TLS certificates, and understanding the difference between self-managed and public certificates is very important. So let's understand the basics.
What is a Certificate Authority?
A Certificate Authority (CA) is usually a company or organization that issues digital certificates.
Here's how to request a TLS certificate from a well-known CA like Verisign, Let's Encrypt, or Comodo:
- Generate a private key and use it to create a Certificate Signing Request (CSR). The CSR includes details about your location, organization, and FQDN (Fully Qualified Domain Name).
- Send the CSR to the trusted CA.
- The CA validates the request and sends back a TLS certificate signed using the CA's private key.
- Validate and use this TLS certificate with your applications.
Most browsers and operating systems store root CA certificates from all the trusted CAs. You can view them in the browser settings.
That is why browsers don't show security warnings when you visit websites that use TLS certificates from a trusted, well-known CA. Each browser has its own set of criteria and processes for accepting and trusting CAs.
So, who decides if a CA can be trusted?
Well, CAs are vetted by independent audit programs like WebTrust. The results of these audits are important for a CA to be trusted by web browsers and operating systems.
Now let's look at self-managed certificates.
For internal applications, organizations often run their own private CA (internal PKI).
The workflow looks like this:
- Create your own Root CA certificate and CA private key
- Generate a server private key and CSR
- Use the CA private key to sign the CSR and generate the TLS certificate
- Install the Root CA certificate in browsers or operating systems to avoid HTTPS warnings
Without installing the Root CA certificate, browsers will show security warnings because the CA is not publicly trusted.
For public endpoints, organizations use certificates from well-known CAs (Let's Encrypt or paid ones).
We share deep dives on Kubernetes, DevOps, MLOps, Cloud and GitOps → Join Here (100% free): newsletter.devopscube.com/
♻️ PS: Repost and share it with the DevOps community.
Got any tips? ⬇️ Discuss in the comments below! ⬇️
#devops #devopscommunity
☕ This Week in DevOpsCube
Here's the TLDR 👇
• Feature Store explained for DevOps Engineers
• Deploy an ML Feature Store on Kubernetes
• Reducing Kubeflow image size from 3.17 GB to 354 MB
• Real Kubernetes CNI troubleshooting scenario on AWS
• How Uber runs 60,000 AI agent tasks
• Free Claude courses & learning resources
Read it Here: newsletter.devopscube.com/p/…
♻️ PS: If you find these resources useful, share this with your network.
#devops #mlops
You won't see this 65% discount again until Cyber Monday 👇
Use code MM26CCCT at kube.promo/cyber to get a flat 50% discount on individual certifications like CKA, CKAD, etc.
Using code MM26BUNCT, you can save up to 60% on the following k8s certification bundles.
- CKA CKAD: kube.promo/cka-ckad
- CKA CKS Bundle: kube.promo/bundle
- CKA CKAD CKS Exam bundle: kube.promo/k8s-bundle
- KCNA CKA: kube.promo/kcka-bundle
- KCSA CKS Exam Bundle: kube.promo/kcsa-cks
- KCNA KCSA Exam Bundle: kube.promo/kcna-kcsa
You can save up to 65% on the Kubestronaut Bundle compared to individual purchases. Use code MM26K8BUNCT for the following Kubestronaut Bundle.
- KCNA KCSA CKA CKAD CKS: kube.promo/kubestronaut
Kubestronaut benefits include the following.
- Five 50%-off certification coupons each year to use or share
- 20% off select CNCF events
- Exclusive Kubestronaut community access
- The iconic blue Kubestronaut jacket
Grab these offers before they end.
♻️ P.S. Don't forget to repost and share it with the DevOps community and your friends!
#kubernetes #devops #cka #ckad #cks
DevOps Tool of the Week: Kafbat UI ⚒️
Running Kafka in production is great until you need to actually look inside it.
Kafbat UI is a free, open-source web UI to monitor and manage Apache Kafka clusters. It gives you a single pane of glass for your Kafka clusters: brokers, topics, partitions, consumer groups, schema registry, and Kafka Connect, all in one dashboard.
You can browse messages in JSON, Avro, or Protobuf, filter live streams with CEL expressions, check consumer lag per partition, and create or reconfigure topics without touching a CLI.
GitHub Repo: github.com/kafbat/kafka-ui
#devops #devopstool