In this issue, we covered @ahmetb's article on Kubernetes List API performance and reliability. For anyone who has run a large production Kubernetes cluster before, the mysterious apiserver OOM is the most annoying problem. This article explains one of the most common culprits: List calls.
In my experience, the challenge is that many controllers can connect to the API server in a cluster:
- Your CNI plugins that run as a DaemonSet
- Components such as Istio that watch service, node, and pod resources
- Tenants' apps connecting to the API server for their own purposes
When incidents happen, they cause a gigantic chicken-and-egg problem (because controllers may retry), so it's really difficult to understand what caused the OOM. Much of the issue discussed in this article comes from the simple KV data model provided by etcd: simple to write, hard to query.
Day 114/365
Built a centralized logging solution on Kubernetes using Grafana, Loki & Promtail.
- Loki log aggregation
- Promtail DaemonSet
- Grafana integration
- Nginx application logging
- LogQL queries in Grafana Explore
Logs now flow: Pods → Promtail → Loki → Grafana
ayaFlow is an eBPF-based Rust tool that runs as a sidecarless DaemonSet to capture node-wide network traffic, expose metrics, and provide lightweight kernel-level visibility for troubleshooting and observability ➤ ku.bz/m08ygstP6
Kubernetes gives you several ways to run a Pod, and on the surface Deployments, StatefulSets, and DaemonSets all seem to do the same thing. These all run containers, but they exist for different reasons, and which type to choose isn't always clear. The article below discusses all three. Deployments handle stateless apps like web servers and APIs where any Pod can serve any request. StatefulSets give databases stable identities and dedicated storage. DaemonSets run one Pod per node for agents. The framing of whether your app needs to remember who it is, whether it must run on every node, or whether it's just a typical service can help to understand the differences. Check this out from Halil Furkan. lckhd.eu/Lagj1J #Kubernetes #StatefulSet #DaemonSet #Deployment
Kubernetes NodeLocal DNSCache Explained 🚀
When it comes to performance, every DNS lookup in Kubernetes matters.
Without NodeLocal DNSCache, Pods send DNS queries to the kube-dns/CoreDNS Service IP. These requests go through kube-proxy, DNAT rules, and conntrack before reaching CoreDNS. In busy clusters, this can add latency and increase pressure on the conntrack table.
NodeLocal DNSCache solves this by running a local DNS cache on every node as a DaemonSet. Instead of talking to CoreDNS directly, Pods send DNS queries to the local cache on the same node.
Here are its key benefits:
- It reduces average DNS lookup time, as DNS queries are resolved locally from the DNS cache
- It reduces load on CoreDNS
- It helps prevent conntrack table exhaustion, as connections from Pods to their local cache don't create conntrack table entries
- DNS queries for external URLs can be forwarded directly without involving CoreDNS
We share deep dives on Kubernetes, DevOps, MLOps, Cloud and GitOps → 𝗝𝗼𝗶𝗻 𝗛𝗲𝗿𝗲 (𝟭𝟬𝟬% 𝗳𝗿𝗲𝗲): newsletter.devopscube.com/su…
♻️ PS: Repost and share it with the DevOps community.
Note: NodeLocal DNSCache is not enabled by default in Kubernetes. It must be deployed manually as a DaemonSet (except on managed platforms like GKE Autopilot where it's now default).
Got any tips? ⬇️ Share and discuss in the comments below! ⬇️
#devops #kubernetes
Day 114/365
Built the foundation of a Kubernetes Observability Stack using Kustomize.
- Monitoring Namespace
- Prometheus Deployment
- Grafana Deployment
- Kustomize Base & Overlays
- Node Exporter DaemonSet Setup
Next: Loki, Promtail, OTEL & Tempo
#DevOps #Kubernetes
DevOps Tool of the Week: Dragonfly 🛠️
Pulling a 130 GB model to 200 GPU nodes generates 26 TB of traffic. But what if nodes pulled from each other instead of the source?
That is where Dragonfly helps. It is an open-source peer-to-peer file distribution system. The initial downloading node becomes a local cache. The model hub is hit once, and the rest of the cluster pulls from each other.
Here is what it does 👇
- Supports direct model downloads from Hugging Face and ModelScope.
- Splits files and starts sharing peer-to-peer before the first download finishes.
- Reduces origin bandwidth from 26 TB to ~130 GB across 200 nodes.
- Supports private repos with token authentication.
- Deploys on Kubernetes as a DaemonSet.
👉 𝗚𝗶𝘁𝗵𝘂𝗯 𝗥𝗲𝗽𝗼: github.com/dragonflyoss/drag…
#devopstools #devops
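The 26 TB and ~130 GB figures follow from simple arithmetic. Here is a minimal sketch of it, assuming decimal units (1 TB = 1000 GB) and an idealized peer-to-peer case where only the first node downloads from the origin. The function name `origin_traffic_gb` is made up for illustration and is not part of Dragonfly.

```python
def origin_traffic_gb(model_size_gb: float, node_count: int, peer_to_peer: bool) -> float:
    """Return the traffic served by the origin (model hub), in GB."""
    if peer_to_peer:
        # Idealized assumption: the origin is hit once, peers serve the rest.
        return model_size_gb
    # Without peer-to-peer, every node downloads the full model from the origin.
    return model_size_gb * node_count


direct_gb = origin_traffic_gb(130, 200, peer_to_peer=False)
p2p_gb = origin_traffic_gb(130, 200, peer_to_peer=True)

print(f"direct pulls: {direct_gb / 1000:.0f} TB from origin")
print(f"peer-to-peer: {p2p_gb:.0f} GB from origin")
```

Real clusters will land somewhere above the idealized number, since retries and partial downloads still touch the origin.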
This blog post tells how the Render team:
- tracked down Kubernetes memory waste caused by many daemonset namespace watches,
- fixed config issues,
- and freed over 7 TiB of memory across clusters by reducing unnecessary listwatch overhead
➜ ku.bz/2vS0QsvjY
[LIVE Session] Advanced Karpenter Optimization Techniques
Karpenter looks simple from the outside, but operating it well takes real understanding. From overprovisioning to pending Pods to sudden rescheduling, it is rarely straightforward.
Join us for a live session as we explore how to use Karpenter beyond basic autoscaling to reduce Kubernetes costs and optimize cluster utilization.
You will learn:
- Key resource efficiency techniques with Karpenter.
- Practical hacks for disruption budgets, DaemonSet overhead, Graviton adoption, and more.
- How to cut hidden infrastructure costs without impacting performance.
- Multi-layer automation with Karpenter.
Only 50 seats left. Join For Free Here: zesty.co/lp/cost-optimizatio…
🔁 Consider a Repost if this is helpful
Pumba lets you kill, pause, and stress containers while injecting network delays, packet loss, and corruption. You can deploy it as a DaemonSet for cluster-wide chaos engineering ➜ ku.bz/K7_RB9tSq
This tutorial shows how to run Cloudflare Tunnels as a DaemonSet to expose services with zero open inbound ports, using liveness probes, Kubernetes Secrets, and GitOps with ArgoCD ➜ ku.bz/RYlKnctWf
This tutorial shows how to migrate Amazon EKS VPC CNI from a self-managed DaemonSet to an AWS managed add-on by preserving custom env settings, moving permissions to IRSA, and avoiding downtime during adoption ➜ ku.bz/HLl9fhxc7
Running GPU workloads on K8s? Scaling on standard CPU/memory metrics leaves your actual accelerators invisible. See how to build a KEDA external scaler via a DaemonSet to query NVML over gRPC directly, bypassing the Prometheus pipeline for sub-second scaling. 🛠️ bit.ly/4nTPVEv
Just finished a Kubernetes lab where I migrated an app from old ReplicationControllers and single Pods to a Deployment and a DaemonSet.
Key takeaways:
1. Deployment = use for web apps that need scaling and safe updates
2. DaemonSet = perfect when each node needs a copy of a service
☸️ Kubernetes in Plain English
📦 Pod → Runs your application containers
🚀 Deployment → Keeps apps running → Handles rolling updates safely
💾 StatefulSet → For databases & apps needing persistent storage
🛰️ DaemonSet → Runs one Pod on every node → Perfect for logging & monitoring agents
🌐 Service → Gives Pods a stable network endpoint
🔀 Ingress → Manages external HTTP/HTTPS access
📝 ConfigMap → Stores non-sensitive configuration
🔐 Secret → Stores passwords, tokens & certificates
📂 Namespace → Separates resources between teams/environments
🖥️ Node → The machine where Pods run
🧠 Control Plane → The brain of the cluster
🛡️ RBAC → Controls who can access what
Master these basics once… and Kubernetes becomes MUCH easier ⚡
This article explains how to monitor GPU inference nodes on Kubernetes with eBPF and bpftrace by tracing NVIDIA driver calls, kernel behavior, and DaemonSet-based deployment patterns ➤ ku.bz/dH51_VM47
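Setting imagePullPolicy: Always on a 12 GB model image makes the kubelet check the registry on every Pod restart, and re-pull the image whenever the tag resolves to a digest that isn't cached on that node. That's egress charges on every node, every time it happens.
Fix:
- Use IfNotPresent
- Pin by digest, not tag
- Cache with a registry mirror or DaemonSet puller
One config change. Real money saved.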
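The first two fixes are easy to check mechanically. Below is a rough sketch of such a check, assuming you already have each container's image reference and pull policy as strings. The helper names `is_digest_pinned` and `review_container`, and the `registry.example.com` image, are made up for illustration; this is not a Kubernetes API.

```python
import re

# A digest-pinned reference ends in "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """Return True if the image reference is pinned by sha256 digest."""
    return DIGEST_RE.search(image) is not None


def review_container(image: str, pull_policy: str) -> list[str]:
    """Return human-readable findings for one container spec."""
    findings = []
    if pull_policy == "Always":
        findings.append("imagePullPolicy is Always; consider IfNotPresent")
    if not is_digest_pinned(image):
        findings.append("image is referenced by tag, not digest")
    return findings


# Hypothetical image references, for illustration only.
tagged = "registry.example.com/models/llm:v1"
pinned = "registry.example.com/models/llm@sha256:" + "a" * 64

for finding in review_container(tagged, "Always"):
    print(finding)
```

Pinning by digest and using IfNotPresent go together: with a mutable tag, IfNotPresent can leave nodes running whatever version they happened to cache.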