Joined May 2009
David Espejo retweeted
The Flyte 2.0 SDK is officially here! This release brings some really exciting local development features that you can use right out of the box, even without connecting to Kubernetes or Docker.
The features I've been loving in my local AI workflows:
- Terminal User Interface (live workflows and history)
- HTML Reports (tracked to the workflow version)
- Caching (skip rerunning a task by reading from the cache)
- Retries (automatically retry a task if it fails; super useful for API calls)
- And more
All this together gives you:
- great lightweight experiment tracking
- a lift in reliability
- a great interface for debugging large pipelines or AI agent runs
I can't wait to show you what's next in both Flyte and @union_ai
It's that time of the year again when I pack my bags and tackle the 21-hour journey to meet my awesome colleagues at @union_ai's HQ. As usual, I try to use the time to work through my reading backlog while the people around me watch Netflix 😅
🚀📕 The GPU Kubernetes book is finally here. After six months of rabbit holes, I finally understood why this problem was so hard.
When I started, I thought GPUs were just fancy parallel processors. Mount the device, set some resource limits, and done.
Then I learned that GPUs can't even pause a running kernel. Once computation starts, it runs to completion: no preemption, no time-slicing in the CPU sense, nothing. The hardware was designed this way for maximum throughput, and no amount of software can change it.
This fundamental difference breaks every assumption Kubernetes makes about resources. The Linux kernel sees and controls every CPU cycle and memory page. But GPU operations? They happen in a black box managed by the NVIDIA driver. The kernel is completely blind.
So I wrote this book. Six chapters that trace the problem from hardware to orchestration:
1️⃣ Why containers work beautifully for CPUs (syscalls, cgroups, namespaces) and why GPUs break every one of these assumptions. You'll understand exactly how device plugins trick Kubernetes into accepting GPUs it can't actually manage.
2️⃣ How traditional Kubernetes isolation completely fails for GPUs. When two pods share a GPU, there's no cgroup enforcement, no memory isolation, nothing. One pod can crash everything.
3️⃣ The truth about "GPU sharing" tools. KAI-Scheduler and NVIDIA's "time-slicing" don't share anything; they just orchestrate turn-taking. Your pods still wait in line for exclusive GPU access.
4️⃣ MIG vs HAMi vs vGPU: when you actually need hardware partitioning (spoiler: probably never), and why seven T4s might serve you better than one H100 with MIG.
5️⃣ Why nvidia-smi lies to you, Kubernetes metrics lie differently, and DCGM reveals that 60-70% of your GPU budget is wasted on idle resources.
6️⃣ How to share GPU clusters across teams without namespace chaos. Virtual clusters give each team its own control plane while efficiently sharing the underlying hardware.
Download the free book here: ku.bz/gpu-k8s
Huge thanks to Saiyam, who co-authored this and brought real production experience I lacked. To the vCluster team (Rahul Patwardhan!!!), who believed in this project and sponsored my research time. And to Gulcan, who edited countless drafts into something readable.
💡 If you want to go deeper, join me for a live discussion this Wednesday, where I will answer your GPU questions and explain how the book came to be: ku.bz/g8gXCKW12
David Espejo retweeted
21 Aug 2025
Amazon S3 Vectors + Flyte 2.0 = simpler, cheaper AI pipelines. Flyte 2.0 users can now use S3 Vectors. That means no separate vector database, up to 90% lower related costs, and fewer moving parts to manage. Learn how it works: union.ai/blog-post/amazon-s3… #S3 #orchestration #agents
David Espejo retweeted
5 Aug 2025
Flyte 2 isn't just an AI orchestrator. It's a durable, dynamic agent runtime. union.ai/flyte/2-0-announcem…
David Espejo retweeted
30 Jun 2025
Pandera passed 100 million downloads this weekend! (not the sandwich shop, the open source data validation project) Huge thanks to our incredible OSS community at Union.ai. Here’s to 100 million more! union.ai/pandera
David Espejo retweeted
A free 698-page PDF ebook with everything you need to know about math:
David Espejo retweeted
20 Mar 2025
I'm at GTC and found like 10 companies built on top of Flyte. Crazy how fast it's spreading.
David Espejo retweeted
18 Mar 2025
🚨We’re at GTC (booth 2022) with a huge announcement—Union now serves AI models and apps! Learn common AI serving mistakes in our whitepaper: hubs.la/Q03ckdcp0 Union serves 2x faster than SageMaker, in any cloud. #GTC2025 #MLOps #Serving #Inference #CompoundAI
David Espejo retweeted
13 Mar 2025
🚀 Exciting news! Union is heading to #gtc2025 next week! 🎉 Join us at Booth 2022 to see how we’re revolutionizing AI workflows and model serving. Want to see it in action? Want to learn how Union can unify your AI development? 🔗 Book a chat: cal.com/team/unionsolutions/…
David Espejo retweeted
9 Jan 2025
How is AI orchestration powering the future of #bioinformatics? Union’s platform empowers researchers to tackle computationally intensive problems like protein folding with ease, scalability, and precision. Learn more in our latest blog. #ML hubs.la/Q031WPpR0
David Espejo retweeted
Just tried @genmoai's Mochi model for video generation on @union_ai serverless with an A100 GPU 🎥✨ The video quality is impressive! Check it out: dub.sh/jhR7T6D
The DAG models everything from ETL pipelines to neural networks so well that it has become one of the few widely adopted abstractions in the ML world.
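To make the DAG idea concrete, here is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+). The pipeline and its step names are hypothetical; the point is that declaring dependencies alone is enough to derive an execution order, group independent steps into stages that could run in parallel, and reject cycles.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical ETL pipeline: each step maps to the set of steps it depends on.
pipeline: Dict[str, Set[str]] = {
    "extract_users": set(),
    "extract_events": set(),
    "join": {"extract_users", "extract_events"},
    "train": {"join"},
    "report": {"join", "train"},
}


def parallel_stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into stages whose members have no unmet dependencies."""
    ts = TopologicalSorter(graph)
    ts.prepare()  # raises CycleError if the graph is not acyclic
    stages: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for deterministic output
        stages.append(ready)
        ts.done(*ready)  # mark the whole stage finished, unlocking dependents
    return stages


def is_dag(graph: Dict[str, Set[str]]) -> bool:
    """Return True if the dependency graph contains no cycles."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


print(parallel_stages(pipeline))
# → [['extract_events', 'extract_users'], ['join'], ['train'], ['report']]
print(is_dag(pipeline))                  # → True
print(is_dag({"a": {"b"}, "b": {"a"}}))  # → False
```

Grouping by stage is one design choice among several; an orchestrator could instead dispatch each node the moment its own dependencies finish rather than waiting for a whole stage to complete.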
David Espejo retweeted
Retries: often touted as a simple fix for reliability in the presence of failure. However, retries are incredibly difficult to get right. This paper is a fantastic discussion of the complex world of retries.
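One common way to make retries less dangerous is capped exponential backoff with random "full jitter", retrying only errors assumed to be transient. The sketch below is a generic illustration, not taken from the paper the tweet mentions; the function name `retry_with_backoff`, the default delays, and the choice of `ConnectionError`/`TimeoutError` as retryable are assumptions. `sleep` and `rng` are injectable so the behavior can be checked without real waiting.

```python
import random
import time
from typing import Callable, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn, retrying transient errors with capped exponential backoff + full jitter."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: let the caller see the failure
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)]
            # so many clients retrying at once don't hit the service in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise ValueError("max_attempts must be >= 1")


# Usage: a call that times out three times, then succeeds.
delays: List[float] = []
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 4:
        raise TimeoutError("upstream timed out")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append, rng=random.Random(0))
print(result, attempts["n"], len(delays))  # → ok 4 3
```

Retrying is only safe for idempotent operations; wrapping a non-idempotent call (for example, charging a payment) in a helper like this could duplicate side effects. Exceptions outside `retry_on`, such as `ValueError`, propagate on the first attempt.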
David Espejo retweeted
13 Sep 2024
Replying to @yujian_tang
Also in Seattle next week: @JohnGilhuly will be at @union_ai for an in-person event dedicated to agents on Wednesday. Join us for two talks plus lots of time for networking. Here's the schedule:
✨ 5:30pm - 6:00pm: Networking and food
✨ 6:00pm - 6:30pm: Evaluating AI Agents and Assistants (that's us)
✨ 6:30pm - 7:00pm: Building Agentic RAG (Union AI)
✨ 7:00pm - 8:00pm: More social time
100% free and 100% useful! Open to all skill levels. Register here: eventbrite.com/e/ai-talks-at…
David Espejo retweeted
1/5 In the lead-up to #Raysummit, we have a guest blog on Building a RAG Batch Inference Pipeline with @anyscalecompute and @union_ai 🚀
David Espejo retweeted
This is the challenge that all of us in #DevRel face every day!
Your daily reminder:
Am I going too far? 😅 Anyway, all I expect from this book is to better understand GPU architecture, not to become proficient in CUDA.
Getting started with notebooks is easy, but state management is a bit hectic and doesn't meet the requirements of a production system. Using @flyteorg imposes ergonomic changes, like type annotations and the overall "pipeline" mindset, but it's reliable and reproducible out of the box.