Joined December 2008
Slagar retweeted
wondering why I feel exhausted. maybe: the agents do all the easy stuff, and I have to work through the leftover hard bits, which means I'm perpetually locked in. and as the models get better, "my" work just gets harder and harder, until I'm basically underqualified to do the work (which... is better than the alternative, there's nothing left for me to do, and I'm paperclipped).
The answer to “should this be an incident” is always yes. It’s just the scary way of asking, “would you like the support of brilliant people with way more context to have this resolved sooner?”
Slagar retweeted
PostgreSQL 19 is dropping a massive quality-of-life feature that is going to kill off a lot of messy backend hacks: WAIT FOR LSN. If you run asynchronous read replicas, this fixes your biggest consistency headache. A quick breakdown on how it works: 👇
Slagar retweeted
A little over a decade ago, I visited Facebook's Prineville datacenter. Facebook had brought together a small group of systems and networking researchers to discuss the future of datacenter infrastructure. What struck me then was how much of the physical design was organized around energy efficiency.

The metric everyone cared about was PUE, or Power Usage Effectiveness: total facility power divided by IT equipment power. Older industry averages could be around 2.0 or higher, meaning that a large amount of power was spent on cooling, power distribution, and other facility overhead. Prineville represented a very different design point, with a PUE around 1.07 annualized at full load.

The architecture made that plausible. It was fundamentally air-cooled, but carefully engineered: large slow fans to pull in outside air, filtration systems designed to handle air particulate (even from forest fires tens of miles away), evaporative "swamp" coolers on the second level, hot aisle containment, and large ductless supply paths where cool air naturally sank from the upper mechanical level into the data hall. That was the design point: move outside air efficiently, while minimizing the amount of active mechanical work required to cool the data hall.

AI infrastructure changes the thermal design basis. For practical purposes, every watt of electrical power delivered to IT equipment must be removed as heat. Facebook later described its overall datacenter design point as about 5.5 kW per rack, with compute-heavy web-server racks around 10 to 12 kW. And if we use a much denser 20 kW CPU-era rack as the comparison point, that rack generates about 68,000 BTU/hour of heat.

The airflow math around heat transfer is straightforward. If we allow a 20°F temperature rise across the rack, the 20 kW CPU-era rack needs roughly 3,000 CFM of airflow. For comparison, a typical central AC system in a 2,500 square foot home might move roughly 1,500 to 2,000 CFM of air. Even a 20 kW rack is effectively moving the airflow of one or two homes through a single cabinet.

A current rack of NVIDIA Blackwell GPUs, such as the GB300 NVL72, is roughly 140 kW. That is about 478,000 BTU/hour from one rack. If air had to remove the full heat load, it would require more than 22,000 CFM, akin to pushing the airflow of a dozen homes through a single rack.

And that is today. NVIDIA's future Rubin Ultra Kyber rack has been reported around 600 kW. If air had to remove that full heat load, the required airflow would be close to 95,000 CFM, roughly the airflow of 50 homes' central AC systems.

More importantly, the challenge is not simply moving enough air through the rack. Modern AI accelerators concentrate enormous amounts of heat into a small physical area, so air alone becomes increasingly impractical as the primary path for removing heat from the chip package. In other words, the engineering of heat transfer at this scale fundamentally changes. Once the rack moves from tens of kilowatts to 100 kilowatts, the cooling system changes from "move enough air through the room" to "capture heat at the components and transport it through a liquid cooling loop."

It also shortens the operational time window. A loss of coolant flow is no longer simply a maintenance issue. It can immediately affect compute availability. Operators now need visibility into coolant flow, pressure, inlet and outlet temperatures, pump state, cooling-system health, leak detection, rack thermal behavior, workload placement, and history.

There is a joke that AI factories convert energy into tokens. But like advanced manufacturing facilities, they depend on complex physical infrastructure, continuous monitoring, and operational control systems. And increasingly, the data systems needed to understand and optimize them.

A decade ago, most of us thought of the datacenter as the substrate underneath the distributed system. Increasingly, the datacenter itself is becoming part of the distributed system. More to write.
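
The rack numbers in the thread above can be reproduced with a few lines of arithmetic. This is a rough sketch, not the author's own calculation: it assumes the usual conversion of about 3,412 BTU/hour per kW and the common sensible-heat rule of thumb of 1.08 BTU/hour per CFM per °F for air near sea-level density. The function names (`heatBtuPerHour`, `requiredCfm`, `pue`) are made up for illustration.

```javascript
// Assumed constants (not stated in the thread):
// 1 kW is about 3,412 BTU/hour, and the standard-air rule of thumb is
// BTU/hour = 1.08 * CFM * deltaT(°F).
const BTU_PER_HOUR_PER_KW = 3412.14;
const AIR_FACTOR = 1.08;

// Heat a rack must shed, treating all IT power as heat.
function heatBtuPerHour(rackKw) {
  return rackKw * BTU_PER_HOUR_PER_KW;
}

// Airflow needed if air alone carried the whole heat load.
function requiredCfm(rackKw, deltaTF) {
  return heatBtuPerHour(rackKw) / (AIR_FACTOR * deltaTF);
}

// PUE as defined in the thread: total facility power / IT equipment power.
function pue(totalFacilityKw, itKw) {
  return totalFacilityKw / itKw;
}

// Rack sizes quoted in the thread, with its 20°F temperature rise.
const racks = [
  ["20 kW CPU-era rack", 20],
  ["GB300 NVL72 (~140 kW)", 140],
  ["Rubin Ultra Kyber (~600 kW, reported)", 600],
];
for (const [label, kw] of racks) {
  const btu = Math.round(heatBtuPerHour(kw));
  const cfm = Math.round(requiredCfm(kw, 20));
  console.log(`${label}: ${btu} BTU/hour, ${cfm} CFM`);
}
```

With these assumed constants the 20 kW case lands slightly above 3,000 CFM, which is consistent with the thread's "roughly"; the 1.08 factor shifts with altitude and air temperature, so treat the outputs as order-of-magnitude figures.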
Slagar retweeted
are you building agents on @CloudflareDev or using the agents sdk? are you following best practices for building on durable objects/containers? if you want someone to take a look at your code, let's chat this week, dm's open 🤙
Slagar retweeted
The answer is that MacOS blocks this program because it looks like ChillyHell, a virus that was deployed against officials in Ukraine. Any program that does TLS and contains "That's strange", "wonder", and "Welcome to Paradise" is blocked from running
this rust program cannot be run on normal MacOS, but it'll run just fine if you delete either of the two statements. anyone want to guess why?
Slagar retweeted
Been working with John on this article the last four months to show you Postgres's new graph query support in SQL/PGQ. He doesn't stop there though; exploring the same features in LadybugDB (recent fork of Kuzu acquired by Apple) and DuckDB. Paywall has expired, give it a read!
Slagar retweeted
"FokosDB: A strongly consistent bottomless storage database ontop of Cloudflare Durable Objects" - lambrospetrou.com/articles/f… Still very early stage, and lots of things to optimize and implement, but it was time to write an article describing the high level architecture! 🚀
FokosDB progress (DynamoDB on Durable Objects): ✅Distributed transactions (transactWriteItems, transactGetItems). ✅Core operations (putItem, getItem, deleteItem). ✅Hash partition splitting based on storage. Bottomless storage achieved. Not bad performance either.
Slagar retweeted
Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic growing so fast that bots have now passed human traffic online for the first time in the Internet's history. radar.cloudflare.com/traffic…
Slagar retweeted
If we locked all the database CEOs in a room. Just a text editor. No LLM access. Which one could build a basic working database first?
Slagar retweeted
I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem.

As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)!

I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work.

It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results?

88ms => 1.5ms
150K allocs => ~500 allocs

Incredible right? Nope. My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path.

This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput.

The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity.

Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.
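
For readers checking the "roughly 75x" figure, here is a small sketch of the ratios implied by the frame times quoted in the post. The `speedup` helper and the variable names are hypothetical; the three timings are the ones stated above (88ms naive, 1.5ms after the agent loop, ~0.020ms hand-written).

```javascript
// Ratio of two frame times measured on the same benchmark.
function speedup(beforeMs, afterMs) {
  return beforeMs / afterMs;
}

// Frame times quoted in the post.
const naiveMs = 88;
const agentMs = 1.5;
const handWrittenMs = 0.020;

// What the agent loop achieved over the naive renderer.
console.log("agent vs naive:", speedup(naiveMs, agentMs).toFixed(1) + "x");
// The gap the post calls "roughly 75x": hand-written vs the agent's result.
console.log("hand-written vs agent:", speedup(agentMs, handWrittenMs).toFixed(1) + "x");
// End-to-end gap between the naive and hand-written renderers.
console.log("hand-written vs naive:", speedup(naiveMs, handWrittenMs).toFixed(0) + "x");
```

The comparison only holds because all three numbers come from the same benchmark, which is the condition the post itself states.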
Slagar retweeted
"Is that code AI generated? If it’s AI generated I don’t want it"
Slagar retweeted
It's kind of crazy how much of the way we've been designing Workers over the past 9 years unexpectedly turns out to be so relevant to AI and agents.

Durable Objects and lightweight isolate sandboxes are obvious big things. But there are subtler things.

Consider "bindings". In Workers, our environment (`env` object) doesn't just contain strings. It can contain live objects, which we often call "bindings". For instance, a Workers KV binding is a live object representing a Workers KV storage namespace. Once you've configured it, you can just do:

let val = await env.MYKV.get("foo");
await env.MYKV.put("foo", "new value");

Notice: There's no connection string. No secret token that you have to pass to talk to your KV namespace. The Workers Runtime handles it for you. You just get an already-initialized client object, on which you can call methods.

You can still do everything you want to do. But you know what you can't do? Leak the secret token. Because there isn't one. A KV namespace binding fundamentally cannot be "leaked" because it's not bytes.

But over the years, a lot of people have questioned whether this really mattered. I've had people inside and outside the team say: "Why are you so weird, Kenton? Yeah sure it can't leak but now I have to learn this new way of thinking about things. No other runtime works this way so writing portable code takes extra work. I'd rather just stick to what I'm used to, and anyway I know better than to leak my environment variables."

Well, now we have AI agents writing the code and... suddenly everyone is worried about agents leaking keys. People are creating convoluted schemes to intercept the outbound traffic and inject keys in a proxy, or trying to issue very-short-lived keys so that if the agent leaks them the window of attack is short.

Ahem. Welcome, folks! We solved this 8 years ago!

Here's an old blog post -- written when I personally was still very much Not Thinking About AI -- which seems so much more relevant now: blog.cloudflare.com/workers-…
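
To make the "it's not bytes" point concrete, here is a minimal sketch that runs outside Workers. It is not the Workers runtime: `makeFakeKvBinding` is a hypothetical in-memory stand-in that only mimics the `get`/`put` calls shown in the post, and `handle` is an illustrative function, not a real Worker entry point. The idea it tries to show is that code holding `env.MYKV` can use the namespace but has no token string to copy out.

```javascript
// Hypothetical in-memory stand-in for a KV namespace binding.
// In real Workers the runtime supplies this object; here we fake it.
function makeFakeKvBinding() {
  const store = new Map();
  return {
    // Resolves to the stored string, or null when the key is absent.
    async get(key) {
      return store.has(key) ? store.get(key) : null;
    },
    async put(key, value) {
      store.set(key, String(value));
    },
  };
}

// Illustrative handler: it receives `env` and calls methods on the binding.
// No connection string or token appears anywhere in this code.
async function handle(env) {
  const before = await env.MYKV.get("foo");
  await env.MYKV.put("foo", "new value");
  const after = await env.MYKV.get("foo");
  return { before, after };
}

const env = { MYKV: makeFakeKvBinding() };
handle(env).then((result) => {
  console.log(result);
  // Serializing env yields no credential: the binding is methods, not bytes.
  console.log(JSON.stringify(env));
});
```

In this sketch, dumping or logging `env` exposes nothing reusable, which is the property the post contrasts with environment variables that hold secret strings.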
Slagar retweeted
- XZ utils backdoor: found by guy debugging 200ms latency
- LiteLLM hack: found by guy debugging oom issue

These could have been the most impactful compromises ever. Forget security vendors, weaponize your engineers’ autism.
Slagar retweeted
Cloudflare’s Gen 13 servers double our compute throughput by rethinking the balance between cache and cores. Moving to high-core-count AMD EPYC ™ Turin CPUs, we traded large L3 cache for raw compute density. By running our new Rust-based FL2 stack, we completely mitigated the latency penalty to unlock twice the performance. cfl.re/4uKJKp9
Slagar retweeted
Mar 10
Here’s what’s gonna happen:
- you replace your code review with feedback loops (sentry, datadog, support tickets, etc)
- you stop reading the code
- software factory fixes everything
- one day something breaks at 3am, agent can’t fix it
- nobody’s read the code in 3 months
- you have 3 weeks of downtime trying to re-onboard and fix it
- you lose significant % of your contracts and users
- your company is now dead
Mar 7
Replying to @gregpr07
it may surprise you that this is coming from me but I think we’re in for a 1-3 year period where stuff might break at 3am and if you’re relying on loops to fix it and nobody understands what’s under the hood, you’re looking at an existential threat to your company
Slagar retweeted
there is a rhetoric in ai rn that vibing and half-assing is the future of technology. do not fall for this psyop. the future is deep understanding and mastery. always has been
Slagar retweeted
I think this is the right take and we’re going to see a ton of OSS projects adopt very similar policies.
Ghostty is getting an updated AI policy. AI assisted PRs are now only allowed for accepted issues. Drive-by AI PRs will be closed without question. Bad AI drivers will be banned from all future contributions. If you're going to use AI, you better be good. github.com/ghostty-org/ghost…
Slagar retweeted
You can now tell us exactly where your existing cloud infra is and we'll place your compute as close as possible. single-digit latency to your DB and legacy cloud infra. no guessing.