Object storage becoming the default persistence layer for modern data infrastructure didn't happen by accident. Three things converged, and they're worth pulling apart.
1️⃣ Cloud-native economics finally caught up with storage.
Kubernetes made compute elastic. EBS didn't: you still provision volumes up front and pay for them whether or not they're busy. If what you want is pay-per-use, multi-tenant economics that scale to zero, you eventually end up on S3. Not because it's fast (it isn't), but because it's the only storage tier whose pricing model matches how modern workloads actually behave: spiky, unpredictable, and idle a lot of the time.
2️⃣ S3 itself grew up.
Strong read-after-write consistency landed in 2020, and Express One Zone brought single-digit-millisecond latency in 2023. Add Mountpoint, Intelligent-Tiering, and conditional writes, and the S3 you can build on today is genuinely not the S3 you uploaded vacation photos to in 2008 📸.
Real data systems now sit on top of it: Snowflake, Databricks, ClickHouse Cloud, WarpStream, Turbopuffer, and the Iceberg/Delta/Hudi crowd. Even Postgres: Neon built its storage layer on object storage, which would have sounded like a bad joke for an OLTP system a few years ago.
3️⃣ The workload shape changed everywhere, not just in one corner.
Most of the new high-volume data is semi-structured or unstructured: vectors, JSON, events, logs, traces. It shows up in observability, in business analytics, in customer 360, in AI training data. This isn't a domain-specific shift.
Dashboards and nightly ETL are giving way to ad-hoc analytics, root-cause analysis (RCA), and, increasingly, agents 🤖 firing open-ended questions at operational and business data. That kind of workload is append-heavy and hungry for throughput, and you basically cannot capacity-plan it. Object storage handles it better than provisioned block devices because you pay for what you actually use, not for a peak you may never hit.
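To make the "peak you never hit" point concrete, here is a back-of-the-envelope sketch in Python. The unit price, the 25% headroom, and the traffic shape are all hypothetical, and the same price is applied to both models so that only the billing shape differs.

```python
# Back-of-the-envelope sketch (hypothetical numbers): paying for a provisioned
# peak vs paying for what you actually use.

PRICE_PER_UNIT_MONTH = 0.05  # hypothetical price per capacity unit (e.g. GB) per month


def provisioned_cost(daily_units, headroom=1.25):
    """Capacity is sized up front for the observed peak plus headroom
    and billed for the whole month, busy or idle."""
    return max(daily_units) * headroom * PRICE_PER_UNIT_MONTH


def usage_cost(daily_units):
    """Billing follows average usage over the month
    (a simplification of per-unit-month metering)."""
    return sum(daily_units) / len(daily_units) * PRICE_PER_UNIT_MONTH


# A spiky, hard-to-plan month: 27 quiet days at 100 units, 3 incident days at 1,000.
month = [100] * 27 + [1000] * 3

if __name__ == "__main__":
    print(f"provisioned for peak: {provisioned_cost(month):.2f}")
    print(f"billed on usage:      {usage_cost(month):.2f}")
```

For this made-up month, capacity sized for the peak costs 62.50 versus 9.50 billed on usage, roughly 6.6x. Real bills add request, IOPS, and throughput charges on both sides, so treat this as an illustration of the shape, not a price comparison.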
One thing worth being honest about ⚠️: none of this is free. OLTP still wants local NVMe for hot paths. Small-object API costs can quietly exceed your storage bill. Listing is still slow. Cross-cloud egress will eat your margins if you're not careful. Anyone pitching "just put it on S3" without talking about caching, compaction, and catalogs is pitching a slide deck.
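The small-object trap is easy to show with arithmetic. The sketch below assumes illustrative prices loosely based on published S3 Standard list prices ($0.005 per 1,000 PUTs, $0.023 per GB-month); check the current price sheet for your region and storage class. `records_per_object` is a hypothetical knob standing in for whatever buffering or compaction layer batches records before upload.

```python
import math

# Illustrative unit prices (assumption): roughly S3 Standard list prices in one region.
# Check your provider's current price sheet before relying on these.
PUT_PRICE_PER_1000 = 0.005          # USD per 1,000 PUT requests
STORAGE_PRICE_PER_GB_MONTH = 0.023  # USD per GB-month stored


def monthly_costs(num_records, record_bytes, records_per_object=1):
    """Return (put_cost, storage_cost) in USD for one month of writes.

    records_per_object > 1 models batching many small records into one
    object before upload (hypothetical stand-in for a buffer/compaction layer).
    GETs, LISTs, and per-object minimum sizes are ignored.
    """
    num_objects = math.ceil(num_records / records_per_object)
    put_cost = num_objects / 1000 * PUT_PRICE_PER_1000
    total_gb = num_records * record_bytes / 1e9  # decimal GB, for simplicity
    storage_cost = total_gb * STORAGE_PRICE_PER_GB_MONTH
    return put_cost, storage_cost


if __name__ == "__main__":
    # 100 million 1 KB records written as individual objects...
    put, storage = monthly_costs(100_000_000, 1000)
    print(f"one object per record: PUT ${put:.2f} vs storage ${storage:.2f}")
    # ...versus batched into ~16 MB objects.
    put_b, storage_b = monthly_costs(100_000_000, 1000, records_per_object=16_000)
    print(f"batched:               PUT ${put_b:.2f} vs storage ${storage_b:.2f}")
```

Writing 100 million 1 KB records as individual objects costs about $500 in PUTs against about $2.30 of storage; batching 16,000 records per object (roughly 16 MB) brings the PUTs down to about $0.03 while the storage line stays the same.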
The direction is pretty clearly set, though. In 2026, building a new data system on anything other than object storage as the source of truth feels like something you should have to justify, not something you fall back on.
It's why we built @Greptime this way from day one. Time-series and observability happened to be one of the first places where the math stopped being arguable. I don't think it stops there.
Curious how others are seeing this — especially folks on the OLTP side who'd push back. 👇
—
References:
Snowflake's elastic warehouse (SIGMOD 2016): dl.acm.org/doi/10.1145/28829…
Lakehouse architecture (CIDR 2021): cidrdb.org/cidr2021/papers/c…
S3 strong consistency: aws.amazon.com/blogs/aws/ama…
S3 Express One Zone: aws.amazon.com/s3/storage-cl…
Mountpoint for S3: aws.amazon.com/blogs/aws/mou…
Neon architecture: neon.com/docs/introduction/a…
WarpStream: warpstream.com/blog/kafka-is…
GreptimeDB architecture: docs.greptime.com/user-guide…