Been building Traks (a web analytics platform) on @Cloudflare Pipelines and R2 SQL, and honestly, this combo is stupid good.
Ingest firehoses into R2 as Parquet, query it straight with SQL. No warehouse, no ETL glue, no ops.
The serverless data stack is finally here.
Using it for my personal projects, as I'm still digging into the exact R2 SQL limits: concurrency, scan size, query duration, and other limitations.
Cloudflare continues to build up its zero-egress-charges data warehouse capabilities with R2 Pipelines
Ingest data into a Data Catalog-enabled R2 bucket, using Pipelines or other mechanisms, then run queries over it using R2 SQL or your favourite engines like Snowflake. Pipelines can ingest, transform, and load streaming data into Apache Iceberg or Parquet in R2.
Both got new features on Monday:
You can now ingest logs directly into Pipelines from your Cloudflare Workers by using Pipelines as a Logpush destination. Logs can be noisy, so using Pipelines, you can trim down to the fields you need before the data is stored in R2 in Parquet files or Apache Iceberg tables.
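To make the trimming idea concrete, here is a minimal Python sketch of the kind of projection involved: keep a handful of fields from each log record and drop the rest. This is not Pipelines syntax, only an illustration of the effect; the field names, the sample record, and the `trim_record` / `trim_ndjson` helpers are all assumptions made up for this example.

```python
import json

# Hypothetical subset of fields worth keeping; real Workers log
# records have their own schema, so treat these names as placeholders.
KEEP_FIELDS = ("timestamp", "script_name", "outcome")


def trim_record(record: dict, keep=KEEP_FIELDS) -> dict:
    """Return a copy of `record` containing only the fields in `keep`."""
    return {key: record[key] for key in keep if key in record}


def trim_ndjson(lines, keep=KEEP_FIELDS):
    """Trim each newline-delimited JSON log line, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield trim_record(json.loads(line), keep)


# Made-up sample record, not real Logpush output.
sample = [
    '{"timestamp": 1, "script_name": "api", "outcome": "ok", "logs": ["..."], "cpu_ms": 3}',
]
trimmed = list(trim_ndjson(sample))
```

Dropping the noisy fields before storage means fewer columns end up in the Parquet files, so there is less data to store and scan later.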
R2 SQL now supports functions for querying JSON data, and EXPLAIN output can now be formatted as JSON, which makes it easier to read.
Lastly, unpartitioned Iceberg tables can now be queried directly - this should only be used for smaller tables, as partitions will significantly help performance at larger scale.
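A rough way to see why partitioning matters: with partition metadata, a query engine can skip files whose partition value cannot match the filter, while an unpartitioned table leaves every file as a scan candidate. The toy Python model below illustrates that difference. It is not how R2 SQL or Iceberg is implemented, and `DataFile`, `files_to_scan`, and the file paths are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataFile:
    path: str
    day: Optional[str]  # partition value; None models an unpartitioned table


def files_to_scan(files, wanted_day):
    """Return the files a query filtered on `day = wanted_day` must read."""
    # A file with no partition value cannot be ruled out, so it is always kept.
    return [f for f in files if f.day is None or f.day == wanted_day]


days = [f"2025-01-{d:02d}" for d in range(1, 31)]
partitioned = [DataFile(f"data/day={d}/part-0.parquet", d) for d in days]
unpartitioned = [DataFile(f"data/part-{i}.parquet", None) for i in range(30)]

pruned = files_to_scan(partitioned, "2025-01-15")
full = files_to_scan(unpartitioned, "2025-01-15")
print(len(pruned), len(full))  # 1 file read vs. all 30
```

With 30 files the gap is harmless, which matches the advice to keep unpartitioned tables small; the gap grows with the number of files.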