An extensible, state-of-the-art columnar file format. Formerly at @spiraldb, now a Linux Foundation project (@LFAIDataFdn). Apache-2.0

Joined May 2025
vortex retweeted
so cool to see another blazing fast database built on vortex!
Just announced at Interrupt! SmithDB. Agent traces have outgrown the databases built to hold them. That’s why we built SmithDB, a purpose-built distributed database for agent observability. Read the announcement from Co-Founder @ankush_gola11: langchain.com/blog/introduci…
vortex retweeted
We leveraged two amazing open source projects when building SmithDB. One is @ApacheDataFusio: an extensible Rust-based query engine. We built custom execution plans specifically tuned for our workloads and storage backend, and DataFusion made it straightforward to plumb everything together. The other is @vortexdotdev: an extensible file format that allows you to build custom layouts with specific encoding and chunking strategies for different columns. I would highly recommend checking out both of these projects if you're interested in modern data systems.
We built SmithDB: the database purpose-built for agent observability workloads that now powers many parts of LangSmith. Agent observability presents a challenging data problem. Agent traces can contain tens of thousands of intermediate spans and large, unbounded payloads. These characteristics are a direct result of agents running for longer time horizons and LLM context window sizes growing. Traditional data infrastructure was not built to handle the complexities associated with storing and querying this data. SmithDB brings LangSmith up to 12x performance improvements across the access patterns most important for agent observability. I’ve been working on SmithDB directly with an amazing team over the past few months, and I’m incredibly proud of the results we’re seeing. I wrote a bit more about the story and engineering challenges behind SmithDB in this blog. Additionally, if you’re a systems engineer interested in building the future of agent observability, please reach out!
vortex retweeted
The Research Behind Modern Data Compression & @vortexdotdev
When we chose Vortex as the storage layer for Spice Cayenne (the data accelerator engine in Spice), we were betting on decades of database research finally reaching production-ready maturity. Here's the research behind Vortex:
📄 BtrBlocks (SIGMOD 2023) - The core algorithm from the Technical University of Munich. Cascading multiple lightweight encodings outperforms monolithic compression. Optimize for decompression speed, not just compression ratio.
📄 FastLanes (VLDB 2023) - Hardware-friendly integer compression. Structures bit-packing to maximize SIMD utilization across AVX-512, AVX2, and ARM NEON. Near-memory-bandwidth decompression.
📄 FSST (VLDB 2020) - Fast Static Symbol Table for strings. Near-LZ4 ratios at 5-10× faster decompression. Critical for string-heavy columns.
📄 ALP (CWI Amsterdam) - Adaptive Lossless floating-Point compression. Exploits real-world float patterns (prices with 2 decimals, sensor readings with limited precision).
📄 MonetDB/X100 Morsel-Driven Parallelism - Foundations for vectorized, NUMA-aware query execution that Vortex builds on.
The result? Compression that is tailored to your data:
• Integers via FastLanes bit-packing
• Floats via ALP adaptive encoding
• Strings via FSST symbol tables
• Timestamps via delta encoding
• Sorted columns via run-length encoding
Why does this matter for production systems?
1️⃣ Query performance scales with decompression speed. Focus on decode performance translates directly to faster queries.
2️⃣ Automatic encoding selection means zero configuration. The algorithm samples your data and picks optimal strategies per column.
3️⃣ SIMD acceleration is baked in. FastLanes was designed for vectorized, hardware accelerated execution from day one.
4️⃣ Zero-copy Arrow access. Data decompresses directly to Arrow arrays with no intermediate copies.
Vortex is now a Linux Foundation AI & Data project, and researchers are building on it (Anyblox, F3). You get SOTA research in production systems. The future of data storage is exciting.
To learn more about our Vortex implementation, check out the blog: hubs.ly/Q04bGfvf0
#datafusion #ai #data #vortex #spiceai #arrow #parquet
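To make the idea of cascading lightweight encodings concrete, here is a toy sketch in Python that applies delta encoding to a timestamp column and then run-length encodes the deltas. It is not Vortex's implementation or API, and it leaves out FastLanes, FSST, and ALP entirely. The function names (delta_encode, rle_encode, bit_width, and so on) and the sample timestamps are hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Return the first value plus the successive differences."""
    if not values:
        raise ValueError("need at least one value")
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(first: int, deltas: List[int]) -> List[int]:
    """Rebuild the original values by accumulating the differences."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into a flat list."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def bit_width(values: List[int]) -> int:
    """Bits per value needed to bit-pack non-negative integers."""
    return max((v.bit_length() for v in values), default=0)


# Hypothetical timestamp column: mostly regular 10-unit steps.
timestamps = [1000, 1010, 1020, 1030, 1040, 1045]

# Cascade: delta first, then run-length encode the deltas.
first, deltas = delta_encode(timestamps)
runs = rle_encode(deltas)

# Decoding reverses the cascade and recovers the input exactly.
assert delta_decode(first, rle_decode(runs)) == timestamps

print(runs)                                       # [(10, 4), (5, 1)]
print(bit_width(timestamps), bit_width(deltas))   # 11 4
```

In this toy column the raw values need 11 bits each, while the deltas fit in 4 bits and collapse into two runs. That shrinkage is why stacking cheap encodings can pay off, and each stage decodes with a simple loop.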
vortex retweeted
Connor Tsui & I just merged a first cut of TurboQuant into @vortexdotdev , already validated on production embeddings 🚀🚀🚀
Fastest OSS file format, in both performance and velocity
you took up with Weasley, but he can't afford sliceable cascaded encodings. now your random access is dogged, and your cortisol is properly spiked, potter
hey man, thrilled that you're interested in contributing. we'll be waiting for you in slack vortex.dev/slack

I need a GitHub too! Is it like that or nah?
vortex retweeted
CASE-WHEN support coming to @vortexdotdev Guess I'm a Vortex contributor now!
🦆❤️🚀
Jan 23
DuckDB now supports reading from and writing to the Vortex file format! The DuckDB Labs and Spiral teams have worked together to make Vortex available as a core extension in DuckDB. Vortex is an open source, columnar file format whose design is heavily influenced by recent research in lightweight compression encodings, computing and IO techniques. We gave it a test drive, and it performed very well. Read the full article to learn more lnkd.in/eZfGzPiZ
vortex retweeted
🌪️ Why LF Vortex for hot data? @ApacheParquet: great compression, slow decode. @ApacheArrow: instant decode, no compression. Vortex: encoding-efficient compression with SIMD decode to Arrow. 80% of Parquet's compression, 10x faster decode.
vortex retweeted
Happy to share that I've been nominated to the @vortexdotdev Technical Steering Committee! It's been fun and productive switching to Vortex from Parquet as our storage format at Polar Signals and I'm excited to continue contributing to the Vortex project.
vortex retweeted
Super cool, they forked @DeltaLakeOSS to replace Parquet (for data) with Vortex and JSON (for metadata) with Vortex. Huge performance gains! Maybe we should upstream this one 😁 @vortexdotdev
🧊 New on the Polar Signals Blog — Our Delta Lake Fork Purpose-built for our continuous profiling product. In our latest post, we walk through how Delta Lake works, and the changes we've made to improve performance for our product. 👉 Read the full post: buff.ly/KwHINtO
vortex retweeted
So cool!! Polar Signals reduced query runtimes by 70% switching from Parquet to Vortex 🤯🚀
We completed a major project to switch our storage file format from Parquet to Vortex 🌪️ resulting in 70% average query performance improvement across the board 🚀 Learn more about how rethinking interface-imposed limitations unlocked these gains in our latest blog post 👇
vortex retweeted
The talk on @SpiralDB at @CMUDB youtube.com/watch?v=zyn_T5ur… is a great one. I think it would also be interesting to hear a counterpoint about @ApacheParquet that explains the actual technical details of that format, the Cathedral vs. Bazaar management, options with metadata, etc.
vortex retweeted
Today's Future Data Systems Seminar Speaker: Will Manning (@willmanning) will present @SpiralDB's Vortex file format (@vortexdotdev). Vortex is now a @LFAIDataFdn project. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futured…
8 Sep 2025
Go check out our latest post, sharing new developments from the past month 🗓️💻☕️ vortex.dev/blog/september