We leveraged two amazing open source projects when building SmithDB.
One is @ApacheDataFusio: an extensible, Rust-based query engine. We built custom execution plans tuned specifically for our workloads and storage backend, and DataFusion made it straightforward to plumb everything together.
The other is @vortexdotdev: an extensible file format that lets you build custom layouts with specific encoding and chunking strategies for different columns.
I would highly recommend checking out both of these projects if you're interested in modern data systems.
We built SmithDB, a database purpose-built for agent observability workloads that now powers many parts of LangSmith.
Agent observability presents a challenging data problem. Agent traces can contain tens of thousands of intermediate spans and large, unbounded payloads. These characteristics follow directly from agents running over longer time horizons and from growing LLM context windows.
Traditional data infrastructure was not built to handle the complexity of storing and querying this data.
SmithDB delivers up to 12x performance improvements for LangSmith across the access patterns most important for agent observability. I’ve been working on SmithDB with an amazing team over the past few months, and I’m incredibly proud of the results we’re seeing.
I wrote more about the story and engineering challenges behind SmithDB in this blog post.
Additionally, if you’re a systems engineer interested in building the future of agent observability, please reach out!