I spent the last few weeks building VectorFlow-RAG, a full semantic search QA system, because I wanted to understand what it actually takes to ship an ML system end-to-end.
> Why I Built It
Keyword search breaks when you need semantic understanding. Pure vector search breaks when exact wording matters.
So the only practical answer is a hybrid: BM25 for hard lexical matches, embeddings for meaning. I added a simple alpha parameter to balance the two, which turned out to work better than committing to either side.
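To make the alpha idea concrete, here is a minimal sketch of weighted score fusion. It is not the project's actual code: the function names, the min-max normalization step, and the convention that alpha weights the embedding side (alpha = 1 is pure vector search, alpha = 0 is pure BM25) are all assumptions for illustration.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine scores share a range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are identical; treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend lexical and semantic scores per document.

    Assumed convention: alpha weights the vector score, (1 - alpha) weights BM25.
    A document missing from one retriever contributes 0 for that side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    bm25_n = min_max_normalize(bm25)
    vec_n = min_max_normalize(vector)
    doc_ids = set(bm25_n) | set(vec_n)
    return {
        d: alpha * vec_n.get(d, 0.0) + (1.0 - alpha) * bm25_n.get(d, 0.0)
        for d in doc_ids
    }


if __name__ == "__main__":
    # Hypothetical scores for two documents from each retriever.
    fused = hybrid_scores({"a": 10.0, "b": 0.0}, {"a": 0.2, "b": 0.8}, alpha=0.5)
    for doc_id, score in sorted(fused.items(), key=lambda kv: -kv[1]):
        print(doc_id, round(score, 3))
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities are not, so an unnormalized blend would let one side dominate regardless of alpha.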
> What the System Looks Like
Retrieval: BM25 + Sentence-Transformer embeddings, combined with tunable weighting.
Storage: ChromaDB as the vector store. Lightweight, local, and easy to swap out for FAISS/Milvus.
Inference: Everything runs locally using Ollama + TinyLlama to avoid API dependency and latency issues.
Pipeline: A single orchestrator handles chunking, embedding, indexing, hybrid scoring, and pushing context into the LLM.
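Two of those pipeline stages, chunking and pushing context into the LLM, are easy to sketch in plain Python. This is an illustration only: `chunk_text`, `build_prompt`, the word-based splitting, the default sizes, and the prompt wording are all hypothetical, not taken from the repository.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping word windows.

    chunk_size and overlap are counted in whitespace-separated words.
    An empty or whitespace-only document yields an empty list.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            # This window already reached the end of the document.
            break
    return chunks


def build_prompt(question: str, chunks: List[str]) -> str:
    """Assemble retrieved chunks and the question into a single prompt string."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    pieces = chunk_text("a b c d e f g", chunk_size=4, overlap=2)
    print(pieces)
    print(build_prompt("What comes after f?", pieces))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of indexing some text twice.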
> Where the Real Work Happened
The ML part was maybe 30% of the effort. The rest was MLOps:
• Over 100 tests (unit + integration). They caught real bugs: empty docs, metadata issues, OS-specific path bugs, etc.
• Benchmarks on mock MS MARCO data to see how BM25, vector search, and hybrid retrieval actually trade off.
• GitHub Actions handles linting, formatting, type checks, and the entire test suite across multiple Python versions.
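For the benchmark bullet, comparing BM25, vector, and hybrid retrieval comes down to scoring each ranked list against known relevant documents. The post does not say which metrics the project reports, so the two below (recall@k and reciprocal rank) are assumed as common choices for MS MARCO-style evaluation, and the function names are hypothetical.

```python
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    # Hypothetical ranking for one query with two relevant documents.
    ranked = ["d3", "d1", "d2"]
    relevant = {"d1", "d2"}
    print(recall_at_k(ranked, relevant, k=2))
    print(reciprocal_rank(ranked, relevant))
```

Averaging these per-query values over the whole query set, once per retriever, gives numbers that can be compared directly across BM25, vector, and hybrid runs.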
> Making It Usable
I added a Streamlit interface so you can upload documents, ask questions, and see retrieved chunks and latency. Not fancy, but it turns the project into something people can try without digging into code.
> What I Learned
This wasn't about inventing a new algorithm. It was about putting known pieces together cleanly: modular components, consistent interfaces, good tests, reproducible benchmarks, and a simple UI. That's most of ML engineering in practice.
I started out wanting to understand semantic search. By the end, I had something that actually feels production-ready, and the gap between those things taught me more than the search algorithm itself.
I'm open sourcing this because I found value in understanding how everything connects. If you want to build or learn from it, grab it on GitHub.
The code is there, the tests are there, the benchmarks are there. Use it as a reference for building systems that don't just work in notebooks.