My colleague Marwan just shipped a blog post about his OSS project. It's about scaling OCR with ZenML and it's super practical!
Anyone who's tried to go from "hey I got GPT-Vision to read a PDF!" to actually processing thousands of docs knows the pain. Marwan built this thing called OmniReader that handles the boring-but-crucial stuff like batching, retries, and caching that you definitely need but nobody talks about.
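To make "batching, retries, and caching" concrete, here's a rough sketch of how the three fit together. This is my own illustration, not OmniReader's code: `batched`, `call_with_retries`, `process_documents`, and the `ocr_fn` callable are all hypothetical names, and the cache is just a dict keyed by a hash of the document bytes.

```python
import hashlib
import time
from typing import Callable, Dict, Iterator, List


def batched(items: List[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_with_retries(
    fn: Callable[[bytes], str],
    doc: bytes,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Call fn(doc), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn(doc)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def process_documents(
    docs: List[bytes],
    ocr_fn: Callable[[bytes], str],  # hypothetical: any model call that returns text
    cache: Dict[str, str],
    batch_size: int = 8,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> List[str]:
    """Run OCR over docs in batches, skipping documents already in the cache."""
    results: List[str] = []
    for batch in batched(docs, batch_size):
        for doc in batch:
            key = hashlib.sha256(doc).hexdigest()  # identical bytes share one entry
            if key not in cache:
                cache[key] = call_with_retries(ocr_fn, doc, attempts, base_delay)
            results.append(cache[key])
    return results
```

In a real pipeline each batch would probably be handed to a worker or a pipeline step rather than looped over in-process, and the cache would live on disk or in an artifact store so reruns don't pay for the same page twice.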
The most useful part IMO is how he integrated different models (OpenAI, local Ollama stuff including Gemma) through a unified interface, so you can swap models without rewriting everything. There's also an evaluation pipeline that calculates CER/WER metrics to objectively compare which model actually performs better for your specific documents.
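If CER/WER are new to you: both are edit distance divided by the length of the ground truth, counted over characters (CER) or words (WER). Here's a minimal sketch using plain Levenshtein distance. The function names are mine, and the actual evaluation pipeline may normalize text or compute things differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Lower is better for both, and either can exceed 1.0 when the model output is much longer than the ground truth. Comparing models then comes down to running each one over the same labeled documents and averaging these scores.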
I've been fighting with some similar batch processing headaches, so seeing actual code for the parallel processing and batching implementations was helpful. He included a Streamlit demo too if you want to test it yourself.
Blog:
zenml.io/blog/ocr-batch-work…
PS: If you're dealing with invoices, contracts, or other docs at scale, this might save you some pain. No marketing BS, just useful patterns.
#OCR #PythonEngineering #MLOps #GenAI