If you need structured Markdown/JSON from document images and PDFs, Dolphin is a useful tool.
It handles multi-page PDFs and supports inference through vLLM, TensorRT-LLM, and Hugging Face (HF).
- Detects whether a document is scanned or digital
- Recovers layout reading order
- Parses text, tables, formulas, and code, each with its own strategy
- Runs element parsing in parallel when possible
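
To make the flow in the list above a bit more concrete, here is a minimal Python sketch of "sort elements into reading order, pick a parser per element type, run the parsers in parallel". This is not Dolphin's actual code or API: `Element`, `PARSERS`, and `parse_page` are hypothetical names, and the per-type parsers are toy string functions standing in for model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Element:
    kind: str     # "text", "table", "formula", or "code"
    order: int    # position in the recovered reading order
    content: str  # stand-in for a cropped page region


# Hypothetical per-type strategies; a real system would call a model here.
def parse_text(e: Element) -> str:
    return e.content.strip()


def parse_table(e: Element) -> str:
    return "| " + " | ".join(e.content.split(",")) + " |"


def parse_formula(e: Element) -> str:
    return f"$${e.content}$$"


def parse_code(e: Element) -> str:
    return f"~~~\n{e.content}\n~~~"


PARSERS = {
    "text": parse_text,
    "table": parse_table,
    "formula": parse_formula,
    "code": parse_code,
}


def parse_page(elements: list[Element]) -> str:
    # Restore reading order first, then parse each element independently.
    ordered = sorted(elements, key=lambda e: e.order)
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order, so the output
        # still follows the reading order even though work runs in parallel.
        parts = list(pool.map(lambda e: PARSERS[e.kind](e), ordered))
    return "\n\n".join(parts)


page = [
    Element("formula", 2, "E = mc^2"),
    Element("text", 1, "  Intro paragraph. "),
    Element("table", 3, "a,b"),
]
print(parse_page(page))
```

Threads are used here only because they keep the sketch short; how Dolphin actually batches or parallelizes element parsing is not specified in this post.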
Dolphin also scales from a lightweight 0.3B model to a 3B version, with a big jump in quality (up to 89.8 on OmniDocBench).