someone just open-sourced a 1.7B parameter model that parses text, tables, formulas, images, and PDFs across 100 languages. 🤯
1.7B parameters. that's tiny. GPT-4 class models are widely believed to sit north of 100B, though exact sizes aren't public.
the assumption has always been that multimodal parsing at real breadth requires massive scale. this breaks that cleanly.
the capability isn't the story. the efficiency is. a model this small runs locally, runs cheap, runs on a decent laptop. no API call. no GPU cluster.
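rough math on why "decent laptop" is plausible. this is a back-of-envelope sketch, not a benchmark: it counts weight memory only (no activations, cache, or runtime overhead), and the precisions listed are assumptions, since the post doesn't say how the model ships. `weight_memory_gb` is a made-up helper name for illustration.

```python
# Back-of-envelope: memory needed just to hold the weights.
# Assumption: the parameter count is exactly 1.7e9; real usage will be higher
# because activations, caches, and framework overhead are not counted here.

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

N_PARAMS = 1.7e9

# Hypothetical precisions; the source does not state which one is used.
PRECISIONS = [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]

for name, bytes_per_param in PRECISIONS:
    print(f"{name}: {weight_memory_gb(N_PARAMS, bytes_per_param):.2f} GB")
```

at 16-bit that's about 3.4 GB of weights, which fits in the RAM of most recent laptops. the same arithmetic at 100B parameters gives roughly 200 GB, which doesn't.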
document parsing is unglamorous work. it's also the bottleneck in almost every serious data pipeline.
open-source at 1.7B means it goes everywhere.