OpenDataLab

Tweets

OpenDataLab @OpenDataLab_AI

May 12

🎉 Introducing NanaDraw – your AI-powered scientific diagram maker! Generate editable, publication-ready flowcharts, architecture diagrams, and illustrations in 5 minutes with simple text prompts. ✅ 4 smart modes: Auto/Draft/Generate/Assemble shannon.opendatalab.com/nana…

OpenDataLab @OpenDataLab_AI

May 7

🚀 MinerU 3.1.0 is out now, with full MinerU2.5-Pro integration. ✅ Apache 2.0-based license (more business-friendly) ✅ Native DOCX/PPTX/XLSX parsing ✅ 8GB RAM handles 10k pages ✅ Multi-machine/multi-card, high concurrency. From a tool to industrial-grade parsing infrastructure. #MinerU

OpenDataLab @OpenDataLab_AI

May 7

● GitHub: github.com/opendatalab/Miner… ● HuggingFace: huggingface.co/spaces/openda… ● ModelScope: modelscope.cn/models/OpenDat…

GitHub - opendatalab/MinerU: Transforms complex documents like PDFs and Office docs into LLM-ready...

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows. - opendatalab/MinerU

github.com

OpenDataLab @OpenDataLab_AI

May 7

🚀 Introducing MinerU Document Explorer! Inspired by Karpathy’s LLM Wiki vision — build your dynamic, self-maintaining AI knowledge base beyond RAG. Parse PDFs/Word/PPT, auto-generate linked Wiki, precise extraction & full traceability. Lightweight, local, agent-native.

OpenDataLab @OpenDataLab_AI

May 7

👉 GitHub: github.com/opendatalab/Miner…

GitHub - opendatalab/MinerU-Document-Explorer: Agent-native knowledge engine with MCP tools for...

Agent-native knowledge engine with MCP tools for document indexing, wiki organization, fast retrieval and deep reading across PDF/DOCX/PPTX/Markdown - opendatalab/MinerU-Document-Explorer

github.com

OpenDataLab @OpenDataLab_AI

May 7

🚀MinerU2.5-Pro! Same 1.2B architecture, NO structural changes. 65.5M pages, diversity-aware sampling, cross-model verification, render-then-verify correction. Scores 95.69 on OmniDocBench v1.6, outperforming larger models. Built for real-world use. Now open-sourced! #MinerU

OpenDataLab retweeted

Tiezhen WANG

@Xianbao_QIAN

Apr 11

MinerU just got upgraded. Welcome 2.5 pro!

OpenDataLab retweeted

karminski-牙医

@karminski3

Feb 25

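The result: on OmniDocBench v1.5, with 0.9B parameters and 2.5K visual tokens, it reached an overall score of 92.62 and took SOTA on all four categories: text, formula, table, and reading order. That beats Gemini-2.5 Pro (88.03), Qwen2.5-VL-72B (87.02), and MinerU2.5 (90.67). Inference is also faster than the other models, and it uses less GPU memory. The paper supports the view that, in the era of large models, careful engineering design and high-quality data can let small models match or even surpass large ones. The paper is highly instructive for the whole OCR field 👍.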

OpenDataLab @OpenDataLab_AI

Apr 1

Struggling with messy receipts & invoices? 🧾 #MinerU Skills delivers zero-code receipt parsing. Automatically locate, split & extract key data—amounts, dates, items—with high accuracy. Structured output ready for storage & reconciliation. Full workflow tutorial video now live!

OpenDataLab @OpenDataLab_AI

Apr 1

Tired of messy PDF outputs? 📄 #MinerU Skills let you process papers with zero code. Parse layouts, formulas, tables & OCR in one click. Batch 50 papers to clean Markdown, keep LaTeX & tables perfectly. Connect to RAG/knowledge base effortlessly. Full tutorial video out now!

OpenDataLab @OpenDataLab_AI

Mar 31

🚀#OpenDataLab’s AI-ready database #Sciverse has launched! Powering #AGI4S with a 3-layer system (Sci-Base/Sci-Align/Sci-Evo). ✅25M parsed publications, 600B high-quality tokens via #MinerU ✅18M protein sequences, 6M chemical reactions 👉Explore: sciverse.opendatalab.com

OpenDataLab @OpenDataLab_AI

Mar 27

🚀New from #OpenDataLab: MinerU-Diffusion! We redefine document OCR as inverse rendering via diffusion decoding, replacing slow autoregressive generation. ✅ Up to 5.1× faster inference ✅ Stronger visual structure modeling ✅ Stable in challenging scenarios Try it & star us!

OpenDataLab @OpenDataLab_AI

Mar 27

Paper: huggingface.co/papers/2603.2… GitHub: github.com/opendatalab/Miner…

OpenDataLab @OpenDataLab_AI

Mar 23

Free open-source PDF parser MinerU now offers Skills, MCP Server, dual-mode API, cross-platform CLI/SDK, and RAG plugins. One sentence lets AI read PDFs easily! 🦞MinerU skills clawhub.ai/MinerU-Extract/mi… 👉Dev-ecosystem Usage Guide mineru.net/ecosystem

OpenDataLab @OpenDataLab_AI

Mar 19

🚀Big Update! MinerU has been adapted to 10 computing power platforms 💯99% accuracy in capturing PDF/web elements 💪OmniDocBench adopted by Gemini3/DeepSeek as an authoritative benchmark 👉Explore: github.com/opendatalab/miner…; mineru.net/OpenSourceTools/E… 🏆MDIC: mineru.net/MDIC2026

OpenDataLab @OpenDataLab_AI

Mar 19

🚀2026 MinerU Data Intelligence Challenge is LIVE! 🌐Rooted in AGI4S with 3 competitive tracks. 🏆Win 2M RMB in rewards (1M cash + 1M computing power). Present at WAIC 2026. 🔥Conquer unstructured data challenges now! 👉Sign up: mineru.net/MDIC2026

OpenDataLab retweeted

NodeShift @nodeshiftai

3 Oct 2025

MinerU2.5 is a compact 1.2B VLM with a smart two-stage, coarse-to-fine pipeline (global layout → native-res crops) that delivers state-of-the-art doc parsing with low compute.
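
A minimal sketch of the coordinate bookkeeping behind a coarse-to-fine pipeline like the one described above: boxes are detected on a downsampled page, then mapped back to native-resolution pixels for cropping. This is not MinerU's code. The `Region` class, both function names, the 1036-pixel default, and the sample page and box are all hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A layout box in pixel coordinates plus an element type label (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float
    kind: str


def thumbnail_scale(width: int, height: int, max_side: int = 1036) -> float:
    """Scale factor that shrinks the longer page side to max_side; never upscales.

    The 1036 default is an illustrative assumption, not a documented setting.
    """
    return min(1.0, max_side / max(width, height))


def to_native(region: Region, scale: float, width: int, height: int) -> Region:
    """Map a box found on the thumbnail back to native-resolution pixels.

    Rounds outward (floor for the top-left, ceil for the bottom-right) so the
    crop never cuts into the detected element, then clamps to the page bounds.
    """
    return Region(
        x0=max(0, math.floor(region.x0 / scale)),
        y0=max(0, math.floor(region.y0 / scale)),
        x1=min(width, math.ceil(region.x1 / scale)),
        y1=min(height, math.ceil(region.y1 / scale)),
        kind=region.kind,
    )


# Hypothetical native page size and one hypothetical stage-one detection.
page_w, page_h = 2072, 4144
scale = thumbnail_scale(page_w, page_h)
crop = to_native(Region(10, 20, 110, 60, "table"), scale, page_w, page_h)
print(scale, crop)
```

Only the cropped regions would then go to the recognition stage, so the expensive step sees native-resolution pixels for the regions that matter rather than for the whole page.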

OpenDataLab retweeted

meng shao

@shao__meng

29 Sep 2025

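MinerU2.5 is officially released 🎉. This vision-language model has only 1.2B parameters, yet its decoupled architecture and data engine deliver SOTA accuracy while significantly cutting compute overhead! The team has also published a technical report, so let's look at the model's components, training details, and real-world performance 👇

1. Background and challenges
Documents differ from natural images: they are high-resolution (often several thousand pixels), content-dense (text-heavy), and structurally complex (multi-column layouts, elements that span pages). These traits create three major problems for traditional OCR systems:
· Resolution requirements: native-resolution processing is needed to capture fine-grained detail, but encoding the full image produces highly redundant tokens with O(N²) complexity.
· Efficiency and robustness: long documents tend to trigger VLM hallucinations, parameter efficiency is low, and processing is slow.
· Data bottleneck: existing datasets lack diversity, are imbalanced, and vary in annotation quality.
Existing methods fall into two categories:
· Traditional pipelines (e.g., Marker, MinerU): modular decomposition (layout detection → reading order → content recognition). Interpretable, but prone to error propagation and complex to maintain.
· End-to-end VLMs (e.g., GOT, Qwen2.5-VL): semantically strong, but inefficient at high resolution and prone to hallucination on long documents.
MinerU2.5 targets these pain points with a decoupling strategy that combines pipeline efficiency with VLM accuracy.

2. Core method
The core of MinerU2.5 is a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition, avoiding costly full-image encoding. The model architecture is based on the Qwen2-VL framework and includes:
· Vision encoder: a 675M-parameter NaViT (Native-Resolution ViT) that supports dynamic resolution and 2D-RoPE positional encoding, so it adapts to cropped regions of any aspect ratio.
· Language decoder: a 0.5B-parameter Qwen2-Instruct, switched to M-RoPE to improve generalization across resolutions.
· Patch Merger: pixel-unshuffle merges adjacent visual tokens, balancing efficiency and performance.
Two-stage parsing flow:
· Stage I, layout analysis: on a downsampled image (e.g., 1036px), quickly detect element boundaries, types (text/table/formula/image), and reading order across the whole page. The output is a structured prompt (such as <|box_start|> coordinates <|ref_start|> type <|ref_end|>), at low compute cost.
· Stage II, content recognition: based on the layout results, crop the key regions from the original high-resolution image (e.g., 1715px×154px) and decode text, formulas, and tables in parallel. Dedicated prompts preserve fine detail and avoid redundancy.
Training recipe (three stages):
· Stage 0, modality alignment: pre-train vision-language fusion on image-text pairs.
· Stage 1, document-parsing pre-training: a large-scale corpus covering layout, OCR, and formula/table recognition.
· Stage 2, fine-tuning: task-specific optimization, with data augmentation (e.g., rotation, noise) to improve robustness.
Data engine (the innovative highlight): a closed-loop system that generates a diverse corpus.
· Workflow: curation (collecting PDFs and scans) → building the pre-training and fine-tuning datasets → task reformulation (enhanced annotation for layout, formulas, and tables).
· Key techniques: iterative mining (inference consistency), which filters high-quality samples by model self-consistency; synthetic data for formulas (mixed Chinese and English) and tables (borderless, rotated).
· Scale: a corpus on the order of millions, covering academic, financial, textbook, and other document types.
For deployment, it supports Markdown output for easy downstream integration; inference is 10x more efficient than end-to-end VLMs.

3. Experimental results
On OmniDocBench, a full-document parsing benchmark, MinerU2.5 sets a new record:
· Overall performance: a 1-Edit score of 95, ahead of general-purpose VLMs (e.g., Gemini-2.5 Pro 90, Qwen2.5-VL-72B 92) and domain models (e.g., MonkeyOCR 88, PP-StructureV3 85).
· Element level: text blocks 98%, formulas 97%, tables 96%, reading order 95%; a 5-10% lead on TEDS/CDM metrics.
· Subtasks: layout analysis (DocLayNet mAP 85), tables (PubTabNet TEDS 95), formulas (Marmot accuracy 92), all SOTA.
· Efficiency: at 1.2B parameters, a high-resolution document takes only seconds to process, far less than with a 72B model.
Qualitative examples show its advantage on complex PDFs (e.g., multi-column academic papers, borderless tables) over the previous MinerU and competing tools.

4. Significance and outlook
MinerU2.5 bridges efficiency and accuracy with a lightweight design. It is especially suited to high-density document scenarios and pushes OCR toward practical use. Its decoupling paradigm could extend to other multimodal tasks, and the data engine offers a template for data-scarce domains. Future work could explore integration with stronger LMs or real-time deployment. The open-source code and models make it easy to reproduce and iterate on.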
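
To make the patch-merging idea in the post above concrete, here is a minimal, dependency-free sketch of pixel-unshuffle on a grid of visual tokens. It is an illustration under stated assumptions, not the model's implementation: the function name `pixel_unshuffle`, the 2×2 merge factor, and the order in which neighbouring features are concatenated are all assumed here.

```python
from typing import List

Token = List[float]        # one visual token's feature vector
Grid = List[List[Token]]   # H x W grid of tokens


def pixel_unshuffle(grid: Grid, r: int = 2) -> Grid:
    """Merge each r x r block of neighbouring tokens into a single token.

    The features of the r*r neighbours are concatenated, so the grid shrinks
    from H x W tokens to (H/r) x (W/r) tokens while each token grows from
    C to r*r*C features. The r=2 default is an assumption for illustration.
    """
    h, w = len(grid), len(grid[0])
    if h % r or w % r:
        raise ValueError("grid height and width must be divisible by r")
    merged_grid: Grid = []
    for i in range(0, h, r):
        row: List[Token] = []
        for j in range(0, w, r):
            merged: Token = []
            # Row-major walk over the r x r block; the order is an arbitrary choice.
            for di in range(r):
                for dj in range(r):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        merged_grid.append(row)
    return merged_grid


# Hypothetical 4 x 4 grid with one feature per token, numbered 0..15.
tokens = [[[float(4 * i + j)] for j in range(4)] for i in range(4)]
merged = pixel_unshuffle(tokens)
tokens_before = sum(len(row) for row in tokens)
tokens_after = sum(len(row) for row in merged)
print(tokens_before, tokens_after)
```

With a 2×2 factor the decoder sees a quarter as many tokens, which is where the efficiency gain would come from. No information is discarded at this step, because the merged token keeps every neighbour's features.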

OpenDataLab retweeted

s3nh

@s3nhxx

5 Oct 2025

testing MinerU, 1.2B VLM for 'efficient' document parsing. it's not heavy, I'm really optimistic. huggingface.co/opendatalab/M…

opendatalab/MinerU2.5-2509-1.2B · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

OpenDataLab retweeted

OpenDataLab @OpenDataLab_AI

30 Sep 2025

🚀 The MinerU2.5 Technical Report is officially released!

Our OpenDataLab team is excited to share the report, which details our work on a solution that balances high efficiency with strong performance. Here's a look at our results for your reference and for discussion! 💖
📝 Our Core Idea: The Balance Between Efficiency and Precision
Lightweight Exploration: Our model is only 1.2B parameters. In an era of massive models, we wanted to explore the potential of smaller models on high-resolution tasks.
Engineering Efficiency Breakthrough: MinerU2.5 uses a unique decoupled architecture to effectively reduce computational redundancy. In our tests, its document page throughput is over 4x faster than similar models (like MonkeyOCR-Pro-3B), greatly improving deployment efficiency and cost-effectiveness!
🔥 Data Comparison with Top-Tier Models On standard benchmarks like OmniDocBench, we compared MinerU2.5 against general and specialized models to test its stability and robustness:
Performance: MinerU2.5 delivers solid, competitive performance a
