👨‍🔧 GitHub: PDF-Extract-Kit, A Toolkit for High-Quality PDF Content Extraction.
- Integrates leading document parsing models for layout detection, formula detection, formula recognition, OCR, and table recognition.
- Delivers high-quality parsing across diverse document types, thanks to fine-tuning on varied document annotation data.
- Ships pre-trained models for each of these tasks.
github.com/opendatalab/PDF-Extract-Kit