NameetP/pdfmux
PDF extraction router with a built-in MCP server. It classifies each page (digital, scanned, or containing tables) and routes it to the best-suited backend (PyMuPDF, Docling, OCR, or an optional LLM fallback). Per-page confidence scoring flags low-quality pages and automatically re-extracts them to prevent silent RAG failures. Zero config: `pip install pdfmux`. MIT licensed.
- Category: Search & Data Extraction
- Language: Python
- License: MIT
- Stars: 57
- Source: https://github.com/NameetP/pdfmux
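
The classify, route, and re-extract flow described above can be illustrated with a minimal sketch. This is not pdfmux's actual API: `Page`, `Result`, `classify`, `route`, `make_stub`, the 0.8 threshold, and the classification heuristics are all hypothetical names and values chosen for illustration, and the backends are stand-in stubs rather than real PyMuPDF, Docling, or OCR calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Page:
    number: int
    text_chars: int     # characters found in the embedded text layer
    image_area: float   # fraction of the page covered by images (0..1)
    has_table: bool


@dataclass
class Result:
    page: int
    backend: str
    text: str
    confidence: float   # 0..1, higher means more trustworthy


Extractor = Callable[[Page], Result]


def classify(page: Page) -> str:
    """Hypothetical heuristic: label a page as tables, scanned, or digital."""
    if page.has_table:
        return "tables"
    if page.text_chars < 50 and page.image_area > 0.5:
        return "scanned"
    return "digital"


def route(pages: List[Page], backends: Dict[str, Extractor],
          fallback: Extractor, threshold: float = 0.8) -> List[Result]:
    """Extract each page with the backend for its class; retry weak pages."""
    results = []
    for page in pages:
        result = backends[classify(page)](page)
        if result.confidence < threshold:
            # Low-confidence page: re-extract and keep the better result.
            retry = fallback(page)
            if retry.confidence > result.confidence:
                result = retry
        results.append(result)
    return results


def make_stub(name: str, confidence: float) -> Extractor:
    """Build a stand-in extractor that reports a fixed confidence."""
    def extract(page: Page) -> Result:
        return Result(page.number, name, f"<{name} text, page {page.number}>", confidence)
    return extract


backends = {
    "digital": make_stub("fast-text", 0.95),
    "scanned": make_stub("ocr", 0.60),
    "tables": make_stub("table-parser", 0.90),
}
fallback = make_stub("llm-fallback", 0.85)
pages = [
    Page(1, text_chars=1200, image_area=0.0, has_table=False),
    Page(2, text_chars=0, image_area=0.9, has_table=False),
    Page(3, text_chars=800, image_area=0.1, has_table=True),
]
results = route(pages, backends, fallback)
for r in results:
    print(r.page, r.backend, r.confidence)
```

In this sketch the scanned page scores below the threshold with the stub OCR backend, so it is re-extracted by the fallback and the higher-confidence result is kept. The other two pages stay with their first backend.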