NameetP/pdfmux

PDF extraction router with a built-in MCP server. Classifies each page (digital, scanned, or containing tables) and routes it to the best-suited backend (PyMuPDF, Docling, OCR, or an optional LLM fallback). Per-page confidence scoring flags low-quality pages and automatically re-extracts them to prevent silent RAG failures. Zero config: `pip install pdfmux`. MIT licensed.
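The classify-route-rescore flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not pdfmux's actual API: the `Page` features, the `classify` thresholds, the `ROUTES` table, the backend names (`fast`, `layout`, `ocr`, `llm`), and the 0.8 confidence threshold are all hypothetical, and the backends are stubs rather than real PyMuPDF, Docling, or OCR calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Page:
    """Hypothetical per-page features; not pdfmux's data model."""
    number: int
    text_chars: int     # characters found in the embedded text layer
    image_area: float   # fraction of the page covered by images (0.0 to 1.0)
    ruled_lines: int    # detected ruling lines, a rough hint for tables


@dataclass
class Result:
    page: int
    backend: str
    text: str
    confidence: float


# A backend takes a page and returns (text, confidence in [0, 1]).
Backend = Callable[[Page], Tuple[str, float]]

# Preferred backend first, fallbacks after. Names are placeholders.
ROUTES: Dict[str, Tuple[str, ...]] = {
    "digital": ("fast", "ocr"),
    "tables": ("layout", "ocr"),
    "scanned": ("ocr", "llm"),
}


def classify(page: Page) -> str:
    """Label a page using illustrative thresholds (assumed, not measured)."""
    if page.text_chars < 50 and page.image_area > 0.5:
        return "scanned"
    if page.ruled_lines >= 8:
        return "tables"
    return "digital"


def extract_page(page: Page, backends: Dict[str, Backend],
                 threshold: float = 0.8) -> Result:
    """Try each routed backend in order; stop once confidence is high enough."""
    best = None
    for name in ROUTES[classify(page)]:
        if name not in backends:
            continue  # optional backend not available
        text, confidence = backends[name](page)
        if best is None or confidence > best.confidence:
            best = Result(page.number, name, text, confidence)
        if best.confidence >= threshold:
            break
    if best is None:
        raise LookupError(f"no backend available for page {page.number}")
    return best


if __name__ == "__main__":
    # Stub backends standing in for real extractors.
    stubs = {
        "fast": lambda p: ("embedded text", 0.95 if p.text_chars > 200 else 0.3),
        "ocr": lambda p: ("ocr text", 0.85),
    }
    for page in (Page(1, 1200, 0.0, 0), Page(2, 0, 0.9, 0), Page(3, 120, 0.1, 0)):
        result = extract_page(page, stubs)
        print(result.page, result.backend, result.confidence)
```

In this sketch, page 3 has a thin text layer, so the stub `fast` backend scores below the threshold and the page is re-extracted with the next backend in its route. Keeping the best result seen so far means a page still returns something even when no backend clears the threshold, and the low score stays visible to the caller.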

Category: Search & Data Extraction
Language: Python
License: MIT
Stars: 57
Source: https://github.com/NameetP/pdfmux
