by gpetraroli
Extract, search, and retrieve comprehensive metadata from PDF files with customizable options, async processing, and built‑in file size protection.
Mcp Pdf Reader provides a Model Context Protocol (MCP) server that enables applications to extract raw text, perform advanced searches, and pull detailed metadata from PDF documents. It is designed for developers who need reliable PDF processing within chat‑based or automated workflows.
Run npm install to resolve dependencies.
Start the server (node index.js in the project root) and configure your MCP client to point to the server executable.
Use the available tools:
read-pdf for text extraction.
search-pdf for searching specific terms.
pdf-metadata for retrieving only metadata.
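An MCP client normally invokes tools like these with a JSON-RPC 2.0 `tools/call` request that names the tool and carries its arguments. The sketch below only builds such messages; `buildToolCall` is a hypothetical helper, the file path is a placeholder, and a real client would first complete the MCP `initialize` handshake before sending them.

```javascript
// Hypothetical helper: builds a JSON-RPC 2.0 "tools/call" request as defined by MCP.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// One request per tool; the path is a placeholder, not a real file.
const requests = [
  buildToolCall(1, "read-pdf", { file: "/path/to/document.pdf" }),
  buildToolCall(2, "search-pdf", { file: "/path/to/document.pdf", query: "important term" }),
  buildToolCall(3, "pdf-metadata", { file: "/path/to/document.pdf" }),
];

// Assumption: the server uses the stdio transport, where each message is one line of JSON.
for (const request of requests) {
  console.log(JSON.stringify(request));
}
```

Most users never write these messages by hand, since the MCP client generates them from the conversation.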
Each tool accepts parameters such as file path, page ranges, case‑sensitivity, and cleaning options.
Q: What is the maximum PDF size supported?
A: Files larger than 50 MB are rejected to protect the server from excessive memory use.
Q: Can I extract text from scanned or image‑based PDFs?
A: Not yet. OCR support is planned for a future release.
Q: How do I include or exclude metadata during extraction?
A: Set the include_metadata boolean flag in the read-pdf request payload (default is true).
Q: Is there support for password‑protected PDFs?
A: Password handling is on the roadmap; currently only unprotected PDFs are processed.
Q: How can I run the server from a different directory?
A: Provide the absolute path to the index.js entry point in your MCP client configuration, as shown in the README example.
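The 50 MB limit mentioned in the FAQ can be checked before a PDF is read into memory. The following is a minimal sketch of such a guard, not the server's actual code: `assertWithinSizeLimit` and `MAX_PDF_BYTES` are hypothetical names, and it assumes "50 MB" means 50 × 1024 × 1024 bytes.

```javascript
const fs = require("fs");

// Assumption: "50 MB" is interpreted as 50 * 1024 * 1024 bytes.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: throws if the file exceeds the limit, otherwise returns its size.
function assertWithinSizeLimit(filePath, maxBytes = MAX_PDF_BYTES) {
  const { size } = fs.statSync(filePath); // throws if the file does not exist
  if (size > maxBytes) {
    throw new Error(`File is ${size} bytes; the limit is ${maxBytes} bytes`);
  }
  return size;
}
```

Checking the size with `fs.statSync` avoids loading the file at all when it is too large, which is the point of a memory-protection limit.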
A comprehensive Model Context Protocol (MCP) server that provides advanced PDF text extraction, search, and analysis functionality.
npm install
read-pdf - Enhanced PDF Reading
Extract text from PDF files with customizable options.
Parameters:
file (string, required): Path to the PDF file
pages (string, optional): Page range (e.g., '1-5', '1,3,5', 'all'). Default: 'all'
include_metadata (boolean, optional): Include PDF metadata. Default: true
clean_text (boolean, optional): Clean and normalize text. Default: false
Example Usage:
// Basic extraction
{ "file": "/path/to/document.pdf" }
// Extract with clean text and no metadata
{
  "file": "/path/to/document.pdf",
  "clean_text": true,
  "include_metadata": false
}
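The pages parameter accepts 'all', ranges such as '1-5', and lists such as '1,3,5'. A sketch of how such a string could be expanded into page numbers follows. `parsePageSpec` is a hypothetical helper, not the server's implementation; it assumes 1-based page numbers, rejects out-of-range pages, and also accepts mixed forms like '1-3,7', which the real tool may or may not support.

```javascript
// Hypothetical helper: expands a page spec into a sorted list of 1-based page numbers.
function parsePageSpec(spec, pageCount) {
  const text = String(spec).trim().toLowerCase();
  if (text === "all") {
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }
  const pages = new Set(); // a Set removes duplicates from overlapping parts
  for (const part of text.split(",")) {
    // Each part is either a single page ("3") or an inclusive range ("1-5").
    const match = /^(\d+)(?:-(\d+))?$/.exec(part.trim());
    if (!match) throw new Error(`Invalid page spec: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${part}"`);
    }
    for (let page = start; page <= end; page++) pages.add(page);
  }
  return [...pages].sort((a, b) => a - b);
}
```

For example, `parsePageSpec("1,3,5", 5)` yields `[1, 3, 5]`, and `parsePageSpec("all", 3)` yields `[1, 2, 3]`.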
search-pdf - Search Within PDFs
Search for specific text within PDF documents.
Parameters:
file (string, required): Path to the PDF file
query (string, required): Text to search for
case_sensitive (boolean, optional): Case sensitive search. Default: false
whole_word (boolean, optional): Match whole words only. Default: false
Example Usage:
// Case-insensitive search
{ "file": "/path/to/document.pdf", "query": "important term" }
// Whole word, case-sensitive search
{
  "file": "/path/to/document.pdf",
  "query": "API",
  "case_sensitive": true,
  "whole_word": true
}
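The effect of the case_sensitive and whole_word flags can be illustrated with a small regular-expression search over already extracted text. This is a sketch under stated assumptions, not the server's code: `findMatches` and `escapeRegExp` are hypothetical helpers, the query is treated as literal text, and "whole word" is taken to mean that the match is not directly adjacent to a letter, digit, or underscore.

```javascript
// Escape regex metacharacters so the query is matched as literal text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the character offset of every match of `query` in `text`.
function findMatches(text, query, { caseSensitive = false, wholeWord = false } = {}) {
  if (query === "") return [];
  let pattern = escapeRegExp(query);
  if (wholeWord) {
    // Lookarounds reject matches that touch a letter, digit, or underscore.
    pattern = `(?<![\\p{L}\\p{N}_])${pattern}(?![\\p{L}\\p{N}_])`;
  }
  const flags = caseSensitive ? "gu" : "giu";
  return Array.from(text.matchAll(new RegExp(pattern, flags)), (m) => m.index);
}

const sample = "The API and the api. Rapid APIs.";
console.log(findMatches(sample, "API")); // all four occurrences, including inside "Rapid"
console.log(findMatches(sample, "API", { caseSensitive: true, wholeWord: true })); // only the standalone "API"
```

Lookarounds are used instead of `\b` so that queries beginning or ending with punctuation still behave predictably.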
pdf-metadata - Extract Metadata Only
Get comprehensive metadata from PDF files without extracting text.
Parameters:
file (string, required): Path to the PDF file
Returns:
Add to your Cursor settings:
{
  "mcpServers": {
    "mcp-gp-pdf-reader": {
      "command": "node",
      "args": ["/absolute/path/to/mcp_gp_pdf_reader/index.js"]
    }
  }
}
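Because the configuration needs an absolute path, it can be convenient to generate the entry from wherever the project was cloned. The script below is a sketch: `buildCursorConfig` is a hypothetical helper, while the server key and the command/args layout are taken from the example above.

```javascript
const path = require("path");

// Hypothetical helper: builds the "mcpServers" entry for a given checkout directory.
function buildCursorConfig(projectDir) {
  return {
    mcpServers: {
      "mcp-gp-pdf-reader": {
        command: "node",
        // path.resolve turns a relative directory into an absolute path.
        args: [path.resolve(projectDir, "index.js")],
      },
    },
  };
}

// Print the JSON for the current directory, ready to paste into the client settings.
console.log(JSON.stringify(buildCursorConfig("."), null, 2));
```

Run it from the project root so that "." resolves to the directory containing index.js.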
# Via MCP client
"Extract all text from /documents/report.pdf"
# Via MCP client
"Search for 'quarterly results' in /documents/financial-report.pdf"
# Via MCP client
"Get metadata from /documents/contract.pdf"
This MCP server is designed to be extensible. Key areas for contribution:
MIT License