Overview
OkraPDF exposes a remote MCP server that gives any MCP-compatible AI agent (Claude Code, Cursor, Windsurf, etc.) direct access to your documents. Upload PDFs, read extracted content, ask questions, and extract structured data, all from your editor. No npm packages. No API keys in config files. One command to connect.
Setup
- Claude Code
- Cursor / Windsurf
Tools
The server exposes 6 tools:
| Tool | Description |
|---|---|
| upload_document | Upload a PDF from URL. Optionally wait for extraction to complete. |
| get_document_status | Check processing phase, page count, and node count. |
| list_documents | List your uploaded documents with status and timestamps. |
| read_document | Get extracted markdown content. Supports page ranges for large docs. |
| ask_document | Ask a natural language question. Returns an answer with page citations. |
| extract_data | Extract structured JSON from a document using a prompt and JSON schema. |
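Under MCP, a client invokes any of these tools by sending a JSON-RPC `tools/call` request. The sketch below only builds that envelope so the shape is visible; the helper name `tool_call` and the request id are illustrative assumptions, and in normal use your MCP client constructs and sends this message for you.

```python
import json


def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Ask for at most five documents; `limit` is the documented list_documents parameter.
request = tool_call("list_documents", {"limit": 5})
print(json.dumps(request, indent=2))
```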
Quick Start
Upload and ask
The agent calls upload_document with the URL, waits for extraction, then calls ask_document with your question.
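The upload-then-ask sequence can be sketched as two tool calls in order. This is a minimal illustration, not the server's implementation: `call_tool` stands for whatever function your MCP client uses to invoke a tool, `fake_call_tool` is a stand-in that only records calls, and the URL, document ID, and question are made up. Passing a custom `document_id` avoids assuming anything about the shape of the upload response.

```python
from typing import Any, Callable


def upload_and_ask(
    call_tool: Callable[[str, dict], Any],
    url: str,
    document_id: str,
    question: str,
) -> Any:
    """Upload a PDF, wait for extraction, then ask one question about it."""
    # wait=True (the documented default) returns only after extraction completes.
    call_tool("upload_document", {"url": url, "document_id": document_id, "wait": True})
    return call_tool("ask_document", {"document_id": document_id, "question": question})


# Stand-in for a real MCP client: records each call instead of contacting a server.
calls = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    calls.append((name, arguments))
    return {"tool": name}


result = upload_and_ask(
    fake_call_tool,
    "https://example.com/report.pdf",  # hypothetical URL
    "report-2023",                     # hypothetical custom ID
    "What was total revenue?",
)
```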
Read specific pages
Extract structured data
The agent calls extract_data with your prompt and a JSON schema, then returns the parsed structured data.
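To make the inputs concrete, here is a sketch of what the extract_data arguments might look like. The invoice fields, document ID, and prompt are invented for illustration; only the three parameter names (`document_id`, `prompt`, `json_schema`) come from the tool reference.

```python
import json

# Hypothetical JSON Schema describing the output shape we want back.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}

# Arguments for the extract_data tool; all three parameters are required.
arguments = {
    "document_id": "invoice-001",  # hypothetical ID
    "prompt": "Extract the invoice number, total, and line items.",
    "json_schema": invoice_schema,
}
print(json.dumps(arguments, indent=2))
```

Listing only the fields you need under `required` keeps the schema strict about essentials while leaving optional fields free to be absent.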
Parallel Queries
MCP clients that support parallel tool calls (like Claude Code) can ask the same question across multiple documents simultaneously. This is the fastest way to compare data across filings.
Example: Operating margins across 3 companies
The agent makes three ask_document calls in parallel, one per document, and the results come back at the same time:
| Company | FY | Operating Margin |
|---|---|---|
| PepsiCo | 2022 | 13.3% |
| Amazon | 2019 | 5.2% |
| AMD | 2022 | 5.4% |
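The fan-out behind a comparison like this can be sketched with `asyncio.gather`, which starts all the calls concurrently and returns results in input order. Everything here is a stand-in: the `ask_document` coroutine simulates latency instead of contacting a server, and the document IDs are hypothetical.

```python
import asyncio
from typing import List


async def ask_document(document_id: str, question: str) -> dict:
    """Stand-in for a real ask_document tool call; no network is involved."""
    await asyncio.sleep(0.01)  # simulate request latency
    return {"document_id": document_id, "question": question}


async def ask_all(document_ids: List[str], question: str) -> List[dict]:
    """Ask the same question of every document concurrently."""
    tasks = [ask_document(doc_id, question) for doc_id in document_ids]
    # gather runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*tasks)


doc_ids = ["pepsico-fy2022", "amazon-fy2019", "amd-fy2022"]  # hypothetical IDs
results = asyncio.run(ask_all(doc_ids, "What was the operating margin?"))
```

Because the results keep the order of `doc_ids`, each answer can be matched back to its company without extra bookkeeping.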
How it works
Each document lives in its own Durable Object on Cloudflare’s edge. Parallel queries hit separate DOs, so there’s no shared bottleneck.
Try It
Tool Reference
upload_document
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Public URL of the PDF |
| document_id | string | No | Custom ID (auto-generated if omitted) |
| wait | boolean | No | Wait for extraction to complete (default: true) |
| page_images | "none" \| "cover" \| "lazy" | No | Page image strategy (default: "cover") |
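A small helper can assemble these arguments and catch a bad `page_images` value before the request is sent. This is a client-side sketch under the defaults listed above; the function name `upload_arguments` is hypothetical and the server may validate differently.

```python
from typing import Optional

# The three page image strategies listed in the parameter table.
PAGE_IMAGE_MODES = {"none", "cover", "lazy"}


def upload_arguments(
    url: str,
    document_id: Optional[str] = None,
    wait: bool = True,
    page_images: str = "cover",
) -> dict:
    """Build upload_document arguments using the documented defaults."""
    if page_images not in PAGE_IMAGE_MODES:
        raise ValueError(f"page_images must be one of {sorted(PAGE_IMAGE_MODES)}")
    arguments = {"url": url, "wait": wait, "page_images": page_images}
    # Omit document_id entirely so the server can auto-generate one.
    if document_id is not None:
        arguments["document_id"] = document_id
    return arguments


args = upload_arguments("https://example.com/report.pdf", page_images="lazy")
```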
read_document
| Parameter | Type | Required | Description |
|---|---|---|---|
| document_id | string | Yes | Document ID |
| pages | string | No | Page range, e.g. "1-5" or "3". Omit for all pages. |
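The `pages` string takes one of two forms, a range or a single page. The sketch below shows one way a client might check the value before sending it. It assumes pages are 1-based and ranges are inclusive, which the reference does not state, and `parse_pages` is a hypothetical helper rather than part of the server.

```python
from typing import Optional, Tuple


def parse_pages(pages: Optional[str]) -> Optional[Tuple[int, int]]:
    """Return (first, last) for a page spec, or None when all pages are wanted."""
    if pages is None:
        return None
    first, sep, last = pages.partition("-")
    start = int(first)
    # A single page such as "3" has no separator, so it is its own range.
    end = int(last) if sep else start
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {pages!r}")
    return start, end


range_spec = parse_pages("1-5")
single_spec = parse_pages("3")
```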
ask_document
| Parameter | Type | Required | Description |
|---|---|---|---|
| document_id | string | Yes | Document ID |
| question | string | Yes | Natural language question |
extract_data
| Parameter | Type | Required | Description |
|---|---|---|---|
| document_id | string | Yes | Document ID |
| prompt | string | Yes | Extraction instruction |
| json_schema | object | Yes | JSON Schema for desired output shape |
get_document_status
| Parameter | Type | Required | Description |
|---|---|---|---|
| document_id | string | Yes | Document ID |
list_documents
| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Max documents to return (default: 20, max: 100) |