
Overview

OkraPDF exposes a remote MCP server that gives any MCP-compatible AI agent (Claude Code, Cursor, Windsurf, etc.) direct access to your documents. Upload PDFs, read extracted content, ask questions, and extract structured data — all from your editor. No npm packages. No API keys in config files. One command to connect.
claude mcp add -s user -t http okrapdf https://api.okrapdf.com/mcp

Setup

claude mcp add -s user -t http okrapdf https://api.okrapdf.com/mcp
The server uses OAuth — you’ll authenticate in the browser on first use.

Tools

The server exposes 6 tools:
| Tool | Description |
| --- | --- |
| upload_document | Upload a PDF from URL. Optionally wait for extraction to complete. |
| get_document_status | Check processing phase, page count, and node count. |
| list_documents | List your uploaded documents with status and timestamps. |
| read_document | Get extracted markdown content. Supports page ranges for large docs. |
| ask_document | Ask a natural language question. Returns an answer with page citations. |
| extract_data | Extract structured JSON from a document using a prompt and JSON schema. |

Quick Start

Upload and ask

> Upload this 10-K and tell me the total revenue
The agent calls upload_document with the URL, waits for extraction, then calls ask_document with your question.

Read specific pages

> Read pages 40-45 of doc-abc123
Returns extracted markdown for just those pages — useful for large filings where you know where to look.

Extract structured data

> Extract all line items from this invoice as JSON
The agent calls extract_data with your prompt and a JSON schema, returning parsed structured data.
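
To make the invoice example more concrete, here is a rough sketch of the arguments an agent might assemble for extract_data. Only the three argument names (document_id, prompt, json_schema) come from the tool reference; the schema's field names (line_items, description, quantity, unit_price) and the helper build_extract_args are assumptions chosen for illustration.

```python
import json

# Hypothetical schema for invoice line items; the field names are illustrative
# and are not prescribed by OkraPDF.
LINE_ITEMS_SCHEMA = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                },
                "required": ["description", "quantity", "unit_price"],
            },
        }
    },
    "required": ["line_items"],
}


def build_extract_args(document_id, prompt, schema):
    """Assemble the three arguments documented for extract_data."""
    return {"document_id": document_id, "prompt": prompt, "json_schema": schema}


args = build_extract_args(
    "doc-abc123",
    "Extract all line items from this invoice as JSON",
    LINE_ITEMS_SCHEMA,
)
print(json.dumps(args, indent=2))
```

In practice the agent writes the schema for you from the prompt; spelling it out is mainly useful when you want the output shape to stay fixed across many documents.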

Parallel Queries

MCP clients that support parallel tool calls (like Claude Code) can ask the same question across multiple documents simultaneously. This is the fastest way to compare data across filings.

Example: Operating margins across 3 companies

> What was the operating margin for PepsiCo, Amazon, and AMD?
The agent fires 3 ask_document calls in parallel, one per document, so the total wait is roughly that of the slowest call rather than the sum of all three:
| Company | FY | Operating Margin |
| --- | --- | --- |
| PepsiCo | 2022 | 13.3% |
| Amazon | 2019 | 5.2% |
| AMD | 2022 | 5.4% |
Each answer includes page citations back to the source filing. No sequential waiting — all three run concurrently against their respective Durable Objects.

How it works

Each document lives in its own Durable Object on Cloudflare’s edge. Parallel queries hit separate DOs — there’s no shared bottleneck.
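
The client-side fan-out can be pictured with a small asyncio sketch. This is not OkraPDF or Claude Code source code: the ask_document function below is a local stand-in that only sleeps to simulate latency, and the document IDs are made up. It simply shows why three independent calls issued together take about as long as one.

```python
import asyncio
import time


async def ask_document(document_id: str, question: str) -> str:
    """Stand-in for a real ask_document tool call; sleeps to simulate latency."""
    await asyncio.sleep(0.2)
    return f"answer for {document_id}"


async def ask_all(document_ids, question):
    # gather() schedules every call concurrently and returns the results
    # in the same order as the inputs.
    return await asyncio.gather(
        *(ask_document(doc_id, question) for doc_id in document_ids)
    )


def run_demo():
    ids = ["doc-pepsico", "doc-amazon", "doc-amd"]  # hypothetical IDs
    start = time.perf_counter()
    answers = asyncio.run(ask_all(ids, "What was the operating margin?"))
    elapsed = time.perf_counter() - start
    return answers, elapsed


answers, elapsed = run_demo()
print(answers, f"{elapsed:.2f}s")
```

Run sequentially, the three simulated calls would take about 0.6 seconds; run together, they finish in roughly the time of a single call.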

Try It

# 1. Connect (one-time)
claude mcp add -s user -t http okrapdf https://api.okrapdf.com/mcp

# 2. Upload a PDF
> "Upload https://arxiv.org/pdf/2401.04088"
#  → upload_document(url, wait=true)
#  → doc-25725cc7... complete, 853 nodes extracted

# 3. Ask a question
> "What is this paper about?"
#  → ask_document(doc-25725cc7, "What is this paper about?")
#  → "Mixtral 8x7B, a sparse mixture of experts model..." (p.1)

# 4. Parallel queries — the killer feature
> "Compare operating margins for PepsiCo, Amazon, and AMD"
#
#  Agent fires 3 calls at once:
#  ┌─ ask_document(PepsiCo)  ──→  13.3%  (FY2022, p.40)
#  ├─ ask_document(Amazon)   ──→   5.2%  (FY2019, p.38)
#  └─ ask_document(AMD)      ──→   5.4%  (FY2022, p.48)
#
#  Each doc is its own Durable Object — no shared bottleneck.
#  All three return at the same time.

# 5. Structured extraction
> "Extract revenue by segment as JSON"
#  → extract_data(doc, prompt, json_schema)
#  → { "segments": [{ "name": "PBNA", "revenue": 26213 }, ...] }

Tool Reference

upload_document

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Public URL of the PDF |
| document_id | string | No | Custom ID (auto-generated if omitted) |
| wait | boolean | No | Wait for extraction to complete (default: true) |
| page_images | one of "none", "cover", "lazy" | No | Page image strategy (default: "cover") |
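
For readers curious about what travels over the wire, MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call. The sketch below builds such a request for upload_document using the parameters in the table above. The helper name tool_call_request is an assumption for illustration, and the session initialization and OAuth steps that a real MCP client performs first are omitted.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = tool_call_request(
    1,
    "upload_document",
    {
        "url": "https://arxiv.org/pdf/2401.04088",
        "wait": True,            # documented default, shown explicitly
        "page_images": "cover",  # documented default, shown explicitly
    },
)
print(json.dumps(request, indent=2))
```

You normally never write this by hand; the MCP client constructs it when the agent decides to call the tool.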

read_document

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| document_id | string | Yes | Document ID |
| pages | string | No | Page range, e.g. "1-5" or "3". Omit for all pages. |
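
The two documented forms of the pages string can be read as follows. This is a minimal sketch of one plausible interpretation, assuming 1-based, inclusive page numbers; the server's actual parsing rules are not documented here, and parse_pages is a hypothetical helper rather than part of OkraPDF.

```python
def parse_pages(pages: str) -> list[int]:
    """Expand "3" or "1-5" into a list of page numbers (assumed 1-based, inclusive)."""
    first, sep, last = pages.strip().partition("-")
    start = int(first)
    end = int(last) if sep else start
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {pages!r}")
    return list(range(start, end + 1))


print(parse_pages("40-45"))
print(parse_pages("3"))
```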

ask_document

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| document_id | string | Yes | Document ID |
| question | string | Yes | Natural language question |

extract_data

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| document_id | string | Yes | Document ID |
| prompt | string | Yes | Extraction instruction |
| json_schema | object | Yes | JSON Schema for desired output shape |

get_document_status

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| document_id | string | Yes | Document ID |

list_documents

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| limit | integer | No | Max documents to return (default: 20, max: 100) |