MCP Server

Overview

OkraPDF exposes a remote MCP server that gives any MCP-compatible AI agent (Claude Code, Cursor, Windsurf, etc.) direct access to your documents. Upload PDFs, read extracted content, ask questions, and extract structured data — all from your editor. No npm packages. No API keys in config files. One command to connect.

claude mcp add -s user -t http okrapdf https://api.okrapdf.com/mcp

Setup

Claude Code

claude mcp add -s user -t http okrapdf https://api.okrapdf.com/mcp

The server uses OAuth; you’ll authenticate in the browser on first use.

Cursor / Windsurf

Add to your MCP config (.cursor/mcp.json or equivalent):

{
  "mcpServers": {
    "okrapdf": {
      "type": "http",
      "url": "https://api.okrapdf.com/mcp"
    }
  }
}
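
If the config file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal Python sketch, assuming the file uses the top-level mcpServers key shown in the config above. The helper name add_okrapdf_server is hypothetical and not part of OkraPDF.

```python
import json
from pathlib import Path

# Entry copied from the config above.
OKRAPDF_ENTRY = {"type": "http", "url": "https://api.okrapdf.com/mcp"}


def add_okrapdf_server(config_path: Path) -> dict:
    """Merge the okrapdf entry into an MCP config file, keeping other servers."""
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Add or overwrite only the "okrapdf" key; other servers are left untouched.
    config.setdefault("mcpServers", {})["okrapdf"] = dict(OKRAPDF_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For Cursor this might be called as add_okrapdf_server(Path(".cursor/mcp.json")); other editors keep their config elsewhere.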

Tools

The server exposes 6 tools:

Tool	Description
`upload_document`	Upload a PDF from URL. Optionally wait for extraction to complete.
`get_document_status`	Check processing phase, page count, and node count.
`list_documents`	List your uploaded documents with status and timestamps.
`read_document`	Get extracted markdown content. Supports page ranges for large docs.
`ask_document`	Ask a natural language question. Returns an answer with page citations.
`extract_data`	Extract structured JSON from a document using a prompt and JSON schema.

Quick Start

Upload and ask

> Upload this 10-K and tell me the total revenue

The agent calls upload_document with the URL, waits for extraction, then calls ask_document with your question.
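
Under the hood, MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call. The sketch below shows roughly what the two requests in this flow could look like. It only builds the messages; transport and OAuth are handled by the MCP client. The helper tool_call, the PDF URL, and the document ID are placeholders for illustration.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request (JSON-RPC 2.0) for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: upload and wait for extraction. The URL is a placeholder.
upload = tool_call(
    "upload_document",
    {"url": "https://example.com/10-k.pdf", "wait": True},
)

# Step 2: ask, using the document ID returned by step 1 (placeholder here).
ask = tool_call(
    "ask_document",
    {"document_id": "doc-abc123", "question": "What was the total revenue?"},
)

print(json.dumps(upload, indent=2))
print(json.dumps(ask, indent=2))
```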

Read specific pages

> Read pages 40-45 of doc-abc123

Returns extracted markdown for just those pages — useful for large filings where you know where to look.

Extract structured data

> Extract all line items from this invoice as JSON

The agent calls extract_data with your prompt and a JSON schema, returning parsed structured data.
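
As an illustration, the arguments for such a call might look like the sketch below. The schema and its field names (line_items, description, quantity, unit_price, total) are hypothetical and chosen for this example; only the three parameter names come from the tool reference. The small missing_required helper is also an assumption, shown as one way to sanity-check what comes back.

```python
import json

# Hypothetical schema for invoice line items; adjust field names to your documents.
LINE_ITEMS_SCHEMA = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"},
                },
                "required": ["description", "total"],
            },
        }
    },
    "required": ["line_items"],
}

# Arguments for extract_data, using the three parameters from the tool reference.
extract_args = {
    "document_id": "doc-abc123",  # placeholder ID
    "prompt": "Extract all line items from this invoice",
    "json_schema": LINE_ITEMS_SCHEMA,
}


def missing_required(item: dict) -> list[str]:
    """Return the required line-item fields that are absent from one extracted item."""
    required = LINE_ITEMS_SCHEMA["properties"]["line_items"]["items"]["required"]
    return [field for field in required if field not in item]


print(json.dumps(extract_args, indent=2))
```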

Parallel Queries

MCP clients that support parallel tool calls (like Claude Code) can ask the same question across multiple documents simultaneously. This is the fastest way to compare data across filings.

Example: Operating margins across 3 companies

> What was the operating margin for PepsiCo, Amazon, and AMD?

The agent fires 3 ask_document calls in parallel, one per document, so the total wait is roughly that of the slowest call rather than the sum of all three:

Company	FY	Operating Margin
PepsiCo	2022	13.3%
Amazon	2019	5.2%
AMD	2022	5.4%

Each answer includes page citations back to the source filing. No sequential waiting — all three run concurrently against their respective Durable Objects.

How it works

Each document lives in its own Durable Object on Cloudflare’s edge. Parallel queries hit separate DOs — there’s no shared bottleneck.
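
The client side of this fan-out can be sketched with asyncio. The example below does not contact OkraPDF: fake_ask_document is a stand-in that sleeps instead of making a network call, and the document IDs are placeholders. It only demonstrates that the three calls are in flight together and that results come back in input order.

```python
import asyncio

in_flight = 0  # calls currently running
peak = 0       # highest number of calls running at once


async def fake_ask_document(document_id: str, question: str) -> str:
    """Stand-in for an ask_document call; sleeps instead of hitting the network."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulated server latency
    in_flight -= 1
    return f"answer for {document_id}"


async def ask_all(document_ids: list[str], question: str) -> list[str]:
    # gather schedules every call as a task, so all of them are in flight
    # together; results are returned in the same order as the inputs.
    return await asyncio.gather(
        *(fake_ask_document(doc_id, question) for doc_id in document_ids)
    )


answers = asyncio.run(
    ask_all(["doc-pepsico", "doc-amazon", "doc-amd"], "What was the operating margin?")
)
print(answers, "peak concurrency:", peak)
```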

Try It

# 1. Connect (one-time)
claude mcp add -s user -t http okrapdf https://api.okrapdf.com/mcp

# 2. Upload a PDF
> "Upload https://arxiv.org/pdf/2401.04088"
#  → upload_document(url, wait=true)
#  → doc-25725cc7... complete, 853 nodes extracted

# 3. Ask a question
> "What is this paper about?"
#  → ask_document(doc-25725cc7, "What is this paper about?")
#  → "Mixtral 8x7B, a sparse mixture of experts model..." (p.1)

# 4. Parallel queries — the killer feature
> "Compare operating margins for PepsiCo, Amazon, and AMD"
#
#  Agent fires 3 calls at once:
#  ┌─ ask_document(PepsiCo)  ──→  13.3%  (FY2022, p.40)
#  ├─ ask_document(Amazon)   ──→   5.2%  (FY2019, p.38)
#  └─ ask_document(AMD)      ──→   5.4%  (FY2022, p.48)
#
#  Each doc is its own Durable Object — no shared bottleneck.
#  All three run concurrently.

# 5. Structured extraction
> "Extract revenue by segment as JSON"
#  → extract_data(doc, prompt, json_schema)
#  → { "segments": [{ "name": "PBNA", "revenue": 26213 }, ...] }

Tool Reference

upload_document

Parameter	Type	Required	Description
`url`	string	Yes	Public URL of the PDF
`document_id`	string	No	Custom ID (auto-generated if omitted)
`wait`	boolean	No	Wait for extraction to complete (default: `true`)
`page_images`	`"none"` | `"cover"` | `"lazy"`	No	Page image strategy (default: `"cover"`)

read_document

Parameter	Type	Required	Description
`document_id`	string	Yes	Document ID
`pages`	string	No	Page range, e.g. `"1-5"` or `"3"`. Omit for all pages.
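
A client that builds the pages string from user input may want to check it before sending. The sketch below parses the two documented forms ("3" and "1-5"). It assumes pages are numbered from 1 and that a range is inclusive; the server's own validation rules are not documented here, and parse_pages is a hypothetical helper.

```python
import re

# Matches a single page ("3") or a start-end range ("1-5").
_PAGES = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_pages(pages: str) -> range:
    """Parse "3" or "1-5" into an inclusive range of 1-based page numbers."""
    match = _PAGES.match(pages.strip())
    if not match:
        raise ValueError(f"invalid page range: {pages!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    # Assumption: pages start at 1 and a range must not run backwards.
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {pages!r}")
    return range(start, end + 1)
```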

ask_document

Parameter	Type	Required	Description
`document_id`	string	Yes	Document ID
`question`	string	Yes	Natural language question

extract_data

Parameter	Type	Required	Description
`document_id`	string	Yes	Document ID
`prompt`	string	Yes	Extraction instruction
`json_schema`	object	Yes	JSON Schema for desired output shape

get_document_status

Parameter	Type	Required	Description
`document_id`	string	Yes	Document ID
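
When upload_document is called with wait set to false, a client would typically poll this tool until extraction finishes. The sketch below shows one way to do that. The response shape is an assumption: the field name phase and the value "complete" are guesses based on the "complete" shown in the Try It output, and get_status is a caller-supplied function that wraps the actual tool call.

```python
import time
from typing import Callable


def wait_until_complete(
    get_status: Callable[[str], dict],
    document_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
) -> dict:
    """Poll get_status until the document reports phase "complete" or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(document_id)
        if status.get("phase") == "complete":  # assumed field name and value
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{document_id} not complete after {timeout_s}s")
        time.sleep(interval_s)
```

Passing the status function in as a parameter keeps the loop independent of any particular MCP client library.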

list_documents

Parameter	Type	Required	Description
`limit`	integer	No	Max documents to return (default: 20, max: 100)
