Prerequisites
- An OkraPDF account (sign up)
- An API key (create one)
Option A: Sync extraction (small PDFs)
For PDFs under 10 pages, use the sync endpoint. It waits up to 60 seconds and returns results directly.
curl -X POST https://app.okrapdf.com/api/v1/extract/sync \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/invoice.pdf"}'
{
  "job_id": "ocr-abc123",
  "status": "completed",
  "filename": "invoice.pdf",
  "pages": 3,
  "results": {
    "tables": [
      {
        "page": 1,
        "table_index": 0,
        "markdown": "| Item | Qty | Price |\n|------|-----|-------|\n| Widget | 10 | $5.00 |",
        "headers": ["Item", "Qty", "Price"],
        "row_count": 1
      }
    ],
    "text": [
      {"page": 1, "content": "Invoice #12345\nDate: 2025-01-15..."}
    ]
  }
}
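As a rough sketch of how a client might consume this response, the Python snippet below builds a one-line summary per extracted table. The helper name summarize_extraction is hypothetical, and the code assumes every table carries the fields shown in the sample above. Treating any status other than "completed" as an error is also an assumption, not documented behavior.

```python
from typing import Any


def summarize_extraction(response: dict[str, Any]) -> list[str]:
    """Return one summary line per table in a sync extraction response."""
    status = response.get("status")
    if status != "completed":
        # Assumption: only a "completed" response carries usable results.
        raise RuntimeError(f"extraction not completed: {status!r}")

    lines = []
    for table in response["results"]["tables"]:
        columns = ", ".join(table["headers"])
        lines.append(
            f"page {table['page']}, table {table['table_index']}: "
            f"{table['row_count']} row(s), columns: {columns}"
        )
    return lines


# Sample payload copied from the response shown above.
sample = {
    "job_id": "ocr-abc123",
    "status": "completed",
    "filename": "invoice.pdf",
    "pages": 3,
    "results": {
        "tables": [
            {
                "page": 1,
                "table_index": 0,
                "markdown": "| Item | Qty | Price |\n|------|-----|-------|\n| Widget | 10 | $5.00 |",
                "headers": ["Item", "Qty", "Price"],
                "row_count": 1,
            }
        ],
        "text": [{"page": 1, "content": "Invoice #12345\nDate: 2025-01-15..."}],
    },
}

for line in summarize_extraction(sample):
    print(line)
```

The function takes the already-decoded JSON body, so it works with whichever HTTP client made the request.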
Option B: Async extraction (any size)
For larger PDFs, use the async flow: submit, poll, then fetch results.
1. Submit
curl -X POST https://app.okrapdf.com/api/v1/extract \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/annual-report.pdf"}'
{
  "job_id": "ocr-xyz789",
  "status": "queued",
  "filename": "annual-report.pdf",
  "poll_url": "/api/v1/jobs/ocr-xyz789",
  "results_url": "/api/v1/jobs/ocr-xyz789/results",
  "viewer_url": "https://app.okrapdf.com/ocr/ocr-xyz789"
}
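Note that poll_url and results_url in this response are relative paths, while viewer_url is absolute. A minimal Python sketch for turning all three into absolute URLs follows. It assumes the relative paths resolve against https://app.okrapdf.com (the host used by the curl commands in this guide); the helper name absolute_job_urls is hypothetical.

```python
from urllib.parse import urljoin

# Assumption: relative paths in the response resolve against the app origin.
APP_ORIGIN = "https://app.okrapdf.com"


def absolute_job_urls(submit_response: dict) -> dict:
    """Return absolute versions of the URL fields in a submit response."""
    return {
        key: urljoin(APP_ORIGIN, submit_response[key])
        for key in ("poll_url", "results_url", "viewer_url")
        if key in submit_response
    }


# Sample payload copied from the submit response shown above.
submitted = {
    "job_id": "ocr-xyz789",
    "status": "queued",
    "filename": "annual-report.pdf",
    "poll_url": "/api/v1/jobs/ocr-xyz789",
    "results_url": "/api/v1/jobs/ocr-xyz789/results",
    "viewer_url": "https://app.okrapdf.com/ocr/ocr-xyz789",
}

for name, url in absolute_job_urls(submitted).items():
    print(f"{name}: {url}")
```

urljoin leaves an already absolute URL unchanged, so the same call handles both kinds of field.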
2. Poll
curl https://app.okrapdf.com/api/v1/jobs/ocr-xyz789 \
  -H "Authorization: Bearer okra_YOUR_KEY"
{
  "job_id": "ocr-xyz789",
  "status": "processing",
  "pages_completed": 12,
  "total_pages": 48,
  "progress_percent": 25,
  "viewer_url": "https://app.okrapdf.com/ocr/ocr-xyz789"
}
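For long-running jobs, a polling loop usually benefits from an overall deadline and a growing delay between requests. The Python sketch below shows one way to structure that. The helper wait_for_job, its default timeout, and the backoff factor are all illustrative assumptions rather than documented limits, and "completed" and "failed" are assumed to be the only terminal statuses. The status fetch is passed in as a callable, so the loop stays independent of the HTTP client.

```python
import time
from typing import Callable

# Assumption: these are the only statuses after which a job stops changing.
TERMINAL_STATUSES = ("completed", "failed")


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout_s: float = 300.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until the job is terminal or the deadline passes."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout_s}s")
        sleep(delay)
        # Grow the delay by 50% each round, capped at max_delay_s.
        delay = min(delay * 1.5, max_delay_s)


# Offline demonstration with canned status responses instead of HTTP calls.
canned = iter([
    {"status": "queued"},
    {"status": "processing", "progress_percent": 25},
    {"status": "completed"},
])
final = wait_for_job(lambda: next(canned), sleep=lambda seconds: None)
print(final["status"])
```

In real use, fetch_status would wrap the GET request to the job URL shown above, for example a lambda that calls the HTTP client and returns the decoded JSON body.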
3. Get results
Once status is "completed":
curl https://app.okrapdf.com/api/v1/jobs/ocr-xyz789/results \
  -H "Authorization: Bearer okra_YOUR_KEY"
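Tables come back as markdown strings, which are convenient to display but awkward to compute on. Assuming the results payload has the same shape as the sync response (a results.tables list whose entries carry a markdown field), the following Python sketch converts one such string into a list of row dictionaries. The helper markdown_table_to_rows is hypothetical and deliberately simple: it does not handle escaped pipe characters inside cells.

```python
def markdown_table_to_rows(markdown: str) -> list[dict[str, str]]:
    """Convert a simple pipe-delimited markdown table into row dicts."""

    def split_row(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    lines = [line for line in markdown.splitlines() if line.strip()]
    if len(lines) < 2:
        return []  # no header plus separator, so nothing to parse

    headers = split_row(lines[0])
    # lines[1] is the |---|---| separator row, so data starts at lines[2].
    return [dict(zip(headers, split_row(line))) for line in lines[2:]]


# Markdown string copied from the sample sync response in this guide.
table_md = "| Item | Qty | Price |\n|------|-----|-------|\n| Widget | 10 | $5.00 |"
for row in markdown_table_to_rows(table_md):
    print(row)
```

All cell values stay strings; converting "10" or "$5.00" to numbers is left to the caller, since the right parsing depends on the document.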
Python example
import os
import time

import requests

API_KEY = os.environ["OKRA_API_KEY"]
BASE = "https://app.okrapdf.com/api/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Submit the PDF by URL and keep the job ID for polling
resp = requests.post(
    f"{BASE}/extract",
    headers=headers,
    json={"url": "https://example.com/report.pdf"},
)
resp.raise_for_status()
job_id = resp.json()["job_id"]

# Poll every 2 seconds until the job reaches a terminal status
while True:
    status = requests.get(f"{BASE}/jobs/{job_id}", headers=headers).json()
    if status["status"] in ("completed", "failed"):
        break
    print(f"Progress: {status.get('progress_percent', 0)}%")
    time.sleep(2)

# Stop if the job failed instead of requesting its results
if status["status"] == "failed":
    raise SystemExit(f"Job {job_id} failed")

# Fetch the results and print each extracted table
results = requests.get(f"{BASE}/jobs/{job_id}/results", headers=headers).json()
for table in results["results"]["tables"]:
    print(f"Page {table['page']}: {table['row_count']} rows")
    print(table["markdown"])
JavaScript example
const API_KEY = process.env.OKRA_API_KEY;
const BASE = "https://app.okrapdf.com/api/v1";
const headers = { Authorization: `Bearer ${API_KEY}` };

// Submit the PDF by URL (top-level await requires an ES module)
const { job_id } = await fetch(`${BASE}/extract`, {
  method: "POST",
  headers: { ...headers, "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com/report.pdf" }),
}).then(r => r.json());

// Poll every 2 seconds until the job reaches a terminal status
let status;
do {
  await new Promise(r => setTimeout(r, 2000));
  status = await fetch(`${BASE}/jobs/${job_id}`, { headers }).then(r => r.json());
} while (!["completed", "failed"].includes(status.status));

// Stop if the job failed instead of requesting its results
if (status.status === "failed") {
  throw new Error(`Job ${job_id} failed`);
}

// Fetch the results and count the extracted tables
const { results } = await fetch(`${BASE}/jobs/${job_id}/results`, { headers }).then(r => r.json());
console.log(`Extracted ${results.tables.length} tables`);