Prerequisites

Option A: Sync extraction (small PDFs)

For PDFs under 10 pages, use the sync endpoint. It waits up to 60 seconds and returns results directly.
curl -X POST https://app.okrapdf.com/api/v1/extract/sync \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/invoice.pdf"}'
Response (200):
{
  "job_id": "ocr-abc123",
  "status": "completed",
  "filename": "invoice.pdf",
  "pages": 3,
  "results": {
    "tables": [
      {
        "page": 1,
        "table_index": 0,
        "markdown": "| Item | Qty | Price |\n|------|-----|-------|\n| Widget | 10 | $5.00 |",
        "headers": ["Item", "Qty", "Price"],
        "row_count": 1
      }
    ],
    "text": [
      {"page": 1, "content": "Invoice #12345\nDate: 2025-01-15..."}
    ]
  }
}
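
The `markdown` field is convenient for display, but code that consumes the data usually wants one record per row. The sketch below is one possible way to do that, assuming the table shape shown in the response above (`markdown` plus `headers`). The `table_rows` helper is hypothetical and not part of the API, and it assumes cells contain no literal `|` characters.

```python
from typing import Dict, List


def table_rows(table: Dict) -> List[Dict[str, str]]:
    """Turn one entry of results["tables"] into a list of row dicts.

    Hypothetical helper: assumes the markdown layout shown in the sample
    response (header row, separator row, then data rows).
    """
    lines = [ln.strip() for ln in table["markdown"].splitlines() if ln.strip()]
    rows = []
    # lines[0] is the header row and lines[1] the |---| separator; data follows.
    for line in lines[2:]:
        # Drop exactly one leading and one trailing pipe, then split the cells.
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(table["headers"], cells)))
    return rows


# Table copied from the sample response above.
sample = {
    "page": 1,
    "table_index": 0,
    "markdown": "| Item | Qty | Price |\n|------|-----|-------|\n| Widget | 10 | $5.00 |",
    "headers": ["Item", "Qty", "Price"],
    "row_count": 1,
}
print(table_rows(sample))
```

All values stay strings; converting `Qty` or `Price` to numbers is left to the caller because formats such as `$5.00` vary by document.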

Option B: Async extraction (any size)

For larger PDFs, use the async flow: submit, poll, fetch results.

1. Submit

curl -X POST https://app.okrapdf.com/api/v1/extract \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/annual-report.pdf"}'
Response (202):
{
  "job_id": "ocr-xyz789",
  "status": "queued",
  "filename": "annual-report.pdf",
  "poll_url": "/api/v1/jobs/ocr-xyz789",
  "results_url": "/api/v1/jobs/ocr-xyz789/results",
  "viewer_url": "https://app.okrapdf.com/ocr/ocr-xyz789"
}
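
In the sample response, `poll_url` and `results_url` are relative paths while `viewer_url` is absolute. A minimal sketch of resolving all three against the API host, assuming the host used in the curl examples; `absolute_job_urls` is a hypothetical helper name:

```python
from urllib.parse import urljoin

# Host taken from the curl examples; adjust if your deployment differs.
ORIGIN = "https://app.okrapdf.com"


def absolute_job_urls(submit_response: dict) -> dict:
    """Return absolute versions of the URL fields in a submit response."""
    return {
        key: urljoin(ORIGIN, submit_response[key])
        for key in ("poll_url", "results_url", "viewer_url")
        if key in submit_response
    }


# Fields copied from the sample 202 response above.
submitted = {
    "job_id": "ocr-xyz789",
    "status": "queued",
    "poll_url": "/api/v1/jobs/ocr-xyz789",
    "results_url": "/api/v1/jobs/ocr-xyz789/results",
    "viewer_url": "https://app.okrapdf.com/ocr/ocr-xyz789",
}
urls = absolute_job_urls(submitted)
print(urls["poll_url"])
```

`urljoin` leaves already-absolute URLs such as `viewer_url` unchanged, so the same call works for every field.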

2. Poll

curl https://app.okrapdf.com/api/v1/jobs/ocr-xyz789 \
  -H "Authorization: Bearer okra_YOUR_KEY"
Response:
{
  "job_id": "ocr-xyz789",
  "status": "processing",
  "pages_completed": 12,
  "total_pages": 48,
  "progress_percent": 25,
  "viewer_url": "https://app.okrapdf.com/ocr/ocr-xyz789"
}
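
A polling loop with no upper bound can hang if a job never leaves `processing`. The sketch below bounds the wait, assuming `completed` and `failed` are the only terminal statuses (the two checked in the examples on this page). `wait_for_job` and its parameters are hypothetical names; `fetch_status` stands for any callable that returns the parsed poll response, so the logic can be tried without network access.

```python
import time
from typing import Callable, Dict

# Assumption: these are the only terminal statuses.
TERMINAL = ("completed", "failed")


def wait_for_job(
    fetch_status: Callable[[], Dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_status() until the job is terminal or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL:
            return status
        # Give up if another sleep would carry us past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        sleep(interval)


# Demo with canned (made-up) poll responses instead of real HTTP calls.
canned = iter([
    {"status": "queued"},
    {"status": "processing", "progress_percent": 25},
    {"status": "completed"},
])
final = wait_for_job(lambda: next(canned), sleep=lambda seconds: None)
print(final["status"])
```

In real use, `fetch_status` would wrap the GET request to the poll URL shown above, and the caller would check whether the returned status is `failed` before requesting results.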

3. Get results

Once status is completed:
curl https://app.okrapdf.com/api/v1/jobs/ocr-xyz789/results \
  -H "Authorization: Bearer okra_YOUR_KEY"

Python example

import requests
import time
import os

API_KEY = os.environ["OKRA_API_KEY"]
BASE = "https://app.okrapdf.com/api/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

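# Submit the job; raise_for_status() surfaces auth and validation errors early
resp = requests.post(
    f"{BASE}/extract",
    headers=headers,
    json={"url": "https://example.com/report.pdf"},
    timeout=30,
)
resp.raise_for_status()
job_id = resp.json()["job_id"]

# Poll every 2 seconds until the job reaches a terminal state
while True:
    poll = requests.get(f"{BASE}/jobs/{job_id}", headers=headers, timeout=30)
    poll.raise_for_status()
    status = poll.json()
    if status["status"] in ("completed", "failed"):
        break
    print(f"Progress: {status.get('progress_percent', 0)}%")
    time.sleep(2)

# Stop if the job failed; results are fetched once status is "completed"
if status["status"] == "failed":
    raise RuntimeError(f"Job {job_id} failed")

# Results
resp = requests.get(f"{BASE}/jobs/{job_id}/results", headers=headers, timeout=30)
resp.raise_for_status()
results = resp.json()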

for table in results["results"]["tables"]:
    print(f"Page {table['page']}: {table['row_count']} rows")
    print(table["markdown"])

JavaScript example

const API_KEY = process.env.OKRA_API_KEY;
const BASE = "https://app.okrapdf.com/api/v1";
const headers = { Authorization: `Bearer ${API_KEY}` };

// Submit
const { job_id } = await fetch(`${BASE}/extract`, {
  method: "POST",
  headers: { ...headers, "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com/report.pdf" }),
}).then(r => r.json());

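// Poll every 2 seconds until the job reaches a terminal state
let status;
do {
  await new Promise(r => setTimeout(r, 2000));
  status = await fetch(`${BASE}/jobs/${job_id}`, { headers }).then(r => r.json());
} while (!["completed", "failed"].includes(status.status));

// Stop if the job failed; results are fetched once status is "completed"
if (status.status === "failed") {
  throw new Error(`Job ${job_id} failed`);
}

// Results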
const { results } = await fetch(`${BASE}/jobs/${job_id}/results`, { headers }).then(r => r.json());
console.log(`Extracted ${results.tables.length} tables`);