POST /api/v1/jobs/{jobId}/schema
cURL
curl -X POST https://app.okrapdf.com/api/v1/jobs/ocr-abc123/schema \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": {
      "name": "invoice",
      "fields": [
        {"key": "vendor_name", "type": "string", "required": true},
        {"key": "total_amount", "type": "number", "required": true},
        {"key": "due_date", "type": "date"}
      ]
    }
  }'
{
  "job_id": "<string>",
  "run_id": "<string>",
  "status": "completed",
  "values": {},
  "fields": [
    {
      "path": "<string>",
      "type": "string",
      "value": "<unknown>",
      "confidence": 0.5,
      "citations": [
        {
          "page": 123,
          "quote": "<string>",
          "bbox": {
            "x": 123,
            "y": 123,
            "width": 123,
            "height": 123
          },
          "source": "ocr_page"
        }
      ]
    }
  ],
  "extracted_at": "2023-11-07T05:31:56Z"
}
This endpoint extracts structured data for agents and automation. Define a typed schema with fields such as vendor_name (string), total_amount (number), and due_date (date), and OkraPDF extracts structured values with confidence scores and page-level citations.

The job must be completed before you run schema extraction. Use GET /api/v1/jobs/{id} to check its status first.
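The status check described above can be sketched as a small polling loop. This is a minimal sketch, not an official client. It assumes the job response is JSON with a top-level "status" field that eventually reads "completed", which this page shows only for the extraction response. The helper names, polling interval, and attempt limit are illustrative. It uses only the Python standard library.

```python
import json
import time
import urllib.request

BASE_URL = "https://app.okrapdf.com/api/v1"


def fetch_job_status(job_id, api_key):
    """Return the job's status string.

    Assumption: GET /jobs/{id} returns JSON with a top-level "status" key.
    """
    request = urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["status"]


def wait_until_completed(get_status, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call get_status() until it returns "completed" or attempts run out.

    Returns True if the job completed, False otherwise.
    """
    for attempt in range(max_attempts):
        if get_status() == "completed":
            return True
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    return False
```

A caller might run `wait_until_completed(lambda: fetch_job_status("ocr-abc123", "okra_YOUR_KEY"))` and only post the schema when it returns True.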

Schema field types

Type     Description       Example value
string   Text value        "Acme Corporation"
number   Numeric value     1234.56
boolean  True/false        true
date     Date string       "2025-12-31"
array    List of values    ["item1", "item2"]
object   Nested structure  {"name": "...", "amount": 100}
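To make the type table concrete, the sketch below checks whether a decoded JSON value matches a declared field type. It is a client-side sanity check, not part of the API. The function name is hypothetical, and the date rule assumes values arrive as ISO 8601 calendar dates, as in the example value above.

```python
from datetime import date


def matches_type(value, field_type):
    """Return True if a decoded JSON value fits the declared schema type."""
    if field_type == "string":
        return isinstance(value, str)
    if field_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "boolean":
        return isinstance(value, bool)
    if field_type == "date":
        # Assumption: dates are ISO 8601 strings such as "2025-12-31".
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    if field_type == "array":
        return isinstance(value, list)
    if field_type == "object":
        return isinstance(value, dict)
    raise ValueError(f"unknown field type: {field_type!r}")
```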

Citation modes

  • best (default) — Returns the single best citation per field. Fast and concise.
  • all — Returns every matching citation. Use when you need to verify or cross-reference.
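The mode is passed in the request body under options.citation_mode, as in the invoice example on this page. A minimal sketch of a body builder that rejects any other mode before the request is sent; the function and constant names are hypothetical.

```python
VALID_CITATION_MODES = ("best", "all")


def build_request_body(schema, citation_mode="best"):
    """Build the JSON body for the schema endpoint with a checked citation mode."""
    if citation_mode not in VALID_CITATION_MODES:
        raise ValueError(
            f"citation_mode must be one of {VALID_CITATION_MODES}, got {citation_mode!r}"
        )
    return {"schema": schema, "options": {"citation_mode": citation_mode}}
```

Passing citation_mode="all" requests every matching citation, which suits verification workflows at the cost of a larger response.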

Example: invoice extraction

import requests

job_id = "ocr-abc123"
resp = requests.post(
    f"https://app.okrapdf.com/api/v1/jobs/{job_id}/schema",
    headers={"Authorization": "Bearer okra_YOUR_KEY"},
    json={
        "schema": {
            "name": "invoice",
            "fields": [
                {"key": "vendor_name", "type": "string", "required": True},
                {"key": "invoice_number", "type": "string", "required": True},
                {"key": "total_amount", "type": "number", "required": True},
                {"key": "line_items", "type": "array", "description": "Each line item with description and amount"},
                {"key": "due_date", "type": "date"},
            ],
        },
        "options": {"citation_mode": "best"},
    },
)

resp.raise_for_status()  # stop on an HTTP error instead of parsing an error body
result = resp.json()

# Quick access to values
print(result["values"]["vendor_name"])    # "Acme Corp"
print(result["values"]["total_amount"])   # 1234.56

# Detailed results with citations
for field in result["fields"]:
    print(f"{field['path']}: {field['value']}")
    print(f"  confidence: {field['confidence']}")
    for cite in field["citations"]:
        print(f"  page {cite['page']}: \"{cite['quote']}\"")
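Confidence scores and citations can also drive a review step. The sketch below lists fields that a person should check. It assumes confidence is on a 0 to 1 scale, as the sample response suggests. The 0.8 threshold and the function name are illustrative choices, not values from the API.

```python
def fields_needing_review(result, threshold=0.8):
    """Return the paths of fields with low confidence or no supporting citation."""
    flagged = []
    for field in result["fields"]:
        low_confidence = field["confidence"] < threshold
        uncited = not field["citations"]
        if low_confidence or uncited:
            flagged.append(field["path"])
    return flagged
```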

Authorizations

Authorization (string, header, required)

API key as Bearer token: Authorization: Bearer okra_xxx

Path Parameters

jobId (string, required)

Body

application/json
schema (object, required)
options (object)

Response

Schema extraction results

job_id (string, required)
run_id (string, required)
status (enum<string>, required). Available options: completed
values (object, required)

Key-value map of extracted data (e.g. {"vendor_name": "Acme Corp", "total_amount": 1234.56})

fields (object[], required)

Detailed results per field with confidence and citations

extracted_at (string<date-time>)
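The extracted_at timestamp in the sample response ends in "Z". A minimal sketch for turning it into a timezone-aware datetime, assuming the value is always an ISO 8601 date-time like the sample; the function name is hypothetical.

```python
from datetime import datetime


def parse_extracted_at(value):
    """Parse an ISO 8601 timestamp such as "2023-11-07T05:31:56Z"."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so replace it with an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```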