Schema Extraction

cURL

curl -X POST https://app.okrapdf.com/api/v1/jobs/ocr-abc123/schema \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": {
      "name": "invoice",
      "fields": [
        {"key": "vendor_name", "type": "string", "required": true},
        {"key": "total_amount", "type": "number", "required": true},
        {"key": "due_date", "type": "date"}
      ]
    }
  }'

{
  "job_id": "<string>",
  "run_id": "<string>",
  "status": "completed",
  "values": {},
  "fields": [
    {
      "path": "<string>",
      "type": "string",
      "value": "<unknown>",
      "confidence": 0.5,
      "citations": [
        {
          "page": 123,
          "quote": "<string>",
          "bbox": {
            "x": 123,
            "y": 123,
            "width": 123,
            "height": 123
          },
          "source": "ocr_page"
        }
      ]
    }
  ],
  "extracted_at": "2023-11-07T05:31:56Z"
}
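In the response, each citation pairs a `page` and `quote` with a `bbox` made of `x`, `y`, `width`, and `height`. Below is a minimal sketch that turns those boxes into corner coordinates, for example to draw highlights over the cited text. It assumes `x`/`y` is the box's origin corner and that all four numbers use the same page units. The origin and units are not documented here, so verify them against a rendered page. `bbox_to_corners`, `citation_regions`, and `Corners` are hypothetical helper names.

```python
from __future__ import annotations

from typing import NamedTuple


class Corners(NamedTuple):
    """Axis-aligned box: (x0, y0) is the origin corner, (x1, y1) the opposite one."""
    x0: float
    y0: float
    x1: float
    y1: float


def bbox_to_corners(bbox: dict) -> Corners:
    """Convert a citation bbox {x, y, width, height} into corner coordinates.

    Assumption: x/y is the box's origin corner and width/height extend from it
    in the same units.
    """
    x, y = float(bbox["x"]), float(bbox["y"])
    w, h = float(bbox["width"]), float(bbox["height"])
    if w < 0 or h < 0:
        raise ValueError("bbox width and height must be non-negative")
    return Corners(x, y, x + w, y + h)


def citation_regions(result: dict) -> list[tuple[str, int, Corners]]:
    """List (field path, page, corners) for every citation that carries a bbox."""
    regions = []
    for field in result.get("fields", []):
        for cite in field.get("citations", []):
            if cite.get("bbox"):
                regions.append((field["path"], cite["page"], bbox_to_corners(cite["bbox"])))
    return regions
```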

POST /api/v1/jobs/{jobId}/schema

The most powerful endpoint for agents and automation. Define a typed schema with fields such as `vendor_name` (string), `total_amount` (number), and `due_date` (date), and OkraPDF extracts structured values with confidence scores and page-level citations.

The job must be completed before you run schema extraction. Use `GET /api/v1/jobs/{id}` to check its status first.
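Because extraction only works on a completed job, a client typically polls the job first. The sketch below shows one way to do that. It assumes the `GET /api/v1/jobs/{id}` response is JSON with a top-level `status` field, treats any status other than `completed` as still in progress, and treats `failed` as a terminal error; that `failed` value is a hypothetical status name, not a documented one. `fetch_job_status` and `wait_until_completed` are illustrative helpers, and the status source is injected as a callable so the polling logic can be tested without network access.

```python
from __future__ import annotations

import time
from typing import Callable

API_BASE = "https://app.okrapdf.com/api/v1"


def fetch_job_status(job_id: str, api_key: str) -> str:
    """GET /api/v1/jobs/{id} and return its status.

    Assumption: the JSON body has a top-level "status" field.
    """
    import requests  # lazy import: the polling helper below works without it

    resp = requests.get(
        f"{API_BASE}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["status"]


def wait_until_completed(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    failure_statuses: tuple[str, ...] = ("failed",),  # hypothetical status name
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_status() until it returns "completed"; return the number of polls.

    Raises RuntimeError on a failure status and TimeoutError once timeout_s passes.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status == "completed":
            return polls
        if status in failure_statuses:
            raise RuntimeError(f"job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not completed after {timeout_s}s (last status {status!r})")
        sleep(interval_s)
```

Usage: `wait_until_completed(lambda: fetch_job_status("ocr-abc123", "okra_YOUR_KEY"))`, then call the schema endpoint.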

Schema field types

Type	Description	Example value
`string`	Text value	`"Acme Corporation"`
`number`	Numeric value	`1234.56`
`boolean`	True/false	`true`
`date`	Date string	`"2025-12-31"`
`array`	List of values	`["item1", "item2"]`
`object`	Nested structure	`{"name": "...", "amount": 100}`
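The type table also gives you a way to sanity-check what comes back. The following is a client-side sketch that tests an extracted value against its declared field type. It assumes `date` values are ISO 8601 calendar dates like the `"2025-12-31"` example, and that a field with no value comes back as absent or `null`. The server's exact rules are not documented here, and `matches_type` and `type_mismatches` are hypothetical helper names.

```python
from __future__ import annotations

from datetime import date
from typing import Any

ALLOWED_TYPES = {"string", "number", "boolean", "date", "array", "object"}


def matches_type(field_type: str, value: Any) -> bool:
    """Return True if value looks like an instance of the given schema field type."""
    if field_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown field type: {field_type!r}")
    if field_type == "string":
        return isinstance(value, str)
    if field_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "boolean":
        return isinstance(value, bool)
    if field_type == "date":
        # Assumption: dates are ISO 8601 calendar dates such as "2025-12-31"
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    if field_type == "array":
        return isinstance(value, list)
    return isinstance(value, dict)  # "object"


def type_mismatches(schema_fields: list[dict], values: dict) -> list[str]:
    """Keys whose extracted value is present (not None) but doesn't match its declared type."""
    return [
        f["key"]
        for f in schema_fields
        if values.get(f["key"]) is not None and not matches_type(f["type"], values[f["key"]])
    ]
```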

Citation modes

`best` (default): Returns the single best citation per field. Fast and concise.
`all`: Returns every matching citation. Use it when you need to verify or cross-reference values.

Set the mode with `options.citation_mode` in the request body.
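With `all`, a single field can be cited on several pages, which is useful for cross-referencing. The sketch below groups citations by page per field and lists fields whose evidence spans more than one page as candidates for a manual check. It relies only on the `fields[].path` and `citations[].page` keys shown in the example response. `pages_by_field` and `multi_page_fields` are hypothetical helper names.

```python
from __future__ import annotations


def pages_by_field(result: dict) -> dict[str, list[int]]:
    """Map each field path to the sorted, de-duplicated pages it was cited on."""
    pages: dict[str, set[int]] = {}
    for field in result.get("fields", []):
        # setdefault keeps fields with no citations, mapped to an empty list
        cited = pages.setdefault(field["path"], set())
        for cite in field.get("citations", []):
            cited.add(cite["page"])
    return {path: sorted(p) for path, p in pages.items()}


def multi_page_fields(result: dict) -> list[str]:
    """Fields whose citations span more than one page: worth a manual cross-check."""
    return [path for path, pages in pages_by_field(result).items() if len(pages) > 1]
```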

Example: invoice extraction

import requests

job_id = "ocr-abc123"
resp = requests.post(
    f"https://app.okrapdf.com/api/v1/jobs/{job_id}/schema",
    headers={"Authorization": "Bearer okra_YOUR_KEY"},
    json={
        "schema": {
            "name": "invoice",
            "fields": [
                {"key": "vendor_name", "type": "string", "required": True},
                {"key": "invoice_number", "type": "string", "required": True},
                {"key": "total_amount", "type": "number", "required": True},
                {"key": "line_items", "type": "array", "description": "Each line item with description and amount"},
                {"key": "due_date", "type": "date"},
            ],
        },
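        "options": {"citation_mode": "best"},
    },
    timeout=60,  # seconds; don't wait forever on a stalled connection
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body as results

result = resp.json()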

# Quick access to values
print(result["values"]["vendor_name"])    # "Acme Corp"
print(result["values"]["total_amount"])   # 1234.56

# Detailed results with citations
for field in result["fields"]:
    print(f"{field['path']}: {field['value']}")
    print(f"  confidence: {field['confidence']}")
    for cite in field["citations"]:
        print(f"  page {cite['page']}: \"{cite['quote']}\"")
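Confidence scores and citations make it straightforward to route uncertain results to a person. The sketch below flags fields that fall under a confidence threshold, have no citations, or are required but came back empty. The 0.8 threshold is a hypothetical default, confidence is assumed to lie between 0 and 1 (the example shows 0.5), and a missing or `null` entry in `values` is assumed to mean "not found". `fields_needing_review` is a hypothetical helper name.

```python
from __future__ import annotations


def fields_needing_review(
    result: dict,
    required_keys: set[str],
    min_confidence: float = 0.8,  # hypothetical threshold; tune for your documents
) -> list[str]:
    """Return field paths that are low-confidence, uncited, or required but missing."""
    flagged = []
    for field in result["fields"]:
        low = field.get("confidence", 0.0) < min_confidence
        uncited = not field.get("citations")
        if low or uncited:
            flagged.append(field["path"])
    # Assumption: an absent or null entry in "values" means the field wasn't found
    for key in sorted(required_keys):
        if result["values"].get(key) is None and key not in flagged:
            flagged.append(key)
    return flagged
```

For the invoice example, `fields_needing_review(result, {"vendor_name", "invoice_number", "total_amount"})` returns the paths worth a second look.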

Authorizations

Authorization (string, header, required)

API key as Bearer token: Authorization: Bearer okra_xxx
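To keep the key out of source code, you can read it from the environment and build the header from it. This is a sketch: `OKRA_API_KEY` is a hypothetical variable name, and the `okra_` prefix check only reflects the keys shown in these examples, so it warns rather than failing.

```python
from __future__ import annotations

import os
import warnings


def auth_headers(env_var: str = "OKRA_API_KEY") -> dict[str, str]:
    """Build the Bearer Authorization header from an API key stored in env_var.

    OKRA_API_KEY is a hypothetical variable name; use whatever your deployment sets.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set {env_var} to your OkraPDF API key")
    if not key.startswith("okra_"):
        # Keys in the docs' examples start with "okra_"; flag likely copy/paste mistakes
        warnings.warn(f"{env_var} does not start with 'okra_'; is it the right key?")
    return {"Authorization": f"Bearer {key}"}
```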

Path Parameters

jobId (string, required)
ID of the job to extract from, e.g. `ocr-abc123`.
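Because `jobId` is interpolated into the URL path, it is worth percent-encoding it. The sketch below builds the endpoint URL with the standard library's `urllib.parse.quote`, using the base URL from the examples above. `schema_url` is a hypothetical helper name.

```python
from urllib.parse import quote

API_BASE = "https://app.okrapdf.com/api/v1"


def schema_url(job_id: str) -> str:
    """Build the schema-extraction URL for a job, percent-encoding the jobId segment."""
    if not job_id:
        raise ValueError("job_id must be a non-empty string")
    # safe="" also encodes "/", so a malformed ID can't change the request path
    return f"{API_BASE}/jobs/{quote(job_id, safe='')}/schema"
```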

Body

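application/json

schema (object, required)
The extraction schema: a `name` plus a `fields` list, where each field has a `key`, a `type`, and optional `required` and `description` properties.

options (object)
Extraction options, e.g. `citation_mode` (`best` or `all`).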
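Catching malformed schemas before sending them saves a round trip. The following sketch assembles a request body from the shapes used in the examples (`schema.name`, `schema.fields[]` with `key`/`type`, and `options.citation_mode`) and checks types against the field-types table and modes against the two documented citation modes. Rejecting duplicate keys is a local assumption; the server's own validation rules aren't described here. `build_body` is a hypothetical helper name.

```python
from __future__ import annotations

FIELD_TYPES = {"string", "number", "boolean", "date", "array", "object"}
CITATION_MODES = {"best", "all"}


def build_body(name: str, fields: list[dict], citation_mode: str = "best") -> dict:
    """Assemble a request body for the schema endpoint, checking it locally first."""
    if not fields:
        raise ValueError("schema needs at least one field")
    seen = set()
    for f in fields:
        key, ftype = f.get("key"), f.get("type")
        if not key:
            raise ValueError(f"field without a key: {f!r}")
        if key in seen:
            raise ValueError(f"duplicate field key: {key!r}")
        if ftype not in FIELD_TYPES:
            raise ValueError(f"{key!r}: unsupported type {ftype!r}")
        seen.add(key)
    if citation_mode not in CITATION_MODES:
        raise ValueError(f"citation_mode must be one of {sorted(CITATION_MODES)}")
    return {
        "schema": {"name": name, "fields": fields},
        "options": {"citation_mode": citation_mode},
    }
```

The returned dict can be passed directly as `json=` to `requests.post`.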

Response

Schema extraction results

job_id (string, required)

run_id (string, required)

status (enum<string>, required)
Available options: `completed`

values (object, required)
Key-value map of extracted data (e.g. `{"vendor_name": "Acme Corp", "total_amount": 1234.56}`)

fields (object[], required)
Detailed results per field with confidence and citations

extracted_at (string<date-time>)
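For typed access in client code, the response can be loaded into dataclasses. This sketch takes the per-field and per-citation keys from the example response above (the full child attributes aren't listed here) and treats `extracted_at` as optional because it isn't marked required. It parses `extracted_at` with `datetime.fromisoformat`, replacing a trailing `Z` first because Python versions before 3.11 reject it. All class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class Citation:
    page: int
    quote: str
    source: str
    bbox: Optional[dict] = None  # {"x", "y", "width", "height"} when present


@dataclass
class FieldResult:
    path: str
    type: str
    value: Any
    confidence: float
    citations: list[Citation] = field(default_factory=list)


@dataclass
class SchemaResult:
    job_id: str
    run_id: str
    status: str
    values: dict[str, Any]
    fields: list[FieldResult]
    extracted_at: Optional[datetime]


def parse_timestamp(ts: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 date-time; a trailing "Z" is rewritten for Python < 3.11."""
    if not ts:
        return None
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def parse_result(data: dict) -> SchemaResult:
    """Convert the endpoint's JSON body into typed objects."""
    return SchemaResult(
        job_id=data["job_id"],
        run_id=data["run_id"],
        status=data["status"],
        values=data["values"],
        fields=[
            FieldResult(
                path=f["path"],
                type=f["type"],
                value=f.get("value"),
                confidence=float(f["confidence"]),
                citations=[
                    Citation(c["page"], c.get("quote", ""), c.get("source", ""), c.get("bbox"))
                    for c in f.get("citations", [])
                ],
            )
            for f in data["fields"]
        ],
        extracted_at=parse_timestamp(data.get("extracted_at")),
    )
```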
