Ingest API

Overview

Use POST /v1/documents/ingest when parsing has already happened in your own pipeline. You send the vendor output (unstructured, llamaparse, or canonical), and OkraPDF handles normalization, hydration, lifecycle processing, and the document endpoints.

Request

curl -X POST https://api.okrapdf.com/v1/documents/ingest \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vendor": "unstructured",
    "data": [
      {
        "type": "NarrativeText",
        "text": "Invoice total due is $12,480",
        "metadata": { "page_number": 1 }
      }
    ],
    "pdfUrl": "https://example.com/invoice.pdf"
  }'
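
For callers who are not using curl, the same request can be assembled in Python. This is a minimal sketch using only the standard library; the helper name `build_ingest_request` is hypothetical, and it assumes the vendor output always goes in the `data` field as in the curl example above.

```python
import json
import urllib.request

INGEST_URL = "https://api.okrapdf.com/v1/documents/ingest"


def build_ingest_request(api_key, data, pdf_url, vendor=None):
    """Build (but do not send) a POST request for the ingest endpoint.

    `vendor` is optional because the docs say it can be omitted;
    the other fields mirror the curl example.
    """
    body = {"data": data, "pdfUrl": pdf_url}
    if vendor is not None:
        body["vendor"] = vendor
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building and sending are kept separate here so the payload can be inspected or logged first.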

Supported connector IDs

`vendor` value	Expected shape
`unstructured`	array of Unstructured elements (`type`, `metadata.page_number`)
`llamaparse`	object with `pages[].items[]` entries
`canonical`	object with canonical `pages[].blocks[]`

If vendor is omitted, OkraPDF tries to auto-detect the connector from the payload shape.
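
OkraPDF's actual detection logic is not documented here. As an illustration of what shape-based detection can look like, the sketch below checks a payload against the three shapes in the table. The function `guess_vendor` is hypothetical and runs on the client side; it may be useful as a pre-flight check before sending a payload without `vendor`.

```python
def guess_vendor(payload):
    """Return the connector ID whose documented shape matches, else None.

    Client-side illustration only; the server's rules may differ.
    """
    if isinstance(payload, list):
        # unstructured: array of elements with `type` and `metadata.page_number`
        if payload and all(
            isinstance(el, dict)
            and "type" in el
            and isinstance(el.get("metadata"), dict)
            and "page_number" in el["metadata"]
            for el in payload
        ):
            return "unstructured"
        return None
    if isinstance(payload, dict) and isinstance(payload.get("pages"), list):
        pages = payload["pages"]
        if not pages or not all(isinstance(p, dict) for p in pages):
            return None
        # llamaparse: pages[].items[]
        if all("items" in p for p in pages):
            return "llamaparse"
        # canonical: pages[].blocks[]
        if all("blocks" in p for p in pages):
            return "canonical"
    return None
```

Setting `vendor` explicitly remains the safer choice, since it removes any dependence on detection.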

Response model

The endpoint returns 202 Accepted and starts lifecycle processing.

{
  "documentId": "doc-...",
  "phase": "ingesting",
  "status": "processing",
  "vendor": "unstructured",
  "pageCount": 12,
  "workflowId": "...",
  "urls": {
    "self": "https://api.okrapdf.com/document/doc-...",
    "status": "https://api.okrapdf.com/document/doc-.../status",
    "pages": "https://api.okrapdf.com/document/doc-.../pages",
    "publish": "https://api.okrapdf.com/document/doc-.../publish"
  }
}
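
A small typed wrapper can make the response easier to work with. The sketch below parses the body shown above into a dataclass; the names `IngestAccepted` and `parse_ingest_response` are hypothetical, and it assumes every field in the sample is always present.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestAccepted:
    document_id: str
    phase: str
    status: str
    vendor: str
    page_count: int
    workflow_id: str
    urls: dict  # keys seen in the sample: self, status, pages, publish


def parse_ingest_response(raw):
    """Parse the 202 response body (a JSON string) into an IngestAccepted."""
    body = json.loads(raw)
    return IngestAccepted(
        document_id=body["documentId"],
        phase=body["phase"],
        status=body["status"],
        vendor=body["vendor"],
        page_count=body["pageCount"],
        workflow_id=body["workflowId"],
        urls=body["urls"],
    )
```

Reading follow-up URLs from `urls` rather than constructing them by hand keeps the client independent of the URL layout.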

What happens after ingest

1. Vendor payload is normalized to Okra’s canonical parse shape.
2. Parsed nodes are hydrated into the document graph.
3. Lifecycle jobs run (snapshot/materialization/projection workflow).
4. Standard document surfaces become available (pages, chat/completion, output profiles, URL builder).
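
Because these steps run asynchronously after the 202 response, a client usually has to wait before reading the document surfaces. The sketch below is a generic polling loop, not an OkraPDF client: `wait_until` is a hypothetical helper, and since the shape of the status endpoint's response is not described on this page, both the fetch function and the "done" test are supplied by the caller.

```python
import time


def wait_until(fetch_status, is_done, interval_s=2.0, max_attempts=30,
               sleep=time.sleep):
    """Call fetch_status() until is_done(result) is true; return that result.

    fetch_status: zero-argument callable, e.g. a GET on the `status` URL.
    is_done: predicate on the fetched result (caller-defined, since the
             status payload is not documented here).
    """
    for attempt in range(max_attempts):
        result = fetch_status()
        if is_done(result):
            return result
        # Skip the final sleep so a timeout is reported promptly.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"not done after {max_attempts} attempts")
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` applies.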

Failure modes

Unknown payload shape without vendor: 422 with supported connector list.
Invalid payload for chosen connector: 422 normalization error.
Workflow startup failure: 500 with error payload.

No silent drops: payloads are validated before lifecycle continues.
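
On the client side, the status codes above can be mapped to a small set of outcomes. This is a sketch with hypothetical names (`classify_ingest_status` and its category strings). Note that 422 covers both the unknown-shape case and the normalization error, so telling them apart would require inspecting the error body, whose format is not described here.

```python
def classify_ingest_status(status_code):
    """Map an ingest HTTP status code to a coarse client-side outcome."""
    if status_code == 202:
        # Accepted: lifecycle processing has started.
        return "accepted"
    if status_code == 422:
        # Unknown payload shape without vendor, or invalid payload
        # for the chosen connector. Fix the payload before resending.
        return "payload_rejected"
    if status_code == 500:
        # Workflow startup failure on the server side.
        return "workflow_start_failed"
    # Anything else is outside what this page documents.
    return "unexpected"
```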

When to use this endpoint

Use the Ingest API when you:

already run extraction with external vendors,
want OkraPDF delivery + policy + output layers,
need a stable doc-... lifecycle without re-running OCR in Okra.

Related Docs

Output Schema

Materialize reproducible structured outputs from ingested documents.

URL Builder

Build immutable URLs for pages, tables, and artifacts.
