Overview

Use POST /v1/documents/ingest when parsing has already happened in your own pipeline. You send the vendor output (unstructured, llamaparse, or canonical), and OkraPDF handles normalization, hydration, lifecycle processing, and the document endpoints.

Request

curl -X POST https://api.okrapdf.com/v1/documents/ingest \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vendor": "unstructured",
    "data": [
      {
        "type": "NarrativeText",
        "text": "Invoice total due is $12,480",
        "metadata": { "page_number": 1 }
      }
    ],
    "pdfUrl": "https://example.com/invoice.pdf"
  }'
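
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper name `build_ingest_request` is illustrative and not part of any OkraPDF SDK, while the URL, headers, and payload fields are copied from the curl example above.

```python
import json
import urllib.request

INGEST_URL = "https://api.okrapdf.com/v1/documents/ingest"


def build_ingest_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


payload = {
    "vendor": "unstructured",
    "data": [
        {
            "type": "NarrativeText",
            "text": "Invoice total due is $12,480",
            "metadata": {"page_number": 1},
        }
    ],
    "pdfUrl": "https://example.com/invoice.pdf",
}

request = build_ingest_request("YOUR_API_KEY", payload)
# To send it, pass the request to urllib.request.urlopen(request).
```

Building the request separately from sending it makes the payload easy to inspect or log before it leaves your pipeline.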

Supported connector IDs

| vendor value | Expected shape |
| --- | --- |
| unstructured | array of Unstructured elements (type, metadata.page_number) |
| llamaparse | object with pages[].items[] entries |
| canonical | object with canonical pages[].blocks[] |

If vendor is omitted, OkraPDF tries to auto-detect from payload shape.
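
To make "payload shape" concrete, here is a client-side sketch that distinguishes the three shapes listed above. It is only an illustration of the idea, not OkraPDF's actual detection logic, and `guess_vendor` is a hypothetical helper. Setting vendor explicitly avoids depending on detection at all.

```python
from typing import Any, Optional


def guess_vendor(payload: Any) -> Optional[str]:
    """Return a likely vendor ID for a payload, or None if no shape matches."""
    # unstructured: a non-empty list of element dicts, each with a "type" field.
    if isinstance(payload, list):
        if payload and all(isinstance(el, dict) and "type" in el for el in payload):
            return "unstructured"
        return None

    if isinstance(payload, dict):
        pages = payload.get("pages")
        if isinstance(pages, list) and pages and all(isinstance(p, dict) for p in pages):
            # canonical uses pages[].blocks[]; llamaparse uses pages[].items[].
            if all("blocks" in p for p in pages):
                return "canonical"
            if all("items" in p for p in pages):
                return "llamaparse"
    return None
```

A pre-flight check like this can catch a mislabeled payload locally instead of waiting for a 422 from the API.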

Response model

The endpoint returns 202 Accepted and starts lifecycle processing.
{
  "documentId": "doc-...",
  "phase": "ingesting",
  "status": "processing",
  "vendor": "unstructured",
  "pageCount": 12,
  "workflowId": "...",
  "urls": {
    "self": "https://api.okrapdf.com/document/doc-...",
    "status": "https://api.okrapdf.com/document/doc-.../status",
    "pages": "https://api.okrapdf.com/document/doc-.../pages",
    "publish": "https://api.okrapdf.com/document/doc-.../publish"
  }
}
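
Because the endpoint returns before processing finishes, a client typically polls urls.status. The sketch below assumes the status endpoint returns a JSON body with a "status" field and that any value other than "processing" means processing has stopped; neither is confirmed by this page. `wait_until_done` and `fetch_json` are hypothetical names, with `fetch_json` standing in for whatever authenticated GET helper you already use.

```python
import time
from typing import Callable


def wait_until_done(
    ingest_response: dict,
    fetch_json: Callable[[str], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Poll urls.status until the reported status is no longer "processing"."""
    status_url = ingest_response["urls"]["status"]
    for attempt in range(max_attempts):
        body = fetch_json(status_url)
        # Assumption: the status body carries a "status" field, as the ingest response does.
        if body.get("status") != "processing":
            return body
        # Sleep between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"document still processing after {max_attempts} polls")
```

Injecting `fetch_json` keeps the polling loop independent of any particular HTTP client and easy to exercise with a stub.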

What happens after ingest

  1. Vendor payload is normalized to Okra’s canonical parse shape.
  2. Parsed nodes are hydrated into the document graph.
  3. Lifecycle jobs run (snapshot/materialization/projection workflow).
  4. Standard document surfaces become available (pages, chat/completion, output profiles, URL builder).
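
Step 1 can be pictured as regrouping a flat list of vendor elements into a pages[].blocks[] structure. The sketch below shows that idea for Unstructured-style elements. The output field names ("page", "type", "text") and the default of page 1 for elements without a page number are assumptions made for illustration; the real canonical schema is not specified on this page.

```python
from collections import defaultdict


def to_pages(elements: list[dict]) -> dict:
    """Group flat Unstructured-style elements into a pages[].blocks[] structure."""
    by_page: dict[int, list[dict]] = defaultdict(list)
    for el in elements:
        # Assumption: elements without metadata.page_number are placed on page 1.
        page = el.get("metadata", {}).get("page_number", 1)
        # Hypothetical block fields; the real canonical schema may differ.
        by_page[page].append({"type": el["type"], "text": el.get("text", "")})
    return {
        "pages": [
            {"page": page, "blocks": blocks}
            for page, blocks in sorted(by_page.items())
        ]
    }
```

Element order within each page is preserved, and pages are emitted in ascending page-number order.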

Failure modes

  • Unknown payload shape without vendor: 422 with supported connector list.
  • Invalid payload for chosen connector: 422 normalization error.
  • Workflow startup failure: 500 with error payload.

No silent drops: payloads are validated before lifecycle processing continues.
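
On the client side, the status codes above map naturally to a small set of actions. The following sketch is one way to organize that; `classify_ingest_status` and the action labels are hypothetical, and treating a 500 as retryable is an assumption rather than documented behavior.

```python
def classify_ingest_status(status_code: int) -> str:
    """Map an ingest HTTP status code to a coarse client-side action."""
    if status_code == 202:
        # Accepted: lifecycle processing has started.
        return "accepted"
    if status_code == 422:
        # Unknown shape or normalization error: the payload itself must change.
        return "fix_payload"
    if status_code == 500:
        # Workflow startup failure. Assumption: safe to retry after a delay.
        return "retry_later"
    # Any other code is outside what this page documents.
    return "unexpected"
```

Separating "fix_payload" from "retry_later" keeps a retry loop from resubmitting a payload that will fail validation every time.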

When to use this endpoint

Use the Ingest API when you:
  • already run extraction with external vendors,
  • want OkraPDF's delivery, policy, and output layers,
  • need a stable doc-... lifecycle without re-running OCR in Okra.