Overview

This walkthrough uses curl to show the full output schema lifecycle in five steps: register the extraction recipe (an output profile), write the result, read it back from the document’s Durable Object (DO), inspect the audit trail, and read it publicly from R2.
All write operations and all Durable Object reads require the x-document-agent-secret header. Public R2 reads require no authentication.

Step 1: Register an Output Profile

Define the extraction recipe: what to extract (a JSON Schema), how to extract it (a prompt), and which model to use.
curl -X PUT "https://api.okrapdf.com/document/{doc_id}/output-profile/invoice" \
  -H "x-document-agent-secret: $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": {
      "type": "object",
      "properties": {
        "vendor": { "type": "string" },
        "total": { "type": "number" },
        "date": { "type": "string", "format": "date" },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "quantity": { "type": "number" },
              "amount": { "type": "number" }
            }
          }
        }
      }
    },
    "prompt": "Extract invoice fields including vendor, date, total, and line items.",
    "model": "claude-sonnet-4-5-20250929"
  }'
Response
{ "ok": true }
The profile is stored in the SQLite database of the document’s Durable Object. It is only the recipe: registering it does not run an extraction.
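The walkthrough does not say whether the API validates data against the registered schema on write, so a cheap client-side check before materializing can catch malformed LLM output. The sketch below is a hypothetical helper (check_against_schema), not part of any okrapdf SDK. It handles only the keywords the invoice profile uses (type, properties, items) and is not a full JSON Schema validator; for production, an established validator library is the safer choice.

```python
from typing import Any, List

# Hypothetical helper, not part of any okrapdf SDK. It checks only the JSON Schema
# keywords used by the invoice profile: "type" (object/array/string/number),
# "properties", and "items". Keywords such as "required" and "format" are ignored.
_TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def check_against_schema(value: Any, schema: dict, path: str = "$") -> List[str]:
    """Return a list of error strings; an empty list means the value passed."""
    expected = schema.get("type")
    check = _TYPE_CHECKS.get(expected)
    if check is not None and not check(value):
        # Stop here: descending into a value of the wrong type is meaningless.
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors: List[str] = []
    if expected == "object":
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:  # absent keys pass, since "required" is not handled
                errors += check_against_schema(value[key], sub_schema, f"{path}.{key}")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors += check_against_schema(item, schema["items"], f"{path}[{i}]")
    return errors


# Same schema as the Step 1 profile, written as a Python dict.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "date": {"type": "string", "format": "date"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

good = {"vendor": "Acme Corp", "total": 1234.56, "date": "2026-01-15",
        "line_items": [{"description": "Widget", "quantity": 10, "amount": 1234.56}]}
bad = {"vendor": "Acme Corp", "total": "1,234.56", "line_items": [{"quantity": "ten"}]}

print(check_against_schema(good, invoice_schema))  # []
print(check_against_schema(bad, invoice_schema))
# ['$.total: expected number, got str', '$.line_items[0].quantity: expected number, got str']
```

Only results that come back with an empty error list would then be sent to the output endpoint.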

Step 2: Materialize the Output

After your SDK or agent runs the extraction against the LLM, write the validated result and audit trail.
curl -X PUT "https://api.okrapdf.com/document/{doc_id}/output/invoice" \
  -H "x-document-agent-secret: $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "vendor": "Acme Corp",
      "total": 1234.56,
      "date": "2026-01-15",
      "line_items": [
        { "description": "Widget", "quantity": 10, "amount": 1234.56 }
      ]
    },
    "audit": {
      "model": "claude-sonnet-4-5-20250929",
      "prompt": "Extract invoice fields including vendor, date, total, and line items.",
      "raw_response": "{\"vendor\":\"Acme Corp\",\"total\":1234.56,\"date\":\"2026-01-15\",\"line_items\":[{\"description\":\"Widget\",\"quantity\":10,\"amount\":1234.56}]}"
    }
  }'
Response
{ "ok": true }
This does two things:
  1. Upserts the output into the DO’s SQLite database, the source of truth, including the full audit trail.
  2. Writes the data alone (no audit fields) to R2 at public/{doc_id}/o_invoice.json.
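The two writes amount to splitting the request body: data and audit together go to the private store, while only data goes to the public blob. The sketch below models that split under two assumptions: the function name split_materialization is hypothetical, and the R2 key is assumed to generalize the invoice example as public/{doc_id}/o_{name}.json. It illustrates the data flow, not the service’s actual implementation.

```python
import json
from typing import Tuple


def split_materialization(doc_id: str, name: str, body: dict) -> Tuple[dict, str, bytes]:
    """Hypothetical model of the two writes made by PUT /document/{doc_id}/output/{name}.

    Returns (private_row, public_key, public_blob):
    - private_row keeps data and audit together (the DO SQLite source of truth);
    - public_blob holds only the serialized data, so audit fields never reach R2.
    """
    data = body["data"]
    audit = body.get("audit", {})
    private_row = {"name": name, "data": data, "audit": audit}
    # Key layout assumed by generalizing the walkthrough's invoice example.
    public_key = f"public/{doc_id}/o_{name}.json"
    public_blob = json.dumps(data).encode("utf-8")
    return private_row, public_key, public_blob


body = {
    "data": {"vendor": "Acme Corp", "total": 1234.56},
    "audit": {
        "model": "claude-sonnet-4-5-20250929",
        "prompt": "Extract invoice fields including vendor, date, total, and line items.",
    },
}
# "doc_123" is a placeholder document ID.
row, key, blob = split_materialization("doc_123", "invoice", body)
print(key)                          # public/doc_123/o_invoice.json
print("audit" in json.loads(blob))  # False
```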

Step 3: Read via Durable Object

An authenticated read from the DO’s SQLite database. It returns only the stored data; the audit trail has its own endpoint.
curl "https://api.okrapdf.com/document/{doc_id}/output/invoice" \
  -H "x-document-agent-secret: $SECRET"
Response
{
  "vendor": "Acme Corp",
  "total": 1234.56,
  "date": "2026-01-15",
  "line_items": [
    { "description": "Widget", "quantity": 10, "amount": 1234.56 }
  ]
}
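The same authenticated reads can be issued from code. This is a minimal standard-library sketch; the helper names are hypothetical, and it assumes the paths and the single x-document-agent-secret header shown in this step and in the audit step that follows are the whole contract (no extra query parameters or headers).

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.okrapdf.com"  # taken from the curl examples


def build_output_request(doc_id: str, name: str, secret: str,
                         audit: bool = False) -> urllib.request.Request:
    """Build an authenticated GET for /document/{doc_id}/output/{name}, or .../audit."""
    path = f"/document/{quote(doc_id, safe='')}/output/{quote(name, safe='')}"
    if audit:
        path += "/audit"
    return urllib.request.Request(
        API_BASE + path,
        headers={"x-document-agent-secret": secret},
        method="GET",
    )


def read_output(doc_id: str, name: str, secret: str, audit: bool = False) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_output_request(doc_id, name, secret, audit)) as resp:
        return json.load(resp)
```

For example, read_output("doc_123", "invoice", os.environ["SECRET"]) mirrors the curl call above, and audit=True targets the audit endpoint; doc_123 is a placeholder ID.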

Step 4: Inspect the Audit Trail

See exactly what produced the output: which model, which prompt, and the raw LLM response before parsing.
curl "https://api.okrapdf.com/document/{doc_id}/output/invoice/audit" \
  -H "x-document-agent-secret: $SECRET"
Response
{
  "model": "claude-sonnet-4-5-20250929",
  "prompt": "Extract invoice fields including vendor, date, total, and line items.",
  "raw_response": "{\"vendor\":\"Acme Corp\",\"total\":1234.56,...}",
  "created_at": 1772173501590
}
The audit trail is only available via the authenticated DO path. It is never exposed in the public R2 blob.
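The walkthrough does not state the unit of created_at, but the example value 1772173501590 is far too large to be seconds and reads naturally as Unix epoch milliseconds; treat that as an assumption. Converting it to a timezone-aware UTC datetime makes audit entries easier to compare. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def audit_timestamp(created_at: int) -> datetime:
    """Convert an audit created_at value to an aware UTC datetime.

    Assumes created_at is Unix epoch milliseconds. Integer timedelta arithmetic
    avoids the float rounding that dividing by 1000 could introduce.
    """
    return EPOCH + timedelta(milliseconds=created_at)


print(audit_timestamp(1772173501590).isoformat())  # 2026-02-27T06:25:01.590000+00:00
```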

Step 5: Public R2 Read

This is the key benefit: no API key and no Durable Object wake-up. The JSON is served straight from R2 with cache headers.
curl "https://api.okrapdf.com/v1/documents/{doc_id}/o_invoice/data.json"
Response
{
  "vendor": "Acme Corp",
  "total": 1234.56,
  "date": "2026-01-15",
  "line_items": [
    { "description": "Widget", "quantity": 10, "amount": 1234.56 }
  ]
}
Response headers:
  • Cache-Control: public, max-age=3600
  • Access-Control-Allow-Origin: *
  • Content-Type: application/json
Embed this URL in dashboards, spreadsheets, or downstream pipelines. It’s a static JSON file.
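Because the response carries Cache-Control: public, max-age=3600, a client that polls this URL can reuse its copy for up to an hour. The sketch below is a simplified freshness check under that assumption: the helper names are hypothetical, it reads only the max-age directive, and it ignores the Age header and the other directives a full HTTP cache (RFC 9111) would honor.

```python
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract max-age from a Cache-Control header value, or None if absent or invalid."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            try:
                return int(value.strip('"'))
            except ValueError:
                return None
    return None


def is_fresh(fetched_at: float, cache_control: str, now: float) -> bool:
    """True if a copy fetched at fetched_at (epoch seconds) may still be reused at now."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and now - fetched_at < max_age


header = "public, max-age=3600"
print(max_age_seconds(header))                  # 3600
print(is_fresh(1000.0, header, now=1000.0 + 1800))  # True (30 minutes old)
print(is_fresh(1000.0, header, now=1000.0 + 7200))  # False (2 hours old)
```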

Combining with Transforms

The o_ prefix works alongside the existing t_ transform prefix:
# Extract via LlamaParse provider, read invoice output
curl "https://api.okrapdf.com/v1/documents/{doc_id}/t_llamaparse/o_invoice/data.json"
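Since both prefixes are plain path segments, one small helper can build public URLs for direct outputs and transform-scoped outputs alike. The sketch assumes the only layouts are the two shown in this walkthrough ({doc_id}/o_{name} and {doc_id}/t_{transform}/o_{name}); the function name and the doc_123 ID are hypothetical.

```python
from typing import Optional
from urllib.parse import quote

PUBLIC_BASE = "https://api.okrapdf.com/v1/documents"


def public_output_url(doc_id: str, output: str, transform: Optional[str] = None) -> str:
    """Build the unauthenticated R2-backed URL for an output, optionally under a t_ transform."""
    segments = [quote(doc_id, safe="")]
    if transform:
        segments.append("t_" + quote(transform, safe=""))
    segments.append("o_" + quote(output, safe=""))
    return f"{PUBLIC_BASE}/{'/'.join(segments)}/data.json"


print(public_output_url("doc_123", "invoice"))
# https://api.okrapdf.com/v1/documents/doc_123/o_invoice/data.json
print(public_output_url("doc_123", "invoice", transform="llamaparse"))
# https://api.okrapdf.com/v1/documents/doc_123/t_llamaparse/o_invoice/data.json
```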

Multiple Output Schemas

A single document can have many output schemas. Each is independent.
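# Register a profile for each extraction
PUT /document/{id}/output-profile/invoice
PUT /document/{id}/output-profile/compliance
PUT /document/{id}/output-profile/summary

# Materialize each result
PUT /document/{id}/output/invoice
PUT /document/{id}/output/compliance
PUT /document/{id}/output/summary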

# Read each independently
GET /v1/documents/{id}/o_invoice/data.json
GET /v1/documents/{id}/o_compliance/data.json
GET /v1/documents/{id}/o_summary/data.json
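Because each output is independent, a consumer can read several in one pass and treat a failure for one name (for example, an output that has not been materialized yet) as local to that name. A minimal sketch with hypothetical helper names; the fetch parameter is injectable so the loop can be tested or pointed at a cache without network access.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

PUBLIC_BASE = "https://api.okrapdf.com/v1/documents"


def fetch_json(url: str) -> dict:
    """Default fetcher: unauthenticated GET against the public R2-backed path."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def read_outputs(doc_id: str, names: List[str],
                 fetch: Callable[[str], dict] = fetch_json
                 ) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Read each o_{name} output on its own.

    Returns (results, errors). A failure for one name (network error, HTTP error,
    or a body that is not valid JSON) is recorded in errors and does not stop the rest.
    """
    results: Dict[str, dict] = {}
    errors: Dict[str, str] = {}
    for name in names:
        url = f"{PUBLIC_BASE}/{doc_id}/o_{name}/data.json"
        try:
            results[name] = fetch(url)
        except (urllib.error.URLError, ValueError) as exc:
            # HTTPError subclasses URLError; json.JSONDecodeError subclasses ValueError.
            errors[name] = str(exc)
    return results, errors
```

For example, read_outputs("doc_123", ["invoice", "compliance", "summary"]) returns whichever outputs are available plus an error message for each one that is not; doc_123 is a placeholder ID.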

Upsert Behavior

Both profile registration and output materialization use upsert semantics. Re-running an extraction with updated data overwrites the previous result:
# First extraction
PUT /document/{id}/output/invoice  { "vendor": "Acme Corp", "total": 1000 }

# Re-extraction with corrected data
PUT /document/{id}/output/invoice  { "vendor": "Acme Corp", "total": 1234.56 }

# GET returns the latest
GET /document/{id}/output/invoice  { "vendor": "Acme Corp", "total": 1234.56 }
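Each write also overwrites the R2 blob, so new public reads serve the latest materialization. Browsers and shared caches that already hold a copy may keep serving it until its max-age of 3600 seconds expires.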
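These upsert semantics map directly onto SQLite’s INSERT ... ON CONFLICT ... DO UPDATE (available since SQLite 3.24). The sketch below uses Python’s sqlite3 module with a hypothetical outputs table keyed by output name; the real Durable Object schema is not documented here, so the table and column names are assumptions. Since each document has its own Durable Object, the table needs no document ID column.

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per output name, with data and audit stored as JSON text.
conn.execute("CREATE TABLE outputs (name TEXT PRIMARY KEY, data TEXT NOT NULL, audit TEXT)")


def materialize(name: str, data: dict, audit: Optional[dict] = None) -> None:
    """Insert the output, or overwrite the existing row with the same name (last write wins)."""
    conn.execute(
        """
        INSERT INTO outputs (name, data, audit) VALUES (?, ?, ?)
        ON CONFLICT(name) DO UPDATE SET data = excluded.data, audit = excluded.audit
        """,
        (name, json.dumps(data), json.dumps(audit) if audit is not None else None),
    )
    conn.commit()


def read(name: str) -> Optional[dict]:
    """Return the latest data for name, or None if it was never materialized."""
    row = conn.execute("SELECT data FROM outputs WHERE name = ?", (name,)).fetchone()
    return json.loads(row[0]) if row else None


materialize("invoice", {"vendor": "Acme Corp", "total": 1000})
materialize("invoice", {"vendor": "Acme Corp", "total": 1234.56})  # re-extraction
print(read("invoice"))  # {'vendor': 'Acme Corp', 'total': 1234.56}
print(conn.execute("SELECT COUNT(*) FROM outputs").fetchone()[0])  # 1
```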