Overview
This walkthrough shows the full output schema lifecycle using curl, in five steps: register the recipe (an output profile), write the result, read it back from the document's Durable Object (DO), inspect the audit trail, and read it publicly from R2.
Writes and authenticated DO reads (Steps 1 through 4) require the x-document-agent-secret header. Public R2 reads require no authentication.
Step 1: Register an Output Profile
Define the extraction recipe: what to extract, how to extract it, and which model to use.
curl -X PUT "https://api.okrapdf.com/document/{doc_id}/output-profile/invoice" \
  -H "x-document-agent-secret: $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": {
      "type": "object",
      "properties": {
        "vendor": { "type": "string" },
        "total": { "type": "number" },
        "date": { "type": "string", "format": "date" },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "quantity": { "type": "number" },
              "amount": { "type": "number" }
            }
          }
        }
      }
    },
    "prompt": "Extract invoice fields including vendor, date, total, and line items.",
    "model": "claude-sonnet-4-5-20250929"
  }'
The profile is stored in the document’s Durable Object SQLite database. It is only the recipe; no extraction runs yet.
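For callers who prefer a script to curl, the same registration request can be sketched in Python with only the standard library. The function name build_profile_request and the sample doc_id and secret are illustrative, not part of the API; the URL, headers, and body shape are copied from the curl command above.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.okrapdf.com"  # host taken from the curl examples


def build_profile_request(doc_id, name, profile, secret):
    """Build (but do not send) the PUT that registers an output profile."""
    url = (
        f"{API_BASE}/document/{urllib.parse.quote(doc_id, safe='')}"
        f"/output-profile/{urllib.parse.quote(name, safe='')}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(profile).encode("utf-8"),
        method="PUT",
        headers={
            "x-document-agent-secret": secret,
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only.
profile = {
    "schema": {"type": "object", "properties": {"vendor": {"type": "string"}}},
    "prompt": "Extract invoice fields including vendor, date, total, and line items.",
    "model": "claude-sonnet-4-5-20250929",
}
request = build_profile_request("doc_123", "invoice", profile, secret="example-secret")
# Send it with urllib.request.urlopen(request) once doc_id and secret are real.
```

Building the request separately from sending it keeps the sketch easy to inspect and test without touching the network.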
Step 2: Materialize the Output
After your SDK or agent runs the extraction against the LLM, write the validated result and audit trail.
curl -X PUT "https://api.okrapdf.com/document/{doc_id}/output/invoice" \
  -H "x-document-agent-secret: $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "vendor": "Acme Corp",
      "total": 1234.56,
      "date": "2026-01-15",
      "line_items": [
        { "description": "Widget", "quantity": 10, "amount": 1234.56 }
      ]
    },
    "audit": {
      "model": "claude-sonnet-4-5-20250929",
      "prompt": "Extract invoice fields including vendor, date, total, and line items.",
      "raw_response": "{\"vendor\":\"Acme Corp\",\"total\":1234.56,\"date\":\"2026-01-15\",\"line_items\":[{\"description\":\"Widget\",\"quantity\":10,\"amount\":1234.56}]}"
    }
  }'
This does two things:
- UPSERT into DO SQLite: the source of truth, with the full audit trail
- PUT to R2 at public/{doc_id}/o_invoice.json: data only (no audit in the public blob)
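On the client side, the request body for this step has to be assembled from the raw LLM text. The sketch below parses the response, checks it against a small subset of JSON Schema, and wraps it with its audit trail. The helper names matches_schema and build_output_body are hypothetical, and the checker covers only the type, properties, and items keywords used in the invoice profile; it is not a full JSON Schema validator and says nothing about what the server itself validates.

```python
import json


def matches_schema(value, schema):
    """Check value against a small JSON Schema subset: type, properties, items.

    Keywords such as 'format' and 'required' are ignored in this sketch.
    """
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        # Only properties that are present are checked; nothing is required.
        return all(
            matches_schema(value[key], sub) for key, sub in props.items() if key in value
        )
    if expected == "array":
        if not isinstance(value, list):
            return False
        return all(matches_schema(item, schema.get("items", {})) for item in value)
    if expected == "string":
        return isinstance(value, str)
    if expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # unknown or missing type: accept


def build_output_body(raw_response, schema, model, prompt):
    """Parse the raw LLM text, validate it, and wrap it with its audit trail."""
    data = json.loads(raw_response)
    if not matches_schema(data, schema):
        raise ValueError("LLM response does not match the output schema")
    return {
        "data": data,
        "audit": {"model": model, "prompt": prompt, "raw_response": raw_response},
    }
```

The returned dictionary has the same data and audit keys as the curl body above, so it can be serialized with json.dumps and sent as the PUT payload.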
Step 3: Read via Durable Object
Authenticated read from the DO’s SQLite. Returns the validated data.
curl "https://api.okrapdf.com/document/{doc_id}/output/invoice" \
  -H "x-document-agent-secret: $SECRET"
Response:
{
  "vendor": "Acme Corp",
  "total": 1234.56,
  "date": "2026-01-15",
  "line_items": [
    { "description": "Widget", "quantity": 10, "amount": 1234.56 }
  ]
}
Step 4: Inspect the Audit Trail
See exactly what produced the output: which model, what prompt, the raw LLM response before parsing.
curl "https://api.okrapdf.com/document/{doc_id}/output/invoice/audit" \
  -H "x-document-agent-secret: $SECRET"
Response:
{
  "model": "claude-sonnet-4-5-20250929",
  "prompt": "Extract invoice fields including vendor, date, total, and line items.",
  "raw_response": "{\"vendor\":\"Acme Corp\",\"total\":1234.56,...}",
  "created_at": 1772173501590
}
The audit trail is only available via the authenticated DO path. It is never exposed in the public R2 blob.
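Two checks are often useful when inspecting an audit record: when it was written, and whether the raw LLM response really parses to the stored data. The sketch below assumes created_at is a Unix timestamp in milliseconds, which the magnitude of the sample value suggests but the response does not state. Both function names are illustrative.

```python
import json
from datetime import datetime, timezone


def audit_created_at(audit):
    """Convert created_at to an aware UTC datetime, assuming epoch milliseconds."""
    return datetime.fromtimestamp(audit["created_at"] / 1000, tz=timezone.utc)


def raw_response_matches(audit, data):
    """True if the raw LLM response parses to exactly the stored data."""
    try:
        return json.loads(audit["raw_response"]) == data
    except json.JSONDecodeError:
        # The raw text was not valid JSON, so it cannot match as-is.
        return False
```

Under the millisecond assumption, the sample created_at above falls on 2026-02-27 (UTC). A False result from raw_response_matches is not necessarily an error: it only means the stored data was cleaned up or corrected after parsing.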
Step 5: Public R2 Read
This is the key benefit: no secret header and no Durable Object wake-up. The response is served straight from R2 with cache headers.
curl "https://api.okrapdf.com/v1/documents/{doc_id}/o_invoice/data.json"
Response:
{
  "vendor": "Acme Corp",
  "total": 1234.56,
  "date": "2026-01-15",
  "line_items": [
    { "description": "Widget", "quantity": 10, "amount": 1234.56 }
  ]
}
Response headers:
Cache-Control: public, max-age=3600
Access-Control-Allow-Origin: *
Content-Type: application/json
Embed this URL in dashboards, spreadsheets, or downstream pipelines. It’s a static JSON file.
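Because the response carries Cache-Control: public, max-age=3600, a pipeline that polls this URL can reuse its local copy for up to an hour instead of refetching. A minimal sketch of that freshness check follows; the function names are illustrative and the parser handles only the max-age directive.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|[,\s])max-age=(\d+)", cache_control, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(fetched_at, now, cache_control):
    """True if a copy fetched at fetched_at (epoch seconds) may still be reused."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and (now - fetched_at) < max_age
```

For example, is_fresh(fetched_at, time.time(), "public, max-age=3600") stays true for the first hour after a fetch; after that the consumer should request the URL again.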
The o_ prefix works alongside the existing t_ transform prefix:
# Extract via LlamaParse provider, read invoice output
curl "https://api.okrapdf.com/v1/documents/{doc_id}/t_llamaparse/o_invoice/data.json"
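The two URL shapes shown above differ only in the optional t_ segment, so a small helper can build either one. This is a sketch: public_output_url is a hypothetical name, and the segment order (t_ before o_) simply follows the example URL.

```python
from urllib.parse import quote

API_BASE = "https://api.okrapdf.com"  # host taken from the curl examples


def public_output_url(doc_id, output_name, transform=None):
    """Build the public read URL, with an optional t_ transform segment."""
    segments = [quote(doc_id, safe="")]
    if transform:
        segments.append(f"t_{quote(transform, safe='')}")
    segments.append(f"o_{quote(output_name, safe='')}")
    return f"{API_BASE}/v1/documents/{'/'.join(segments)}/data.json"
```

Calling public_output_url("doc_123", "invoice") gives the plain form, and passing transform="llamaparse" inserts the t_llamaparse segment before o_invoice.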
Multiple Output Schemas
A single document can have many output schemas. Each is independent.
# Register a profile for each extraction
PUT /document/{doc_id}/output-profile/invoice
PUT /document/{doc_id}/output-profile/compliance
PUT /document/{doc_id}/output-profile/summary
# Materialize each output
PUT /document/{doc_id}/output/invoice
PUT /document/{doc_id}/output/compliance
PUT /document/{doc_id}/output/summary
# Read each independently
GET /v1/documents/{doc_id}/o_invoice/data.json
GET /v1/documents/{doc_id}/o_compliance/data.json
GET /v1/documents/{doc_id}/o_summary/data.json
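Since every output name follows the same route pattern, the calls for any set of names can be generated rather than written by hand. The sketch below uses a hypothetical helper, output_routes, and takes the materialize route from Step 2.

```python
def output_routes(doc_id, names):
    """Map each output name to the three routes used in this walkthrough."""
    return {
        name: {
            "register": f"PUT /document/{doc_id}/output-profile/{name}",
            "materialize": f"PUT /document/{doc_id}/output/{name}",
            "public_read": f"GET /v1/documents/{doc_id}/o_{name}/data.json",
        }
        for name in names
    }


routes = output_routes("doc_123", ["invoice", "compliance", "summary"])
```

Each entry is independent of the others, mirroring the fact that one output can be registered, materialized, or re-run without touching the rest.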
Upsert Behavior
Both profile registration and output materialization use upsert semantics. Re-running an extraction with updated data overwrites the previous result:
# First extraction
PUT /document/{doc_id}/output/invoice → { "vendor": "Acme Corp", "total": 1000 }
# Re-extraction with corrected data
PUT /document/{doc_id}/output/invoice → { "vendor": "Acme Corp", "total": 1234.56 }
# GET returns the latest
GET /document/{doc_id}/output/invoice → { "vendor": "Acme Corp", "total": 1234.56 }
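The R2 blob is also overwritten, so public reads serve the latest materialization. Because public responses are sent with Cache-Control: public, max-age=3600 (see Step 5), a client or intermediary cache may keep returning the previous version for up to an hour.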
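The upsert behavior can be modeled in a few lines. The class below is an in-memory stand-in written for illustration, not the service's implementation: one dictionary plays the role of the DO SQLite table (data plus audit), another plays the role of the public R2 bucket (data only), and the key format follows the public/{doc_id}/o_invoice.json path given in Step 2.

```python
class DocumentOutputs:
    """In-memory stand-in for one document's DO rows and its public R2 blobs."""

    def __init__(self, doc_id):
        self.doc_id = doc_id
        self.do_rows = {}   # output name -> {"data": ..., "audit": ...}
        self.r2_blobs = {}  # R2 key -> data only, never the audit

    def materialize(self, name, data, audit):
        """Upsert: a second call with the same name replaces the first."""
        self.do_rows[name] = {"data": data, "audit": audit}
        self.r2_blobs[f"public/{self.doc_id}/o_{name}.json"] = data


store = DocumentOutputs("doc_123")  # hypothetical document ID
store.materialize("invoice", {"vendor": "Acme Corp", "total": 1000}, {"model": "m"})
store.materialize("invoice", {"vendor": "Acme Corp", "total": 1234.56}, {"model": "m"})
```

After the second call there is still exactly one invoice row and one public blob, both holding the corrected total.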