Document + Run Model

Why this exists

We treat a document as a durable, workspace-like entity. Confusion arises when the upload response is read as a long-lived “job object”. In reality:

documentId is the durable identity.
workflowId identifies a single processing run: the upload or reparse execution that produced it.

This page is the canonical model and follow-through plan. For the durable-entity lifecycle framing and concrete endpoint design, see Document DO Lifecycle HTTP API Design.

Canonical terminology

Document: Permanent identity (documentId) and storage scope.
Run: One processing execution attached to a document (workflowId / future runId).
Session: SDK handle bound to one document (okra.sessions.from/create).

Current contract (today)

POST /document/:id/upload returns document-scoped metadata:

documentId
phase
status
workflowId (current run)
urls

GET /document/:id/status is document-scoped and reports current/latest state.
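
As a rough illustration of how a client might read the upload response, the sketch below validates the five fields listed above and separates the durable identity from the run identity. The field types (strings, and a name-to-URL map for urls), the helper name parseUploadResponse, and all sample values are assumptions for illustration, not part of the documented contract.

```typescript
// Field names come from the upload contract above; the field types are assumptions.
interface UploadResponse {
  documentId: string; // durable identity: this is the value to persist
  phase: string;
  status: string;
  workflowId: string; // identifies the current run only
  urls: Record<string, string>; // assumed shape: name -> URL
}

// Hypothetical helper: checks that the documented fields are present.
function parseUploadResponse(body: unknown): UploadResponse {
  if (typeof body !== "object" || body === null) {
    throw new Error("upload response must be a JSON object");
  }
  const b = body as Record<string, unknown>;
  for (const key of ["documentId", "phase", "status", "workflowId"]) {
    if (typeof b[key] !== "string") {
      throw new Error(`upload response is missing string field "${key}"`);
    }
  }
  if (typeof b["urls"] !== "object" || b["urls"] === null) {
    throw new Error('upload response is missing object field "urls"');
  }
  return b as unknown as UploadResponse;
}

// Illustrative payload; every value here is made up.
const upload = parseUploadResponse({
  documentId: "doc_123",
  phase: "parsing",
  status: "processing",
  workflowId: "wf_001",
  urls: { status: "/document/doc_123/status" },
});

const durableKey = upload.documentId; // store this long-term
const currentRun = upload.workflowId; // changes on every upload/reparse
```

The point of the split is that only durableKey should be stored as the long-lived reference; currentRun is useful for diagnostics on this particular execution.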

Target contract (design)

Document remains primary. Runs become explicit sub-resources.

Document endpoints

GET /document/:id/status
POST /document/:id/completion
POST /document/:id/share-link
POST /document/:id/publish

Run endpoints (additive)

GET /document/:id/runs
GET /document/:id/runs/:runId
POST /document/:id/runs/:runId/cancel (or a /cancel shorthand that targets the active run)

Run record shape (minimal)

runId
documentId
phase
startedAt
updatedAt
completedAt (nullable)
error (nullable)
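
The minimal run record above could be written down as a type along these lines. This is a sketch under assumptions: the source fixes only the field names and which ones are nullable, so the timestamp format (ISO 8601 strings), the error type (a plain message string), and the helpers isFinished and durationMs are hypothetical. It also assumes completedAt is set whenever a run ends, whether or not it succeeded.

```typescript
// Field names and nullability follow the list above; concrete types are assumptions.
interface RunRecord {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: string; // assumed ISO 8601 timestamp
  updatedAt: string; // assumed ISO 8601 timestamp
  completedAt: string | null; // null while the run is still in progress
  error: string | null; // assumed message string; could be a structured object
}

// Hypothetical helper: a run is finished once completedAt is set.
function isFinished(run: RunRecord): boolean {
  return run.completedAt !== null;
}

// Hypothetical helper: wall-clock duration in milliseconds, or null if still running.
function durationMs(run: RunRecord): number | null {
  if (run.completedAt === null) return null;
  return Date.parse(run.completedAt) - Date.parse(run.startedAt);
}

// Illustrative record; every value here is made up.
const exampleRun: RunRecord = {
  runId: "run_1",
  documentId: "doc_123",
  phase: "complete",
  startedAt: "2024-01-01T00:00:00Z",
  updatedAt: "2024-01-01T00:00:30Z",
  completedAt: "2024-01-01T00:00:30Z",
  error: null,
};

const exampleDuration = durationMs(exampleRun); // 30000
```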

No derived rollups are required in /status. Clients can derive “last successful run” from /runs.
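
A client-side derivation of “last successful run” might look like the sketch below. It assumes that success means completedAt is set and error is null, and that timestamps are ISO 8601 strings; neither is confirmed by the contract, and the function name and sample data are illustrative only.

```typescript
// Only the fields needed for the derivation; types are assumptions.
type RunLike = {
  runId: string;
  completedAt: string | null;
  error: string | null;
};

// Returns the successful run with the latest completedAt, or undefined if none.
// Assumption: "successful" means completedAt is set and error is null.
function lastSuccessfulRun(runs: readonly RunLike[]): RunLike | undefined {
  let best: RunLike | undefined;
  let bestTime = -Infinity;
  for (const run of runs) {
    if (run.completedAt === null || run.error !== null) continue;
    const finishedAt = Date.parse(run.completedAt);
    if (finishedAt > bestTime) {
      best = run;
      bestTime = finishedAt;
    }
  }
  return best;
}

// Illustrative data, shaped like a GET /document/:id/runs result; values are made up.
const runHistory: RunLike[] = [
  { runId: "run_1", completedAt: "2024-01-01T09:00:00Z", error: null },
  { runId: "run_2", completedAt: "2024-01-01T10:00:00Z", error: null },
  { runId: "run_3", completedAt: "2024-01-01T11:00:00Z", error: "parse failed" },
  { runId: "run_4", completedAt: null, error: null },
];

const lastGood = lastSuccessfulRun(runHistory); // run_2
```

Because the rollup lives in the client, /status stays purely document-scoped and the server does not need to maintain a derived field.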

Change magnitude (current -> target)

This is a small-to-medium additive change:

Keep existing /document/:id/* routes working.
Add explicit /runs and /events resources for lifecycle visibility.
Introduce /responses as the name for completion, keeping /completion as an alias during migration.
Keep workflowId while adding runId for naming convergence.
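
While both names are in flight, a client could resolve the run identity as sketched below. This assumes runId and workflowId refer to the same run when both are present, and that runId should win; the type and function names are hypothetical.

```typescript
// Lifecycle responses may carry either name during the migration (assumed shape).
type LifecycleResponse = {
  documentId: string;
  runId?: string; // new name
  workflowId?: string; // existing name, kept for compatibility
};

// Prefer the new name, fall back to the existing one.
function resolveRunId(res: LifecycleResponse): string | undefined {
  return res.runId ?? res.workflowId;
}

// Illustrative responses; every value here is made up.
const fromOldServer = resolveRunId({ documentId: "doc_123", workflowId: "wf_001" });
const fromNewServer = resolveRunId({
  documentId: "doc_123",
  runId: "wf_001",
  workflowId: "wf_001",
});
```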

Endpoint diff details are documented in Document DO Lifecycle HTTP API Design.

Design rules

Never change documentId across reparse/retry.
Every upload/reparse creates a new run record.
Completion/share/publish are document-scoped by default.
Deterministic export should support run/version pinning.
Keep backward compatibility with additive fields first, then deprecate.
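
The first two rules can be illustrated with a small in-memory ledger: the documentId is fixed at construction, and every upload or reparse appends a new run. This is a sketch only; the class name, the run ID format, and the in-memory storage are assumptions, and a real ledger would be persisted per document.

```typescript
// Minimal run entry for the illustration; field types are assumptions.
type LedgerRun = { runId: string; documentId: string; startedAt: string };

// Hypothetical in-memory ledger for one document.
class RunLedger {
  readonly documentId: string; // never changes across reparse/retry
  private runs: LedgerRun[] = [];
  private nextSeq = 1;

  constructor(documentId: string) {
    this.documentId = documentId;
  }

  // Called for every upload or reparse: always appends a new run record.
  startRun(now: Date = new Date()): LedgerRun {
    const run: LedgerRun = {
      runId: `${this.documentId}-run-${this.nextSeq++}`, // made-up ID format
      documentId: this.documentId,
      startedAt: now.toISOString(),
    };
    this.runs.push(run);
    return run;
  }

  // Run history in creation order.
  list(): readonly LedgerRun[] {
    return this.runs;
  }
}

const ledger = new RunLedger("doc_123");
const uploadRun = ledger.startRun(); // initial upload
const reparseRun = ledger.startRun(); // reparse: new run, same document
```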

Conflicts to erase

Avoid “job object” phrasing in SDK docs.
Standardize on “documentId” in examples.
Mark documents[].jobId in the deploy payload as a legacy naming alias.
Use “run/workflow” language for execution state.

Implementation task list

P0: Terminology + docs (immediate)

Update SDK docs to define document vs run explicitly.
Update cookbook upload examples to capture workflowId.
Mark documents[].jobId in the multi-doc deploy payload as legacy naming in the docs.
Add this page to SDK navigation.

P1: API shape hardening (additive)

Persist run ledger per document.
Expose GET /document/:id/runs and GET /document/:id/runs/:runId.
Keep /status document-scoped; do not add derived run rollups.
Include runId/workflowId consistently in lifecycle responses.

P2: SDK improvements

Add typed run metadata on session.status().
Add optional run-aware helpers for diagnostics (without making runs primary).
Keep session-first ergonomics unchanged for default usage.

P3: Naming convergence

Accept both documents[].jobId and documents[].documentId on deploy.
Prefer documentId in responses/docs.
Mark the jobId request field as deprecated, with a sunset date.
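
Accepting both names on deploy could be handled with a normalization step like the sketch below. The behavior when both fields are sent with different values (reject) is an assumption, as are the type and function names; the usedLegacyAlias flag is a hypothetical hook for deprecation telemetry.

```typescript
// One entry of the deploy payload's documents[] array (assumed shape).
type DeployDocumentInput = { documentId?: string; jobId?: string };

type NormalizedDeployDocument = {
  documentId: string;
  usedLegacyAlias: boolean; // true when only jobId was supplied
};

// Prefer documentId, accept jobId as a legacy alias, reject conflicting values.
function normalizeDeployDocument(input: DeployDocumentInput): NormalizedDeployDocument {
  if (
    input.documentId !== undefined &&
    input.jobId !== undefined &&
    input.documentId !== input.jobId
  ) {
    throw new Error("documentId and jobId disagree; send documentId only");
  }
  const documentId = input.documentId ?? input.jobId;
  if (documentId === undefined) {
    throw new Error("documents[] entry needs documentId (or legacy jobId)");
  }
  return { documentId, usedLegacyAlias: input.documentId === undefined };
}

// Illustrative inputs; every value here is made up.
const modernEntry = normalizeDeployDocument({ documentId: "doc_123" });
const legacyEntry = normalizeDeployDocument({ jobId: "doc_123" });
```

Both calls resolve to the same documentId; only legacyEntry is flagged, which gives a simple signal for tracking how many clients still send jobId before the sunset date.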

P4: Validation + rollout

Add contract tests for upload/status/run fields.
Add migration tests for old clients (no breakage).
Add telemetry dashboards: run success rate, retries, phase duration.
Publish migration note in changelog and SDK docs.

Definition of done

Public docs no longer imply “job object” as primary identity.
API exposes first-class run history.
SDK remains small/session-first while still enabling run diagnostics.
Deploy payload accepts modern naming (documentId) and still handles the legacy alias (jobId).
