Why this exists
We treat a document as a durable, workspace-like entity. Confusion arises when upload is interpreted as returning a long-lived “job object”. In reality, `documentId` is the durable identity, and `workflowId` identifies one processing run (an upload or reparse execution) for that document.
Canonical terminology
- Document: permanent identity (`documentId`) and storage scope.
- Run: one processing execution attached to a document (`workflowId`, future `runId`).
- Session: SDK handle bound to one document (`okra.sessions.from` / `okra.sessions.create`).
Current contract (today)
`POST /document/:id/upload` returns document-scoped metadata:
- `documentId`
- `phase`
- `status`
- `workflowId` (current run)
- `urls`
`GET /document/:id/status` is document-scoped and reports the current/latest state.
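A minimal client-side sketch of the current upload contract, assuming `phase` and `status` are strings and `urls` is a string-to-string map (the contract lists the field names but not their types). `parseUploadResponse` is a hypothetical helper, not part of the SDK.

```typescript
// Shape of the current upload response. Field names come from the contract;
// the value types are assumptions.
interface UploadResponse {
  documentId: string; // durable identity, stable across reparse/retry
  phase: string;
  status: string;
  workflowId: string; // identity of the run started by this upload
  urls: Record<string, string>;
}

// Hypothetical validator: narrows an unknown JSON body to UploadResponse.
function parseUploadResponse(body: unknown): UploadResponse {
  if (typeof body !== "object" || body === null) {
    throw new Error("upload response must be a JSON object");
  }
  const r = body as Record<string, unknown>;
  for (const key of ["documentId", "phase", "status", "workflowId"]) {
    if (typeof r[key] !== "string") {
      throw new Error(`upload response field "${key}" must be a string`);
    }
  }
  if (typeof r.urls !== "object" || r.urls === null) {
    throw new Error('upload response field "urls" must be an object');
  }
  return {
    documentId: r.documentId as string,
    phase: r.phase as string,
    status: r.status as string,
    workflowId: r.workflowId as string,
    urls: r.urls as Record<string, string>,
  };
}
```

Clients would key their own storage on `documentId` and keep `workflowId` only as a reference to the run this upload started.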
Target contract (design)
The document remains primary, and runs become explicit sub-resources.
Document endpoints
- `GET /document/:id/status`
- `POST /document/:id/completion`
- `POST /document/:id/share-link`
- `POST /document/:id/publish`
Run endpoints (additive)
- `GET /document/:id/runs`
- `GET /document/:id/runs/:runId`
- `POST /document/:id/runs/:runId/cancel` (or `/cancel` on the active run)
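One way to see the "document first, runs nested" layout is a path builder. This is a sketch: `documentRoutes` is a hypothetical helper, and it covers only the `/runs/:runId/cancel` variant, not the alternative `/cancel` on the active run.

```typescript
const enc = encodeURIComponent;

// Hypothetical path builder mirroring the target endpoint lists.
function documentRoutes(documentId: string) {
  const base = `/document/${enc(documentId)}`;
  return {
    // Document-scoped endpoints (primary resource)
    status: `${base}/status`,
    completion: `${base}/completion`,
    shareLink: `${base}/share-link`,
    publish: `${base}/publish`,
    // Run endpoints (additive sub-resources under the document)
    runs: `${base}/runs`,
    run: (runId: string) => `${base}/runs/${enc(runId)}`,
    cancelRun: (runId: string) => `${base}/runs/${enc(runId)}/cancel`,
  };
}
```

Because every run path is built from the document path, a client never needs a run identifier to reach document-level actions.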
Run record shape (minimal)
- `runId`
- `documentId`
- `phase`
- `startedAt`
- `updatedAt`
- `completedAt` (nullable)
- `error` (nullable)
`/status` stays document-scoped and does not embed derived run rollups; clients can derive the “last successful run” from `/runs`.
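A sketch of that client-side derivation, assuming timestamps are ISO 8601 strings, `error` is a string when present, and “successful” means completed with no error (the design does not define success explicitly).

```typescript
// Run record as listed above; value types are assumptions.
interface RunRecord {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: string;
  updatedAt: string;
  completedAt: string | null;
  error: string | null;
}

// Assumption: a run is successful when it has completed without an error.
function isSuccessful(run: RunRecord): boolean {
  return run.completedAt !== null && run.error === null;
}

// Derive the last successful run from a GET /document/:id/runs result.
// Completion times are compared with Date.parse, so list order does not matter.
function lastSuccessfulRun(runs: RunRecord[]): RunRecord | undefined {
  let best: RunRecord | undefined;
  for (const run of runs) {
    if (!isSuccessful(run)) continue;
    if (best === undefined || Date.parse(run.completedAt!) > Date.parse(best.completedAt!)) {
      best = run;
    }
  }
  return best;
}
```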
Change magnitude (current -> target)
This is a small-to-medium additive change:
- Keep existing `/document/:id/*` routes working.
- Add explicit `/runs` and `/events` resources for lifecycle visibility.
- Introduce `/responses` naming for completion while keeping `/completion` as an alias during migration.
- Keep `workflowId` while adding `runId` for naming convergence.
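A sketch of the two migration shims, under the assumption that `runId` and `workflowId` carry the same value during the transition (the design keeps both fields but does not say whether they match). `withRunId`, `ROUTE_ALIASES`, and `canonicalDocumentPath` are hypothetical names.

```typescript
interface LifecycleResponse {
  documentId: string;
  workflowId: string;
  runId?: string;
}

// Additive field: old clients keep reading workflowId, new clients read runId.
// Assumption: runId falls back to workflowId when not set explicitly.
function withRunId<T extends LifecycleResponse>(resp: T): T & { runId: string } {
  return { ...resp, runId: resp.runId ?? resp.workflowId };
}

// Hypothetical alias table: legacy /completion maps to the new /responses name.
const ROUTE_ALIASES: Record<string, string> = { completion: "responses" };

// Rewrites /document/:id/<action> to its canonical name; other paths pass through.
function canonicalDocumentPath(path: string): string {
  const match = /^\/document\/([^/]+)\/([^/]+)$/.exec(path);
  if (!match) return path;
  const [, id, action] = match;
  const canonical = ROUTE_ALIASES[action] ?? action;
  return `/document/${id}/${canonical}`;
}
```

Keeping the alias in one table makes it straightforward to log alias hits and later remove the entry once traffic on `/completion` drops off.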
Design rules
- Never change `documentId` across reparse/retry.
- Every upload/reparse creates a new run record.
- Completion/share/publish are document-scoped by default.
- Deterministic export should support run/version pinning.
- Keep backward compatibility with additive fields first, then deprecate.
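The rules above can be sketched as an in-memory run ledger: the `documentId` is only ever a key, each upload or reparse appends a run, and export can pin a run. `RunLedger`, its methods, and the `run_N` id format are hypothetical; a real service would persist the ledger.

```typescript
type Trigger = "upload" | "reparse";

interface LedgerRun {
  runId: string;
  documentId: string;
  trigger: Trigger;
  startedAt: number; // epoch ms
  completedAt: number | null;
  error: string | null;
}

class RunLedger {
  private runs = new Map<string, LedgerRun[]>();
  private counter = 0;

  // Every upload or reparse appends a new run; the documentId is never reassigned.
  startRun(documentId: string, trigger: Trigger, now = Date.now()): LedgerRun {
    const run: LedgerRun = {
      runId: `run_${++this.counter}`,
      documentId,
      trigger,
      startedAt: now,
      completedAt: null,
      error: null,
    };
    const list = this.runs.get(documentId) ?? [];
    list.push(run);
    this.runs.set(documentId, list);
    return run;
  }

  finishRun(run: LedgerRun, error: string | null = null, now = Date.now()): void {
    run.completedAt = now;
    run.error = error;
  }

  list(documentId: string): readonly LedgerRun[] {
    return this.runs.get(documentId) ?? [];
  }

  // Deterministic export: use the pinned run if one is given, otherwise the
  // most recently started run that completed without an error.
  resolveExportRun(documentId: string, pinnedRunId?: string): LedgerRun | undefined {
    const runs = this.list(documentId);
    if (pinnedRunId !== undefined) {
      return runs.find((r) => r.runId === pinnedRunId);
    }
    for (let i = runs.length - 1; i >= 0; i--) {
      const r = runs[i];
      if (r.completedAt !== null && r.error === null) return r;
    }
    return undefined;
  }
}
```

A reparse that fails leaves the earlier successful run available for unpinned export, while a pinned export always returns exactly the requested run.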
Conflicts to resolve
- Avoid “job object” phrasing in SDK docs.
- Standardize on “documentId” in examples.
- Mark `documents[].jobId` in the deploy payload as a legacy naming alias.
- Use “run/workflow” language for execution state.
Implementation task list
P0: Terminology + docs (immediate)
- Update SDK docs to define document vs run explicitly.
- Update cookbook upload examples to capture `workflowId`.
- Mark multi-doc deploy `documents[].jobId` as legacy naming in docs.
- Add this page to SDK navigation.
P1: API shape hardening (additive)
- Persist run ledger per document.
- Expose `GET /document/:id/runs` and `GET /document/:id/runs/:runId`.
- Keep `/status` document-scoped; do not add derived run rollups.
- Include `runId`/`workflowId` consistently in lifecycle responses.
P2: SDK improvements
- Add typed run metadata to `session.status()`.
- Add optional run-aware helpers for diagnostics (without making runs primary).
- Keep session-first ergonomics unchanged for default usage.
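A sketch of what typed run metadata on `session.status()` might look like while staying document-first. `SessionStatus`, `RunMetadata`, and `describeStatus` are hypothetical; only the idea of optional run metadata comes from the task list.

```typescript
interface RunMetadata {
  runId: string;
  workflowId: string;
  phase: string;
  startedAt: string;
  completedAt: string | null;
  error: string | null;
}

interface SessionStatus {
  documentId: string;
  phase: string;
  status: string;
  run?: RunMetadata; // optional, so existing callers are unaffected
}

// Hypothetical run-aware diagnostics helper; default usage never needs it.
function describeStatus(s: SessionStatus): string {
  const head = `document ${s.documentId}: ${s.status} (${s.phase})`;
  if (!s.run) return head;
  const outcome = s.run.error
    ? `failed: ${s.run.error}`
    : s.run.completedAt
      ? "completed"
      : "in progress";
  return `${head}; run ${s.run.runId} ${outcome}`;
}
```

Making `run` optional keeps the default path unchanged: code that only reads `documentId` and `status` compiles and behaves as before.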
P3: Naming convergence
- Accept both `documents[].jobId` and `documents[].documentId` on deploy.
- Prefer `documentId` in responses and docs.
- Mark the `jobId` request field as deprecated, with a sunset date.
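A sketch of server-side normalization for the deploy payload during naming convergence. Only the `documents[].jobId` and `documents[].documentId` fields come from the design; the `warn` callback and the choice to reject items where the two fields disagree are assumptions.

```typescript
interface DeployDocumentInput {
  documentId?: string;
  jobId?: string; // legacy alias, deprecated
}

interface NormalizedDeployDocument {
  documentId: string;
}

// Accepts either field, prefers documentId, and reports legacy usage.
function normalizeDeployDocuments(
  docs: DeployDocumentInput[],
  warn: (msg: string) => void = () => {},
): NormalizedDeployDocument[] {
  return docs.map((doc, i) => {
    // Assumption: conflicting values are a client bug, so reject them.
    if (doc.documentId !== undefined && doc.jobId !== undefined && doc.documentId !== doc.jobId) {
      throw new Error(`documents[${i}]: documentId and jobId disagree`);
    }
    const documentId = doc.documentId ?? doc.jobId;
    if (documentId === undefined) {
      throw new Error(`documents[${i}]: documentId is required`);
    }
    if (doc.documentId === undefined) {
      warn(`documents[${i}].jobId is deprecated; send documentId instead`);
    }
    return { documentId };
  });
}
```

Emitting a warning per legacy item gives a usage signal to watch before the sunset date.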
P4: Validation + rollout
- Add contract tests for upload/status/run fields.
- Add migration tests for old clients (no breakage).
- Add telemetry dashboards: run success rate, retries, phase duration.
- Publish migration note in changelog and SDK docs.
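A sketch of how the dashboard metrics could be computed from run records. The metric definitions are assumptions: success rate is successful over finished runs, and “retries” counts every run beyond the first per document (so it includes deliberate reparses). The minimal run record has no per-phase timestamps, so this computes whole-run duration; true phase duration would need per-phase timing.

```typescript
interface RunSample {
  documentId: string;
  startedAt: number; // epoch ms
  completedAt: number | null; // null while the run is still active
  error: string | null;
}

interface RunMetrics {
  successRate: number; // successful / finished runs
  retries: number; // runs beyond the first per document
  meanDurationMs: number; // over finished runs
}

function computeRunMetrics(runs: RunSample[]): RunMetrics {
  const finished = runs.filter((r) => r.completedAt !== null);
  const succeeded = finished.filter((r) => r.error === null);

  // Count runs per document to derive retries.
  const perDoc = new Map<string, number>();
  for (const r of runs) perDoc.set(r.documentId, (perDoc.get(r.documentId) ?? 0) + 1);
  let retries = 0;
  perDoc.forEach((n) => {
    retries += n - 1;
  });

  const totalMs = finished.reduce((sum, r) => sum + (r.completedAt! - r.startedAt), 0);
  return {
    successRate: finished.length ? succeeded.length / finished.length : 0,
    retries,
    meanDurationMs: finished.length ? totalMs / finished.length : 0,
  };
}
```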
Definition of done
- Public docs no longer imply “job object” as primary identity.
- API exposes first-class run history.
- SDK remains small/session-first while still enabling run diagnostics.
- Deploy payload accepts modern naming (`documentId`) and handles the legacy alias.