Why this exists

We treat a document as a durable, workspace-like entity. Confusion arises when an upload is read as returning a long-lived “job object”. In reality:
  • documentId is the durable identity.
  • workflowId identifies a single processing run: one upload or reparse execution against that document.
This page is the canonical model and follow-through plan. For the durable-entity lifecycle framing and concrete endpoint design, see Document DO Lifecycle HTTP API Design.

Canonical terminology

  1. Document: Permanent identity (documentId) and storage scope.
  2. Run: One processing execution attached to a document (workflowId / future runId).
  3. Session: SDK handle bound to one document (okra.sessions.from/create).

Current contract (today)

POST /document/:id/upload returns document-scoped metadata:
  • documentId
  • phase
  • status
  • workflowId (current run)
  • urls
GET /document/:id/status is document-scoped and reports current/latest state.
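
As a rough illustration of how a client might consume the upload response, here is a minimal TypeScript sketch. The field names come from the list above; the field types, the parseUploadResponse helper, and the sample values are assumptions for illustration, not part of the published contract.

```typescript
// Hypothetical shape: field names are from the contract above,
// every field type is an assumption.
interface UploadResponse {
  documentId: string;
  phase: string;
  status: string;
  workflowId: string;
  urls: Record<string, string>;
}

// Hypothetical helper: checks that the five documented fields are present.
function parseUploadResponse(body: unknown): UploadResponse {
  if (typeof body !== "object" || body === null) {
    throw new Error("upload response must be an object");
  }
  const record = body as Record<string, unknown>;
  for (const key of ["documentId", "phase", "status", "workflowId"]) {
    if (typeof record[key] !== "string") {
      throw new Error(`upload response is missing string field "${key}"`);
    }
  }
  if (typeof record["urls"] !== "object" || record["urls"] === null) {
    throw new Error('upload response is missing object field "urls"');
  }
  return record as unknown as UploadResponse;
}

// Sample payload with made-up values.
const sampleUpload = parseUploadResponse({
  documentId: "doc_123",
  phase: "parsing",
  status: "processing",
  workflowId: "wf_456",
  urls: {},
});

// Store documentId as the durable key; keep workflowId only to trace this run.
const durableKey = sampleUpload.documentId;
const runTrace = sampleUpload.workflowId;
```

The point of the sketch is the last two lines: anything a client persists long-term should hang off documentId, while workflowId is only useful for correlating one processing execution.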

Target contract (design)

Document remains primary. Runs become explicit sub-resources.

Document endpoints

  • GET /document/:id/status
  • POST /document/:id/completion
  • POST /document/:id/share-link
  • POST /document/:id/publish

Run endpoints (additive)

  • GET /document/:id/runs
  • GET /document/:id/runs/:runId
  • POST /document/:id/runs/:runId/cancel (or /cancel on active run)
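
The run sub-resource paths above could be assembled on the client roughly as follows. This is only a sketch: runRoutes is a hypothetical helper, not an SDK function, and it covers only the three paths listed in this section.

```typescript
// Hypothetical helper: builds the run sub-resource paths for one document.
// HTTP methods are noted per path; the helper itself only builds strings.
function runRoutes(documentId: string) {
  const base = `/document/${encodeURIComponent(documentId)}/runs`;
  return {
    // GET: list all runs for the document
    list: base,
    // GET: fetch one run
    detail: (runId: string) => `${base}/${encodeURIComponent(runId)}`,
    // POST: cancel one run
    cancel: (runId: string) => `${base}/${encodeURIComponent(runId)}/cancel`,
  };
}

const routes = runRoutes("doc_123");
const listPath = routes.list;               // "/document/doc_123/runs"
const detailPath = routes.detail("run_1");  // "/document/doc_123/runs/run_1"
const cancelPath = routes.cancel("run_1");  // "/document/doc_123/runs/run_1/cancel"
```

Every path is nested under the document, which keeps the document as the primary resource and the run as a sub-resource.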

Run record shape (minimal)

  • runId
  • documentId
  • phase
  • startedAt
  • updatedAt
  • completedAt (nullable)
  • error (nullable)
No derived rollups are required in /status. Clients can derive “last successful run” from /runs.
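
One way a client might derive “last successful run” from the /runs list is sketched below. The field names follow the run record shape above; the field types, the ISO 8601 timestamp format, and the definition of success (completedAt set and error null) are assumptions, since this page does not specify them.

```typescript
// Field names from the run record shape; types are assumptions.
interface RunRecord {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: string; // assumed ISO 8601
  updatedAt: string; // assumed ISO 8601
  completedAt: string | null;
  error: string | null;
}

// Assumption: a run succeeded when it has completed and carries no error.
function lastSuccessfulRun(runs: RunRecord[]): RunRecord | undefined {
  let best: RunRecord | undefined;
  let bestTime = -Infinity;
  for (const run of runs) {
    if (run.completedAt === null || run.error !== null) continue;
    const completedTime = Date.parse(run.completedAt);
    // Unparseable timestamps yield NaN and never win the comparison.
    if (completedTime > bestTime) {
      best = run;
      bestTime = completedTime;
    }
  }
  return best;
}

// Sample ledger with made-up values: one success, one failure, one in progress.
const sampleRuns: RunRecord[] = [
  { runId: "run_1", documentId: "doc_123", phase: "done",
    startedAt: "2024-01-01T00:00:00Z", updatedAt: "2024-01-01T00:05:00Z",
    completedAt: "2024-01-01T00:05:00Z", error: null },
  { runId: "run_2", documentId: "doc_123", phase: "failed",
    startedAt: "2024-01-02T00:00:00Z", updatedAt: "2024-01-02T00:01:00Z",
    completedAt: "2024-01-02T00:01:00Z", error: "parse failed" },
  { runId: "run_3", documentId: "doc_123", phase: "parsing",
    startedAt: "2024-01-03T00:00:00Z", updatedAt: "2024-01-03T00:00:30Z",
    completedAt: null, error: null },
];

const latestGood = lastSuccessfulRun(sampleRuns); // run_1 in this sample
```

Because the rollup is computed client-side, /status can stay document-scoped and free of derived run fields.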

Change magnitude (current -> target)

This is a small-to-medium additive change:
  1. Keep existing /document/:id/* routes working.
  2. Add explicit /runs and /events resources for lifecycle visibility.
  3. Introduce /responses naming for completion while keeping /completion alias during migration.
  4. Keep workflowId while adding runId for naming convergence.
Endpoint diff details are documented in Document DO Lifecycle HTTP API Design.

Design rules

  1. Never change documentId across reparse/retry.
  2. Every upload/reparse creates a new run record.
  3. Completion/share/publish are document-scoped by default.
  4. Deterministic export should support run/version pinning.
  5. Keep backward compatibility with additive fields first, then deprecate.
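
Rules 1 and 2 can be illustrated with a small in-memory ledger. This is a sketch only: RunLedger, its method names, and the run_N identifier scheme are hypothetical, and a real implementation would persist the ledger per document rather than hold it in memory.

```typescript
interface LedgerRun {
  runId: string;
  documentId: string;
  startedAt: string;
}

// Hypothetical in-memory ledger: one list of runs per documentId.
class RunLedger {
  private readonly runsByDocument = new Map<string, LedgerRun[]>();
  private nextSequence = 1;

  // Every upload or reparse appends a new run; documentId is never rewritten.
  startRun(documentId: string, startedAt: string): LedgerRun {
    const run: LedgerRun = {
      runId: `run_${this.nextSequence++}`, // made-up id scheme
      documentId,
      startedAt,
    };
    const existing = this.runsByDocument.get(documentId) ?? [];
    this.runsByDocument.set(documentId, [...existing, run]);
    return run;
  }

  runsFor(documentId: string): readonly LedgerRun[] {
    return this.runsByDocument.get(documentId) ?? [];
  }
}

const ledger = new RunLedger();
const firstUpload = ledger.startRun("doc_123", "2024-01-01T00:00:00Z");
const reparse = ledger.startRun("doc_123", "2024-01-02T00:00:00Z");
const history = ledger.runsFor("doc_123"); // two runs, same documentId
```

After the reparse, the document still answers to doc_123 and its history holds two distinct run records.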

Conflicts to resolve

  1. Avoid “job object” phrasing in SDK docs.
  2. Standardize on “documentId” in examples.
  3. Mark documents[].jobId in the deploy payload as a legacy naming alias for documentId.
  4. Use “run/workflow” language for execution state.

Implementation task list

P0: Terminology + docs (immediate)

  1. Update SDK docs to define document vs run explicitly.
  2. Update cookbook upload examples to capture workflowId.
  3. Mark documents[].jobId in the multi-doc deploy payload as legacy naming in the docs.
  4. Add this page to SDK navigation.

P1: API shape hardening (additive)

  1. Persist run ledger per document.
  2. Expose GET /document/:id/runs and GET /document/:id/runs/:runId.
  3. Keep /status document-scoped; do not add derived run rollups.
  4. Include runId/workflowId consistently in lifecycle responses.

P2: SDK improvements

  1. Add typed run metadata on session.status().
  2. Add optional run-aware helpers for diagnostics (without making runs primary).
  3. Keep session-first ergonomics unchanged for default usage.

P3: Naming convergence

  1. Accept both documents[].jobId and documents[].documentId on deploy.
  2. Prefer documentId in responses/docs.
  3. Mark jobId request field deprecated with sunset date.
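
Accepting both field names during the migration could look roughly like the normalizer below. The two field names come from this page; normalizeDeployDocument, the usedLegacyAlias flag, and the choice to reject conflicting values are assumptions made for the sketch.

```typescript
// One entry of the deploy payload's documents[] array (other fields omitted).
interface DeployDocumentInput {
  documentId?: string;
  jobId?: string; // legacy alias for documentId
}

interface NormalizedDeployDocument {
  documentId: string;
  usedLegacyAlias: boolean; // could drive a deprecation warning
}

// Hypothetical normalizer: prefers documentId, falls back to jobId.
function normalizeDeployDocument(input: DeployDocumentInput): NormalizedDeployDocument {
  const { documentId, jobId } = input;
  // Assumed policy: conflicting values are an error rather than a silent pick.
  if (documentId !== undefined && jobId !== undefined && documentId !== jobId) {
    throw new Error("documentId and jobId disagree");
  }
  if (documentId !== undefined) {
    return { documentId, usedLegacyAlias: false };
  }
  if (jobId !== undefined) {
    return { documentId: jobId, usedLegacyAlias: true };
  }
  throw new Error("deploy document needs documentId (or legacy jobId)");
}

const modernEntry = normalizeDeployDocument({ documentId: "doc_123" });
const legacyEntry = normalizeDeployDocument({ jobId: "doc_456" });
```

Downstream code then sees only documentId, and the usedLegacyAlias flag gives one place to count remaining legacy callers before the sunset date.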

P4: Validation + rollout

  1. Add contract tests for upload/status/run fields.
  2. Add migration tests for old clients (no breakage).
  3. Add telemetry dashboards: run success rate, retries, phase duration.
  4. Publish migration note in changelog and SDK docs.

Definition of done

  1. Public docs no longer imply “job object” as primary identity.
  2. API exposes first-class run history.
  3. SDK remains small/session-first while still enabling run diagnostics.
  4. The deploy payload accepts the modern name (documentId) and still handles the legacy jobId alias.