Document DO Lifecycle HTTP API Design

TL;DR

Keep the public API document-first:

documentId = durable DO identity
runId = one processing execution (workflowId alias)

The “room” concept is only a lifecycle metaphor, not a public resource name.

Current truth (from agent-session today)

What exists now:

The meta table stores document-level state such as phase and active_workflow_id.
document_log is append-only (seq autoincrements) and records workflow and audit events.
Upload responses already return workflowId.
There is no first-class runs table yet.
/document/:id/status is document-scoped (not run-scoped).

This design is additive and does not assume a runs table already exists.

Lifecycle model

Document: durable workspace entity for one PDF.
Run: one extraction/reparse execution for that document.
Event stream: append-only document log (document_log.seq) for cursoring.

Proposed HTTP API

Document endpoints

POST /v1/documents
GET /v1/documents/:documentId
GET /v1/documents/:documentId/events?after=<seq>&limit=<n>
POST /v1/documents/:documentId/share-links
POST /v1/documents/:documentId/publish

Run endpoints

GET /v1/documents/:documentId/runs
GET /v1/documents/:documentId/runs/:runId
POST /v1/documents/:documentId/runs (start upload/reparse run)
POST /v1/documents/:documentId/runs/:runId/cancel

Completion endpoints

POST /v1/documents/:documentId/responses
POST /v1/documents/:documentId/responses:stream
POST /v1/shares/:shareId/responses (redaction/permission constrained)

Existing route compatibility

Keep /document/:id/* as compatibility routes; internally map:

/document/:id/status -> /v1/documents/:id
/document/:id/completion -> /v1/documents/:id/responses
/document/:id/reparse -> POST /v1/documents/:id/runs
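
The mapping above can be sketched as a small path rewriter. This is only an illustration: mapLegacyPath and LEGACY_SUFFIX_TO_V1 are hypothetical names, and the sketch covers just the three legacy paths listed above, not HTTP methods or request bodies.

```typescript
// Hypothetical lookup table mirroring the three compatibility mappings above.
const LEGACY_SUFFIX_TO_V1 = new Map<string, string>([
  ["status", ""],               // /document/:id/status     -> /v1/documents/:id
  ["completion", "/responses"], // /document/:id/completion -> /v1/documents/:id/responses
  ["reparse", "/runs"],         // /document/:id/reparse    -> /v1/documents/:id/runs
]);

// Returns the /v1 path for a legacy path, or null if the path is not a known legacy route.
function mapLegacyPath(path: string): string | null {
  const match = /^\/document\/([^/]+)\/([^/]+)$/.exec(path);
  if (match === null) return null;
  const id = match[1];
  const v1Suffix = LEGACY_SUFFIX_TO_V1.get(match[2]);
  if (v1Suffix === undefined) return null;
  return `/v1/documents/${id}${v1Suffix}`;
}
```

Returning null for unknown paths lets the caller fall through to its existing handler instead of guessing a target.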

HTTP diff (current -> proposed)

# Lifecycle
- POST /document/:id/upload
+ POST /v1/documents/:id/runs

- GET /document/:id/status
+ GET /v1/documents/:id

+ GET /v1/documents/:id/runs
+ GET /v1/documents/:id/runs/:runId
+ GET /v1/documents/:id/events?after=<seq>&limit=<n>

# Completion
- POST /document/:id/completion
+ POST /v1/documents/:id/responses
+ POST /v1/documents/:id/responses:stream
+ POST /v1/shares/:shareId/responses

Is this a lot of change?

Not really. This is mostly additive:

Keep old /document/:id/* routes as aliases.
Add explicit /runs and /events resources.
Rename the completion endpoint to /responses to match common agent/client API conventions.
Return runId alongside workflowId during transition.
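
One possible way to return both identifiers during the transition is to copy workflowId into runId at the response boundary. The helper name withRunId is hypothetical, and the only field assumed on the existing response is workflowId.

```typescript
// Adds runId as an alias of workflowId without removing the existing field,
// so older clients that read workflowId keep working.
function withRunId<T extends { workflowId: string }>(body: T): T & { runId: string } {
  return { ...body, runId: body.workflowId };
}
```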

Response shapes

Document

{
  "documentId": "ocr_...",
  "phase": "complete",
  "activeRunId": "lifecycle-ocr_...-1700000000000",
  "updatedAt": 1700000000000
}

Run

{
  "runId": "lifecycle-ocr_...-1700000000000",
  "documentId": "ocr_...",
  "phase": "complete",
  "startedAt": 1700000000000,
  "updatedAt": 1700000012345,
  "completedAt": 1700000012345,
  "error": null
}

Clients can derive the “last successful run” by sorting the runs list and taking the most recent run whose phase is complete; the server does not need a dedicated field.
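
A client-side sketch of that derivation follows. It assumes "successful" means phase is "complete" with a null error, and that recency is judged by completedAt (falling back to updatedAt); both are assumptions layered on the Run shape above, and lastSuccessfulRun is a hypothetical name.

```typescript
// Mirrors the Run response shape shown above.
interface Run {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: number;
  updatedAt: number;
  completedAt: number | null;
  error: string | null;
}

// Returns the most recently completed run without an error, or undefined if none exists.
function lastSuccessfulRun(runs: readonly Run[]): Run | undefined {
  return runs
    .filter((run) => run.phase === "complete" && run.error === null) // filter copies, so the input is not mutated
    .sort((a, b) => (b.completedAt ?? b.updatedAt) - (a.completedAt ?? a.updatedAt))[0];
}
```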

Response

{
  "documentId": "ocr_...",
  "runIdUsed": "lifecycle-ocr_...-1700000000000",
  "answer": "Total revenue is ...",
  "citations": [],
  "costUsd": 0.0012
}

Data model plan

Phase 1 (no schema migration)

Build /runs from:

meta.active_workflow_id (active run)
document_log workflow events (workflow_complete, workflow_error)
upload/reparse lifecycle response metadata where available

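Historical coverage is best-effort: only runs that left workflow events in document_log, plus the currently active run, can be reconstructed.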
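
A rough sketch of the phase 1 derivation is below. The design only confirms the seq column and the workflow_complete / workflow_error event names; the workflowId, timestamp, and error fields on LogEvent, the "error" phase value, and the shape of the active snapshot read from meta are all assumptions made for illustration.

```typescript
interface LogEvent {
  seq: number;
  type: string;        // e.g. "workflow_complete" or "workflow_error"
  workflowId?: string; // assumed field
  timestamp: number;   // assumed field, epoch milliseconds
  error?: string;      // assumed field
}

interface DerivedRun {
  runId: string;
  phase: string;
  updatedAt: number;
  completedAt: number | null;
  error: string | null;
}

// Rebuilds a run list from terminal log events plus the active run recorded in meta.
// startedAt is omitted because terminal events alone do not reveal it.
function deriveRuns(
  events: readonly LogEvent[],
  active: { workflowId: string | null; phase: string; updatedAt: number }
): DerivedRun[] {
  const runs = new Map<string, DerivedRun>();
  for (const event of events.slice().sort((a, b) => a.seq - b.seq)) {
    if (event.workflowId === undefined) continue;
    const at = event.timestamp;
    if (event.type === "workflow_complete") {
      runs.set(event.workflowId, { runId: event.workflowId, phase: "complete", updatedAt: at, completedAt: at, error: null });
    } else if (event.type === "workflow_error") {
      runs.set(event.workflowId, { runId: event.workflowId, phase: "error", updatedAt: at, completedAt: null, error: event.error ?? "unknown error" });
    }
  }
  // The active run may have no terminal event yet, so fill it in from meta.
  if (active.workflowId !== null && !runs.has(active.workflowId)) {
    runs.set(active.workflowId, { runId: active.workflowId, phase: active.phase, updatedAt: active.updatedAt, completedAt: null, error: null });
  }
  return Array.from(runs.values()).sort((a, b) => b.updatedAt - a.updatedAt);
}
```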

Phase 2 (recommended)

Add first-class runs table:

CREATE TABLE runs (
  run_id TEXT PRIMARY KEY,
  document_id TEXT NOT NULL,
  phase TEXT NOT NULL,
  started_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL,
  completed_at INTEGER,
  error TEXT
);
CREATE INDEX idx_runs_document ON runs(document_id, started_at DESC);

On each workflow lifecycle hook (start, progress, complete, error), upsert the corresponding row in runs.
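
The upsert could look roughly like this, assuming the store accepts SQLite-style ON CONFLICT ... DO UPDATE syntax. UPSERT_RUN_SQL, RunRow, and upsertRunParams are hypothetical names, and how the statement is executed is deliberately left to whatever storage API the Durable Object already uses.

```typescript
interface RunRow {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: number;
  updatedAt: number;
  completedAt: number | null;
  error: string | null;
}

// Inserts a new run, or updates the mutable columns of an existing one.
// document_id and started_at are left untouched on conflict.
const UPSERT_RUN_SQL = `
INSERT INTO runs (run_id, document_id, phase, started_at, updated_at, completed_at, error)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(run_id) DO UPDATE SET
  phase = excluded.phase,
  updated_at = excluded.updated_at,
  completed_at = excluded.completed_at,
  error = excluded.error
`;

// Produces bind parameters in the same order as the placeholders above.
function upsertRunParams(row: RunRow): (string | number | null)[] {
  return [row.runId, row.documentId, row.phase, row.startedAt, row.updatedAt, row.completedAt, row.error];
}
```

Keeping started_at out of the update clause means a late progress hook cannot overwrite the original start time.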

Cursoring model (Partykit-style lifecycle)

Use document_log.seq as cursor:

The client stores lastSeq.
It requests GET /v1/documents/:documentId/events?after=<lastSeq>.
It receives events ordered by seq.
It updates lastSeq to the highest seq it received.

This mirrors durable-entity event streaming patterns used in collaborative systems.
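
The cursor loop above can be sketched with two small client helpers. The helper names and the assumption that each event in the response carries a numeric seq are illustrative; the actual response envelope is not specified in this design.

```typescript
// Assumed minimal event shape: only seq is required for cursoring.
interface DocumentEvent {
  seq: number;
  type: string;
}

// Builds the events request path for a given cursor and page size.
function eventsPath(documentId: string, after: number, limit: number): string {
  return `/v1/documents/${encodeURIComponent(documentId)}/events?after=${after}&limit=${limit}`;
}

// Returns the new cursor after a page of events; an empty page leaves the cursor unchanged.
function advanceCursor(lastSeq: number, page: readonly DocumentEvent[]): number {
  return page.reduce((max, event) => Math.max(max, event.seq), lastSeq);
}
```

A client would call eventsPath with its stored lastSeq, fetch that path, pass the returned events to advanceCursor, and persist the result before the next request.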

Implementation task list

Add /v1/documents/:documentId/runs and /v1/documents/:documentId/runs/:runId.
Add /v1/documents/:documentId/events with after cursor over document_log.seq.
Add /v1/documents/:documentId/responses and /v1/documents/:documentId/responses:stream.
Add compatibility mappings from existing /document routes.
Add contract tests: document fetch, runs list, events cursor, completion alias parity.
Add schema migration for runs table (phase 2).
Update SDK docs/examples to prefer document-first terminology.
