TL;DR

Keep the public API document-first:
  • documentId = durable DO identity
  • runId = one processing execution (alias for the existing workflowId)
The “room” concept is only a lifecycle metaphor, not a public resource name.

Current truth (from agent-session today)

What exists now:
  1. meta table stores document-level state like phase, active_workflow_id.
  2. document_log is append-only (seq autoincrement), used for workflow and audit events.
  3. Upload responses already return workflowId.
  4. There is no first-class runs table yet.
  5. /document/:id/status is document-scoped (not run-scoped).
This design is additive and does not assume a runs table already exists.

Lifecycle model

  1. Document: durable workspace entity for one PDF.
  2. Run: one extraction/reparse execution for that document.
  3. Event stream: append-only document log (document_log.seq) for cursoring.

Proposed HTTP API

Document endpoints

  1. POST /v1/documents
  2. GET /v1/documents/:documentId
  3. GET /v1/documents/:documentId/events?after=<seq>&limit=<n>
  4. POST /v1/documents/:documentId/share-links
  5. POST /v1/documents/:documentId/publish

Run endpoints

  1. GET /v1/documents/:documentId/runs
  2. GET /v1/documents/:documentId/runs/:runId
  3. POST /v1/documents/:documentId/runs (start upload/reparse run)
  4. POST /v1/documents/:documentId/runs/:runId/cancel

Completion endpoints

  1. POST /v1/documents/:documentId/responses
  2. POST /v1/documents/:documentId/responses:stream
  3. POST /v1/shares/:shareId/responses (redaction/permission constrained)

Existing route compatibility

Keep /document/:id/* as compatibility routes; internally map:
  • /document/:id/status -> /v1/documents/:id
  • /document/:id/completion -> /v1/documents/:id/responses
  • /document/:id/reparse -> POST /v1/documents/:id/runs
  • /document/:id/upload -> POST /v1/documents/:id/runs
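The compatibility mapping could be expressed as a small lookup, sketched below. The names RouteTarget, LEGACY_ROUTES, and mapLegacyRoute are hypothetical, and the HTTP methods for status and completion are taken from the HTTP diff in this document; the upload entry also comes from that diff.

```typescript
// Hypothetical sketch: resolves a legacy /document/:id/<action> path
// to the proposed /v1 route. All names here are illustrative.
interface RouteTarget {
  method: string;
  path: string;
}

// A Map avoids accidental hits on Object.prototype keys such as "toString".
const LEGACY_ROUTES = new Map<string, (id: string) => RouteTarget>([
  ["status", (id) => ({ method: "GET", path: `/v1/documents/${id}` })],
  ["completion", (id) => ({ method: "POST", path: `/v1/documents/${id}/responses` })],
  ["reparse", (id) => ({ method: "POST", path: `/v1/documents/${id}/runs` })],
  // Listed in the HTTP diff as also mapping to the runs collection.
  ["upload", (id) => ({ method: "POST", path: `/v1/documents/${id}/runs` })],
]);

function mapLegacyRoute(path: string): RouteTarget | null {
  const match = /^\/document\/([^/]+)\/([^/]+)$/.exec(path);
  if (!match) return null;
  const id = match[1] ?? "";
  const action = match[2] ?? "";
  const build = LEGACY_ROUTES.get(action);
  return build ? build(id) : null; // null means "not a known legacy route"
}
```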

HTTP diff (current -> proposed)

# Lifecycle
- POST /document/:id/upload
- POST /document/:id/reparse
+ POST /v1/documents/:id/runs

- GET /document/:id/status
+ GET /v1/documents/:id

+ GET /v1/documents/:id/runs
+ GET /v1/documents/:id/runs/:runId
+ GET /v1/documents/:id/events?after=<seq>&limit=<n>

# Completion
- POST /document/:id/completion
+ POST /v1/documents/:id/responses
+ POST /v1/documents/:id/responses:stream
+ POST /v1/shares/:shareId/responses

Is this a lot of change?

Not really. This is mostly additive:
  1. Keep old /document/:id/* routes as aliases.
  2. Add explicit /runs and /events resources.
  3. Rename the completion endpoint to /responses to match common agent/client API conventions.
  4. Return runId alongside workflowId during transition.
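Returning runId alongside workflowId during the transition can be as small as the following sketch. The helper name withRunId is hypothetical; the only fact taken from this document is that runId is an alias for workflowId.

```typescript
// Hypothetical helper: adds runId as an alias of the existing workflowId
// so old clients (workflowId) and new clients (runId) read the same body.
function withRunId<T extends { workflowId: string }>(body: T): T & { runId: string } {
  return { ...body, runId: body.workflowId };
}
```

Because the original fields are spread unchanged, existing clients that only read workflowId are unaffected.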

Response shapes

Document

{
  "documentId": "ocr_...",
  "phase": "complete",
  "activeRunId": "lifecycle-ocr_...-1700000000000",
  "updatedAt": 1700000000000
}

Run

{
  "runId": "lifecycle-ocr_...-1700000000000",
  "documentId": "ocr_...",
  "phase": "complete",
  "startedAt": 1700000000000,
  "updatedAt": 1700000012345,
  "completedAt": 1700000012345,
  "error": null
}
Clients can derive the “last successful run” by sorting the runs list, so the server does not need a dedicated field.
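A client-side derivation might look like the sketch below. It assumes that a successful run is one with phase "complete" and a null error, as in the example Run shape above; the function name lastSuccessfulRun is hypothetical.

```typescript
// Mirrors the Run response shape shown above.
interface Run {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: number;
  updatedAt: number;
  completedAt: number | null;
  error: string | null;
}

// Assumption: "complete" with no error means success.
function lastSuccessfulRun(runs: Run[]): Run | undefined {
  return runs
    .filter((r) => r.phase === "complete" && r.error === null) // filter() copies, so the input is not mutated
    .sort((a, b) => (b.completedAt ?? b.updatedAt) - (a.completedAt ?? a.updatedAt))[0];
}
```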

Response

{
  "documentId": "ocr_...",
  "runIdUsed": "lifecycle-ocr_...-1700000000000",
  "answer": "Total revenue is ...",
  "citations": [],
  "costUsd": 0.0012
}

Data model plan

Phase 1 (no schema migration)

Build /runs from:
  1. meta.active_workflow_id (active run)
  2. document_log workflow events (workflow_complete, workflow_error)
  3. upload/reparse lifecycle response metadata where available
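One way to fold the first two sources into run records is sketched below. The LogEvent fields (workflowId, timestamp, error) and the phase names "running" and "error" are assumptions for illustration; only the event types workflow_complete and workflow_error and the active_workflow_id column come from this document. Start times are not recoverable from these sources, so the sketch omits startedAt.

```typescript
// Assumed log row shape; the real document_log columns may differ.
interface LogEvent {
  seq: number;
  type: string;
  workflowId: string;
  timestamp: number;
  error?: string;
}

interface DerivedRun {
  runId: string;
  phase: "running" | "complete" | "error"; // "running"/"error" are assumed names
  updatedAt: number | null;
  completedAt: number | null;
  error: string | null;
}

function deriveRuns(activeWorkflowId: string | null, events: LogEvent[]): DerivedRun[] {
  const runs = new Map<string, DerivedRun>();
  // Replay in seq order so the latest terminal event for a workflow wins.
  for (const e of [...events].sort((a, b) => a.seq - b.seq)) {
    if (e.type !== "workflow_complete" && e.type !== "workflow_error") continue;
    const failed = e.type === "workflow_error";
    runs.set(e.workflowId, {
      runId: e.workflowId,
      phase: failed ? "error" : "complete",
      updatedAt: e.timestamp,
      completedAt: failed ? null : e.timestamp,
      error: failed ? (e.error ?? "unknown error") : null,
    });
  }
  // The active workflow has no terminal event yet, so report it as running.
  if (activeWorkflowId !== null && !runs.has(activeWorkflowId)) {
    runs.set(activeWorkflowId, {
      runId: activeWorkflowId,
      phase: "running",
      updatedAt: null,
      completedAt: null,
      error: null,
    });
  }
  return Array.from(runs.values());
}
```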
This is best-effort historical coverage.

Phase 2 (schema migration)

Add a first-class runs table:
CREATE TABLE runs (
  run_id TEXT PRIMARY KEY,
  document_id TEXT NOT NULL,
  phase TEXT NOT NULL,
  started_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL,
  completed_at INTEGER,
  error TEXT
);
CREATE INDEX idx_runs_document ON runs(document_id, started_at DESC);
On each workflow lifecycle hook (start/progress/complete/error), upsert the matching row in runs.
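A lifecycle hook could build its upsert along these lines. This is a sketch that assumes the table lives in SQLite (the ON CONFLICT ... DO UPDATE form) and that the storage layer accepts positional parameters; buildRunUpsert and RunRow are hypothetical names, and executing the statement is left to whatever database binding the service already uses.

```typescript
// Column-for-column mirror of the runs table above.
interface RunRow {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: number;
  updatedAt: number;
  completedAt: number | null;
  error: string | null;
}

// Assumes SQLite upsert syntax. started_at and document_id are left
// untouched on conflict so the original start time is preserved.
const RUN_UPSERT_SQL = `
INSERT INTO runs (run_id, document_id, phase, started_at, updated_at, completed_at, error)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(run_id) DO UPDATE SET
  phase = excluded.phase,
  updated_at = excluded.updated_at,
  completed_at = excluded.completed_at,
  error = excluded.error
`;

function buildRunUpsert(run: RunRow): { sql: string; params: (string | number | null)[] } {
  return {
    sql: RUN_UPSERT_SQL,
    // Order must match the column list in the INSERT clause.
    params: [
      run.runId,
      run.documentId,
      run.phase,
      run.startedAt,
      run.updatedAt,
      run.completedAt,
      run.error,
    ],
  };
}
```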

Cursoring model (PartyKit-style lifecycle)

Use document_log.seq as cursor:
  1. client stores lastSeq
  2. requests GET /v1/documents/:documentId/events?after=<lastSeq>
  3. receives ordered events
  4. updates cursor
This mirrors durable-entity event streaming patterns used in collaborative systems.
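The four steps above can be simulated in memory as follows. This is a sketch, not the service implementation: the DocEvent shape, the function names, and the use of an array in place of document_log are all assumptions for illustration.

```typescript
// Assumed minimal event shape; real log rows carry more fields.
interface DocEvent {
  seq: number;
  type: string;
}

// Server side: events with seq strictly greater than `after`, ascending, capped at `limit`.
function readAfter(log: DocEvent[], after: number, limit: number): DocEvent[] {
  return log
    .filter((e) => e.seq > after)
    .sort((a, b) => a.seq - b.seq)
    .slice(0, limit);
}

// Client side: move the cursor to the highest seq seen so far.
function nextCursor(lastSeq: number, page: DocEvent[]): number {
  return page.reduce((max, e) => Math.max(max, e.seq), lastSeq);
}

// Client loop: keep requesting pages until the server returns nothing new.
function catchUp(log: DocEvent[], lastSeq: number, limit: number): { events: DocEvent[]; lastSeq: number } {
  const events: DocEvent[] = [];
  let cursor = lastSeq;
  for (;;) {
    const page = readAfter(log, cursor, limit);
    if (page.length === 0) break;
    events.push(...page);
    cursor = nextCursor(cursor, page);
  }
  return { events, lastSeq: cursor };
}
```

Because seq only increases, resuming from a stored lastSeq never replays or skips events in this model.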

Implementation task list

  1. Add /v1/documents/:documentId/runs and /v1/documents/:documentId/runs/:runId.
  2. Add /v1/documents/:documentId/events with after cursor over document_log.seq.
  3. Add /v1/documents/:documentId/responses and /v1/documents/:documentId/responses:stream.
  4. Add compatibility mappings from existing /document routes.
  5. Add contract tests: document fetch, runs list, events cursor, completion alias parity.
  6. Add schema migration for runs table (phase 2).
  7. Update SDK docs/examples to prefer document-first terminology.