TL;DR
Keep the public API document-first:
- documentId = durable DO identity
- runId = one processing execution (workflowId alias)
Current truth (from agent-session today)
What exists now:
- The meta table stores document-level state like phase and active_workflow_id.
- document_log is append-only (seq autoincrement) and is used for workflow and audit events.
- Upload responses already return workflowId.
- There is no first-class runs table yet.
- /document/:id/status is document-scoped (not run-scoped).
Lifecycle model
- Document: durable workspace entity for one PDF.
- Run: one extraction/reparse execution for that document.
- Event stream: append-only document log (document_log.seq) for cursoring.
Proposed HTTP API
Document endpoints
- POST /v1/documents
- GET /v1/documents/:documentId
- GET /v1/documents/:documentId/events?after=<seq>&limit=<n>
- POST /v1/documents/:documentId/share-links
- POST /v1/documents/:documentId/publish
Run endpoints
- GET /v1/documents/:documentId/runs
- GET /v1/documents/:documentId/runs/:runId
- POST /v1/documents/:documentId/runs (start upload/reparse run)
- POST /v1/documents/:documentId/runs/:runId/cancel
Completion endpoints
- POST /v1/documents/:documentId/responses
- POST /v1/documents/:documentId/responses:stream
- POST /v1/shares/:shareId/responses (redaction/permission constrained)
Existing route compatibility
Keep /document/:id/* as compatibility routes; internally map:
- /document/:id/status -> /v1/documents/:id
- /document/:id/completion -> /v1/documents/:id/responses
- /document/:id/reparse -> POST /v1/documents/:id/runs
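The mapping above could be expressed as a small pure function. This is a minimal sketch, not the existing router: mapLegacyRoute and RouteTarget are hypothetical names, and it assumes the legacy method is passed through unchanged except for reparse, which the mapping pins to POST.

```typescript
// Hypothetical target description for an internally rewritten request.
type RouteTarget = { method: string; path: string };

// Maps a legacy /document/:id/<suffix> request onto the proposed /v1 route.
// Returns null for paths that are not one of the three listed legacy routes.
function mapLegacyRoute(method: string, pathname: string): RouteTarget | null {
  const match = /^\/document\/([^/]+)\/([^/]+)\/?$/.exec(pathname);
  if (!match) return null;
  const id = match[1];
  const suffix = match[2];
  switch (suffix) {
    case "status":
      return { method, path: `/v1/documents/${id}` };
    case "completion":
      return { method, path: `/v1/documents/${id}/responses` };
    case "reparse":
      // The mapping names POST explicitly for reparse.
      return { method: "POST", path: `/v1/documents/${id}/runs` };
    default:
      return null;
  }
}
```

Keeping the mapping as data-free, side-effect-free logic makes the "completion alias parity" contract tests straightforward: the legacy and v1 handlers can be exercised with the same request and compared.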
HTTP diff (current -> proposed)
Is this a lot of change?
Not really. This is mostly additive:
- Keep old /document/:id/* routes as aliases.
- Add explicit /runs and /events resources.
- Rename completion endpoint to /responses for standard agent/client conventions.
- Return runId alongside workflowId during transition.
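For the transition period, returning both identifiers can be as simple as decorating existing response bodies. A minimal sketch, assuming existing bodies carry a string workflowId field; withRunId is a hypothetical helper name.

```typescript
// Adds runId as an alias of workflowId without removing the old field,
// so existing clients keep working while new clients read runId.
function withRunId<T extends { workflowId: string }>(body: T): T & { runId: string } {
  return { ...body, runId: body.workflowId };
}
```

For example, withRunId({ workflowId: "wf-1", phase: "parsing" }) yields an object that still has workflowId and phase, plus runId set to "wf-1".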
Response shapes
Document
Run
Response
Data model plan
Phase 1 (no schema migration)
Build /runs from:
- meta.active_workflow_id (active run)
- document_log workflow events (workflow_complete, workflow_error)
- upload/reparse lifecycle response metadata where available
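A Phase 1 run list could be derived roughly as follows. This is a sketch under assumptions: the LogEvent shape (seq, type, optional workflowId) and the DerivedRun fields are hypothetical, since the actual document_log row layout is not specified here. Only the event names workflow_complete and workflow_error and the active_workflow_id value come from the current state described above.

```typescript
type RunStatus = "running" | "complete" | "error";

// Assumed shape of a document_log row; the real columns may differ.
interface LogEvent { seq: number; type: string; workflowId?: string }

interface DerivedRun {
  runId: string;          // alias of workflowId
  workflowId: string;
  status: RunStatus;
  lastSeq: number | null; // seq of the terminal event, null while still running
}

function deriveRuns(activeWorkflowId: string | null, log: LogEvent[]): DerivedRun[] {
  const runs = new Map<string, DerivedRun>();
  for (const ev of log) {
    if (!ev.workflowId) continue; // audit events without a workflow are skipped
    if (ev.type !== "workflow_complete" && ev.type !== "workflow_error") continue;
    runs.set(ev.workflowId, {
      runId: ev.workflowId,
      workflowId: ev.workflowId,
      status: ev.type === "workflow_complete" ? "complete" : "error",
      lastSeq: ev.seq,
    });
  }
  // The active workflow has no terminal event yet, so it is reported as running.
  if (activeWorkflowId && !runs.has(activeWorkflowId)) {
    runs.set(activeWorkflowId, {
      runId: activeWorkflowId,
      workflowId: activeWorkflowId,
      status: "running",
      lastSeq: null,
    });
  }
  return Array.from(runs.values());
}
```

One limitation of this derivation: a run that neither finished nor is currently active leaves no trace, which is part of the motivation for a first-class runs table.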
Phase 2 (recommended)
Add first-class runs table:
On each workflow event (start/progress/complete/error), upsert runs.
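The upsert behaviour can be sketched in memory before committing to a schema. Everything here is an assumption for illustration: RunEvent, RunRow, and upsertRun are hypothetical names, the Map stands in for the runs table, and the rule that a terminal status is never overwritten by a late progress event is a design suggestion rather than something the plan specifies.

```typescript
type RunEventKind = "start" | "progress" | "complete" | "error";

interface RunEvent { runId: string; kind: RunEventKind; seq: number }

interface RunRow {
  runId: string;
  status: "running" | "complete" | "error";
  startedSeq: number; // seq of the first event seen for this run
  lastSeq: number;    // highest seq seen for this run
}

// Inserts a row on the first event for a run and updates it afterwards.
function upsertRun(runs: Map<string, RunRow>, ev: RunEvent): RunRow {
  const existing = runs.get(ev.runId);
  const next: RunRow["status"] =
    ev.kind === "complete" ? "complete" : ev.kind === "error" ? "error" : "running";
  // Keep a terminal status sticky so a late progress event cannot reopen a run.
  const status: RunRow["status"] =
    existing && existing.status !== "running" ? existing.status : next;
  const row: RunRow = {
    runId: ev.runId,
    status,
    startedSeq: existing ? existing.startedSeq : ev.seq,
    lastSeq: existing ? Math.max(existing.lastSeq, ev.seq) : ev.seq,
  };
  runs.set(ev.runId, row);
  return row;
}
```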
Cursoring model (Partykit-style lifecycle)
Use document_log.seq as cursor:
- client stores lastSeq
- requests GET /events?after=<lastSeq>
- receives ordered events
- updates cursor
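The cursor loop above can be sketched with two small functions, one for each side. This is an illustration under assumptions: DocEvent, eventsAfter, and drain are hypothetical names, the log is an in-memory array rather than the real document_log, and fetchPage stands in for the HTTP call to GET /events?after=<lastSeq>&limit=<n> (kept synchronous here for brevity).

```typescript
interface DocEvent { seq: number; type: string }

// Server side: events with seq strictly greater than the cursor,
// in ascending seq order, capped at limit.
function eventsAfter(log: DocEvent[], after: number, limit: number): DocEvent[] {
  return log
    .filter((ev) => ev.seq > after)
    .sort((a, b) => a.seq - b.seq)
    .slice(0, limit);
}

// Client side: request pages until one comes back empty,
// advancing the stored cursor after every received event.
function drain(
  fetchPage: (after: number) => DocEvent[],
  lastSeq: number,
): { events: DocEvent[]; lastSeq: number } {
  const events: DocEvent[] = [];
  for (;;) {
    const page = fetchPage(lastSeq);
    if (page.length === 0) break;
    for (const ev of page) {
      events.push(ev);
      lastSeq = ev.seq;
    }
  }
  return { events, lastSeq };
}
```

Because the cursor is exclusive (seq greater than after), a client that crashes after storing lastSeq resumes without re-reading events it has already processed.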
Implementation task list
- Add /v1/documents/:documentId/runs and /v1/documents/:documentId/runs/:runId.
- Add /v1/documents/:documentId/events with after cursor over document_log.seq.
- Add /v1/documents/:documentId/responses and :stream.
- Add compatibility mappings from existing /document routes.
- Add contract tests: document fetch, runs list, events cursor, completion alias parity.
- Add schema migration for runs table (phase 2).
- Update SDK docs/examples to prefer document-first terminology.