
The Core Insight

Don’t build agents that query your data. Build agents that run where your data lives. For documents, this means: agent code runs in the same process boundary as the data. No remote calls to fetch pages. No separate storage service. Data and computation are one unit.

Traditional vs Colocated

Traditional Stack
  • The agent is an external service
  • The agent calls a storage service for the document
  • The agent calls a database for metadata
  • The agent and the data are separate systems
OkraPDF Colocation
  • The agent runs in the same boundary as the document
  • The agent reads local state directly
  • The agent has full context immediately available
  • No intermediate services sit between the agent and the data

How It Works

When a user uploads a PDF:
PDF → DocumentAgent DO
  ├─ Local SQLite (extracted pages, metadata, entities)
  ├─ Agent code runs here
  └─ Agent reads local state → Calls LLM API directly
When a user asks a question:
  1. Agent reads pages from local storage
  2. Agent constructs context from local state
  3. Agent calls LLM API with complete context
  4. PDF stays within DO boundary
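
The question flow above can be sketched in TypeScript. This is a minimal illustration, not OkraPDF's implementation: `StoredPage`, `LocalStore`, `LlmClient`, `buildPrompt`, and this `DocumentAgent` class are hypothetical names, an in-memory array stands in for the local SQLite store, and the LLM call is an injected function rather than a real API client.

```typescript
// Hypothetical names for illustration; none of these come from OkraPDF.
interface StoredPage {
  pageNumber: number;
  text: string;
}

// Stand-in for an LLM API client: takes a prompt, resolves to an answer.
type LlmClient = (prompt: string) => Promise<string>;

// In-memory stand-in for the agent's local SQLite storage.
class LocalStore {
  private pages: StoredPage[] = [];

  savePage(page: StoredPage): void {
    this.pages.push(page);
  }

  // Local read: returns a copy of the pages in page order, with no network hop.
  readPages(): StoredPage[] {
    return this.pages.slice().sort((a, b) => a.pageNumber - b.pageNumber);
  }
}

// Steps 1 and 2: read pages from local storage and construct the context.
function buildPrompt(store: LocalStore, question: string): string {
  const context = store
    .readPages()
    .map((p) => `[page ${p.pageNumber}] ${p.text}`)
    .join("\n");
  return `${context}\n\nQuestion: ${question}`;
}

class DocumentAgent {
  constructor(private store: LocalStore, private llm: LlmClient) {}

  // Step 3: call the LLM with the complete context.
  ask(question: string): Promise<string> {
    return this.llm(buildPrompt(this.store, question));
  }
}
```

In this sketch the only asynchronous call is the one to the LLM; reading pages and building the context are plain local function calls.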

Mixed Data Sources

Agents rarely analyze only PDFs. They combine:
  • Local (colocated): PDF content, tables, entities, chat history
  • Remote (on-demand): Database queries, API calls, external tools
The agent's primary context (the PDF) is colocated. Secondary sources are accessed as needed. For the primary context, there is no separation between where data lives and where computation happens.
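
One way to picture the local/remote split is the sketch below. It rests on assumptions: `LocalContext`, `RemoteSource`, `localParts`, `selectRemote`, and `gatherContext` are hypothetical names, and how an agent decides which remote sources a question needs is left to the caller (the `needed` list).

```typescript
// Hypothetical shapes for illustration only.
interface LocalContext {
  pdfText: string;
  chatHistory: string[];
}

interface RemoteSource {
  sourceName: string;
  // Stand-in for a database query, API call, or external tool.
  load: () => Promise<string>;
}

// Local context is always included; it is read directly, with no network hop.
function localParts(local: LocalContext): string[] {
  return [`pdf: ${local.pdfText}`].concat(
    local.chatHistory.map((message) => `chat: ${message}`)
  );
}

// Pick only the remote sources this question needs.
function selectRemote(sources: RemoteSource[], needed: string[]): RemoteSource[] {
  return sources.filter((s) => needed.indexOf(s.sourceName) !== -1);
}

// Primary (colocated) context first, then whatever was fetched on demand.
async function gatherContext(
  local: LocalContext,
  sources: RemoteSource[],
  needed: string[]
): Promise<string[]> {
  const selected = selectRemote(sources, needed);
  const remoteParts = await Promise.all(
    selected.map(async (s) => `${s.sourceName}: ${await s.load()}`)
  );
  return localParts(local).concat(remoteParts);
}
```

Remote sources that are not in `needed` are never loaded, so a question that only concerns the PDF makes no remote calls besides the LLM request itself.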

Why This Matters

A PDF isn’t just bytes. It’s:
  • Extracted text and tables
  • Entity positions and confidence scores
  • User’s previous questions and chat history
  • Processing metadata (OCR status, vendor used, etc.)
Traditional stacks split this across services:
  • Storage service stores bytes
  • Database stores metadata
  • Agent queries both separately
Colocation: all of this lives in one boundary. The agent has everything locally.
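
As a rough picture of "everything in one boundary", the state listed above could be modeled as a single type. Every field name here is an assumption made for illustration, including the bounding-box representation and the 0 to 1 confidence range; the source does not specify a schema.

```typescript
// Hypothetical schema; field names and value ranges are assumptions.
interface ExtractedEntity {
  label: string;
  page: number;
  // Position on the page; the coordinate system is an assumption.
  box: { x: number; y: number; width: number; height: number };
  confidence: number; // assumed to be in the range 0..1
}

interface ChatMessage {
  role: "user" | "agent";
  text: string;
}

interface ColocatedDocumentState {
  pageText: string[]; // extracted text, one entry per page
  tables: string[][][]; // each table is a list of rows, each row a list of cells
  entities: ExtractedEntity[];
  chatHistory: ChatMessage[];
  processing: { ocrStatus: "pending" | "done" | "failed"; vendor: string };
}

// With all state in one place, a query over it is a local filter,
// not a join across a storage service and a database.
function confidentEntities(
  state: ColocatedDocumentState,
  minConfidence: number
): ExtractedEntity[] {
  return state.entities.filter((e) => e.confidence >= minConfidence);
}
```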

The Trade-offs

What you gain:
  • Agent owns the full PDF context
  • Agent runs in same boundary as data
  • Agent is the primary interface to the document
  • Simpler app code (no service orchestration)
What you give up:
  • Each document gets its own agent instance
  • Agent state is isolated per PDF

Common Questions

Can agents access other documents? Yes. Agent in DO A can call DO B. But the common case is: agent has everything it needs locally.

Does this scale? Yes. Each document gets one agent instance. Scaling documents automatically scales agents.

Is this about performance? Not primarily. Performance is a side effect. The real point: agent IS the interface to the document. They shouldn’t be separate.
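
The first two answers can be illustrated with a small sketch: one agent instance per document ID, and cross-document access that goes through the other document's agent. `AgentRegistry` and `DocAgent` are hypothetical names, and the in-memory registry only stands in for whatever the runtime uses to map a document ID to its instance. In a real deployment a call between instances would likely be asynchronous; it is synchronous here to keep the example short.

```typescript
// Hypothetical classes for illustration; not OkraPDF's API.
class DocAgent {
  private pages: string[] = [];

  constructor(
    readonly documentId: string,
    private registry: AgentRegistry
  ) {}

  addPage(text: string): void {
    this.pages.push(text);
  }

  // Common case: answer from local state.
  pageCount(): number {
    return this.pages.length;
  }

  // Cross-document case: ask the other document's agent
  // instead of reading its storage directly.
  pageCountOf(otherDocumentId: string): number {
    return this.registry.get(otherDocumentId).pageCount();
  }
}

// In-memory stand-in for the runtime's ID-to-instance mapping.
class AgentRegistry {
  private agents = new Map<string, DocAgent>();

  // Returns the single agent for a document, creating it on first use.
  get(documentId: string): DocAgent {
    let agent = this.agents.get(documentId);
    if (agent === undefined) {
      agent = new DocAgent(documentId, this);
      this.agents.set(documentId, agent);
    }
    return agent;
  }

  size(): number {
    return this.agents.size;
  }
}
```

Because an agent is created the first time its document ID is requested, the number of agents follows the number of documents without any separate scaling step.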