
RAG Was a Workaround

Retrieval-Augmented Generation solved a real problem: LLMs have limited context windows, so we chunk documents, embed them, and retrieve relevant pieces at query time. But RAG has fundamental limitations:
  • Tables split across chunks. Context gets lost. The embedding of row 47 doesn’t know about the header.
  • Semantic similarity isn’t the same as relevance. The most similar chunk isn’t always the most useful one.
  • RAG retrieves, then generates. It can’t iteratively explore a document like a human would.
  • Vector databases, embedding models, retrieval pipelines, and re-ranking all exist to approximate what “reading” should be.
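To make the first limitation concrete, here is a minimal sketch of fixed-size chunking applied to a toy table. The table contents, the chunk size of 10 lines, and the helper name chunk_lines are all assumptions made up for illustration; real pipelines usually chunk by tokens or characters, but the effect on headers is the same.

```python
def chunk_lines(lines, chunk_size):
    """Split a list of lines into consecutive fixed-size chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

# A toy 50-row table (illustrative data): one header line followed by data rows.
header = "row | region | revenue_usd"
rows = [f"{n} | region-{n % 4} | {1000 + n}" for n in range(1, 51)]
table = [header] + rows

chunks = chunk_lines(table, chunk_size=10)

# Find the chunk that holds row 47 and check whether the header came with it.
chunk_with_row_47 = next(
    c for c in chunks if any(line.startswith("47 |") for line in c)
)
header_present = header in chunk_with_row_47
print(header_present)  # False: the header lives only in the first chunk
```

Whatever gets embedded for that chunk is a run of bare values with no column names attached, which is the situation the row 47 example describes.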

The New Approach: Let Agents Read

Modern AI agents don’t need retrieval pipelines. They need files they can actually read. Give an agent:
  • A filesystem with your documents
  • Tools to search and navigate
  • A code interpreter
And it can do what RAG tried to do—but better, because it can reason about what to read next.
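As a rough sketch of what "tools to search and navigate" might look like, the three functions below expose a folder of text files to an agent. The function names (list_files, search, read_lines) and the sample report are hypothetical, chosen for illustration; they are not taken from any particular agent framework.

```python
from pathlib import Path
import tempfile

def list_files(root: Path) -> list[str]:
    """Return every file under root as a sorted list of relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())

def search(root: Path, term: str) -> list[tuple[str, int, str]]:
    """Case-insensitive substring search; returns (file, line number, line) hits."""
    hits = []
    for rel in list_files(root):
        text = (root / rel).read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if term.lower() in line.lower():
                hits.append((rel, lineno, line))
    return hits

def read_lines(root: Path, rel: str, start: int, end: int) -> list[str]:
    """Return lines start..end (1-indexed, inclusive) of one file."""
    lines = (root / rel).read_text(encoding="utf-8").splitlines()
    return lines[start - 1:end]

# Simulate one agent step: search for a term, then decide to read around the hit.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.md").write_text(
        "# Annual report\n\n## Revenue\n\n"
        "| region | revenue_usd |\n| north | 1200 |\n| south | 900 |\n",
        encoding="utf-8",
    )
    hits = search(root, "revenue_usd")
    file, lineno, _ = hits[0]
    context = read_lines(root, file, lineno, lineno + 2)

print(context)
```

The point of the design is that the second call depends on the result of the first: the agent chooses what to read next based on what it just found, rather than receiving a fixed set of chunks up front.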

The Catch: Format Matters

Agents read plaintext natively. But your documents aren’t plaintext:
| Format | Agent Can Read? |
| --- | --- |
| .md, .txt, .json | Yes |
| .py, .tsx, .sql | Yes |
| .pdf | No |
| .docx, .xlsx | No |
| Scanned documents | No |
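One way to see the difference is to check whether a file's bytes decode as text at all. The heuristic below is only a sketch: the function name looks_like_plaintext is made up for this example, and the PDF bytes are a hand-written stand-in for the start of a typical PDF file (the version header followed by the binary marker comment that PDF writers conventionally emit).

```python
def looks_like_plaintext(data: bytes) -> bool:
    """Heuristic: plaintext has no NUL bytes and decodes cleanly as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

markdown_bytes = "# Q3 results\n\nRevenue rose 12%.\n".encode("utf-8")

# Stand-in for the first bytes of a PDF: header line plus a comment line
# containing four bytes >= 0x80, which is not valid UTF-8.
pdf_bytes = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"

print(looks_like_plaintext(markdown_bytes))  # True
print(looks_like_plaintext(pdf_bytes))       # False
```

Passing this check does not make a file meaningful to an agent, and a PDF's text is typically stored in compressed streams anyway, so treat this as an illustration of the format gap rather than a reliable detector.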
This is where OkraPDF comes in.

OkraPDF: The Bridge

We convert documents from formats agents can’t read into formats they can:
PDF (opaque binary)
    ↓ OkraPDF
Structured text + tables + figures (agent-ready)
But we don’t stop at parsing. We give agents tools to work with the data:

Parse

OCR, table extraction, figure detection. Your PDF becomes structured data.

Search

Semantic search across extracted entities. Find the right table in seconds.

Chat

Ask questions. The agent has full context, not retrieved chunks.

Query

The agent can run SQL queries against your document’s structured data instead of just retrieving chunks.
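A minimal sketch of what querying extracted tables with SQL could look like, using Python's built-in sqlite3 module. The table name income_statement, its columns, and the figures are invented for illustration; the actual schema OkraPDF produces for a document is not described here and may differ.

```python
import sqlite3

# In-memory database standing in for a per-document SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE income_statement "
    "(line_item TEXT, fiscal_year INTEGER, amount_usd REAL)"
)
# Illustrative rows, as if extracted from a table in a financial report.
conn.executemany(
    "INSERT INTO income_statement VALUES (?, ?, ?)",
    [
        ("Revenue", 2022, 1200.0),
        ("Revenue", 2023, 1500.0),
        ("Operating expenses", 2022, 800.0),
        ("Operating expenses", 2023, 900.0),
    ],
)

# Compute a value the document never states directly:
# revenue minus operating expenses, per year.
query = """
    SELECT fiscal_year,
           SUM(CASE WHEN line_item = 'Revenue'
                    THEN amount_usd ELSE -amount_usd END) AS operating_income
    FROM income_statement
    GROUP BY fiscal_year
    ORDER BY fiscal_year
"""
result = conn.execute(query).fetchall()
conn.close()

print(result)  # [(2022, 400.0), (2023, 600.0)]
```

The aggregate is computed from the rows, not looked up, which is the difference between computing on data and retrieving a passage that happens to mention it.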

Side-by-Side Comparison

| Capability | RAG Pipeline | OkraPDF |
| --- | --- | --- |
| Setup time | Days to weeks | Minutes |
| Infrastructure | Vector DB + embeddings + API | None (hosted) |
| Table handling | Chunks and hopes | Structure preserved |
| Multi-step reasoning | Limited | Full agent loop |
| SQL query access | No | Yes (per-document SQLite) |
| Accuracy verification | Trust the retrieval | Side-by-side review |

When to Use What

Use RAG when:
  • You have millions of documents
  • Latency is critical (sub-second)
  • Queries are simple lookups
Use OkraPDF when:
  • Document structure matters (tables, figures)
  • You need to verify extraction accuracy
  • Queries require reasoning across the document
  • You want agents to compute on data, not just retrieve it

Try It

1. Upload a PDF
   Any document: financial report, research paper, or invoice.

2. See the extraction
   Tables, figures, and text are all preserved and searchable.

3. Ask a question
   Document Chat gives you answers grounded in actual data.

Get Started Free

50 pages free. No credit card required.