
Overview

Branch creates a zero-copy fork of a document. Ingest with mode: "replace" supersedes existing nodes on affected pages. Together they let you correct extraction errors in isolation and compare results.

The problem

OCR text layers sometimes get numbers wrong: signs are dropped or precision is lost. A completion grounded in bad extraction data gives wrong answers, so you need a way to fix the data without mutating the original document.

Flow

Original doc ──→ branch ──→ ingest(replace) ──→ re-query
  (0.6)%, 14.7%                 14.8% (correct)    ✓ PASS

Full example

import { OkraClient } from 'okrapdf';

const client = new OkraClient({ apiKey: process.env.OKRA_API_KEY });
const docId = 'doc-abc123';

// 1. Ask the original — gets wrong answer from textlayer
const baseline = await client.generate(docId,
  'What was the effective tax rate in FY2022 vs FY2021?'
);
console.log(baseline.answer); // "(0.6)% and 14.7%" — wrong

// 2. Branch (zero-copy fork, ~2s)
const branch = await client.request('/v1/documents/' + docId + '/branch', {
  method: 'POST',
});
const branchId = branch.id;

// 3. Ingest corrected table data on the branch
await client.request('/document/' + branchId + '/ingest', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    vendor: 'canonical',
    mode: 'replace',        // supersedes existing nodes on affected pages
    data: {
      pages: [{
        pageNumber: 77,
        blocks: [{
          type: 'table',
          label: 'Tax Reconciliation',
          value: 'Income tax expense/(benefit) | $31 | (0.6)% | ($743) | 14.8%',
          children: [
            { type: 'row', children: [
              { type: 'cell', value: 'Income tax expense/(benefit)' },
              { type: 'cell', value: '$31' },
              { type: 'cell', value: '(0.6)%' },
              { type: 'cell', value: '($743)' },
              { type: 'cell', value: '14.8%' },
            ]}
          ]
        }]
      }]
    }
  }),
});

// 4. Wait for processing
await client.wait(branchId);

// 5. Re-query — gets correct answer
const improved = await client.generate(branchId,
  'What was the effective tax rate in FY2022 vs FY2021?'
);
console.log(improved.answer); // "(0.6)% and 14.8%" — correct
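
The flow diagram ends in a PASS check. Here is a minimal sketch of one, assuming answers come back as free text with percentages in accounting notation (parentheses for negatives). `extractPercents` and `matchesExpected` are hypothetical helper names, not part of the okrapdf SDK.

```typescript
// Hypothetical helpers for checking answers; not part of the okrapdf SDK.

// Pull percentages out of free text. Accounting notation such as "(0.6)%"
// is read as a negative value, so a dropped sign shows up as a mismatch.
function extractPercents(answer: string): number[] {
  const pattern = /(\()?(-?\d+(?:\.\d+)?)(\))?\s*%/g;
  const values: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const parenthesized = match[1] !== undefined && match[3] !== undefined;
    const value = parseFloat(match[2]);
    values.push(parenthesized ? -Math.abs(value) : value);
  }
  return values;
}

// True when the answer contains exactly the expected percentages, in order.
function matchesExpected(
  answer: string,
  expected: number[],
  tolerance = 1e-9,
): boolean {
  const found = extractPercents(answer);
  return (
    found.length === expected.length &&
    found.every((value, i) => Math.abs(value - expected[i]) <= tolerance)
  );
}

const expectedRates = [-0.6, 14.8];
console.log(matchesExpected('(0.6)% and 14.7%', expectedRates)); // false
console.log(matchesExpected('(0.6)% and 14.8%', expectedRates)); // true
```

In the full example you would pass `baseline.answer` and `improved.answer` instead of the literal strings. Model answers may be phrased differently from run to run, so treat this kind of string parsing as a rough check only.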

How mode: "replace" works

Mode | Behavior
append (default) | New nodes are added alongside existing ones.
replace | Existing nodes on affected pages get status = 'superseded', then new nodes are hydrated. Completions only read non-superseded nodes.

Superseded nodes are not deleted — they stay in the graph for audit trail purposes. This follows the same append-only, never-overwrite pattern as vendor_log.
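
The two modes can be pictured with a small in-memory model. This is an illustrative sketch only: the `GraphNode` shape, the `'active'` status name, and the `ingest` and `readable` functions are assumptions made for the example, not the service's actual storage layer.

```typescript
// Illustrative in-memory model; names and shapes here are assumptions.
type NodeStatus = 'active' | 'superseded'; // only 'superseded' appears in the docs

interface GraphNode {
  id: string;
  page: number;
  value: string;
  status: NodeStatus;
}

type IngestMode = 'append' | 'replace';

// Returns a new graph; the input array and its nodes are left untouched.
function ingest(
  graph: GraphNode[],
  incoming: Array<{ id: string; page: number; value: string }>,
  mode: IngestMode = 'append',
): GraphNode[] {
  const affectedPages = new Set(incoming.map((n) => n.page));
  // In replace mode, mark (never delete) existing nodes on affected pages.
  const existing = graph.map((node) =>
    mode === 'replace' && affectedPages.has(node.page) && node.status === 'active'
      ? { ...node, status: 'superseded' as const }
      : node,
  );
  const added = incoming.map((n) => ({ ...n, status: 'active' as const }));
  return [...existing, ...added];
}

// What a completion would read: everything that is not superseded.
function readable(graph: GraphNode[]): GraphNode[] {
  return graph.filter((node) => node.status !== 'superseded');
}

const before: GraphNode[] = [
  { id: 'n1', page: 77, value: '14.7%', status: 'active' },
  { id: 'n2', page: 78, value: 'Notes', status: 'active' },
];
const after = ingest(before, [{ id: 'n3', page: 77, value: '14.8%' }], 'replace');
console.log(after.length);                      // 3: nothing is deleted
console.log(readable(after).map((n) => n.id));  // [ 'n2', 'n3' ]
```

Keeping the superseded node in the array is what preserves the audit trail: the old value is still there, it just no longer feeds completions.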

Branch response

{
  "id": "doc-forked-...",
  "branched_from": "doc-abc123",
  "phase": "complete",
  "row_counts": {
    "meta": 26,
    "nodes": 380,
    "edges": 379,
    "page_ledger": 190,
    "vendor_log": 12,
    "document_log": 419
  }
}

The branch is immediately queryable: it has the same phase and the same nodes as the original. Mutations on the branch don’t affect the original.
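
If you want to check the response before querying the branch, a type guard is one option. The interface below is inferred from the sample response only, so fields may be missing; `isReadyBranch` is a hypothetical helper, and phase values other than "complete" are not documented here.

```typescript
// Shape inferred from the sample response; the real payload may carry more fields.
interface BranchResponse {
  id: string;
  branched_from: string;
  phase: string;
  row_counts: Record<string, number>;
}

// Hypothetical guard: true only when the payload has the sampled shape
// and reports phase "complete".
function isReadyBranch(payload: unknown): payload is BranchResponse {
  if (typeof payload !== 'object' || payload === null) return false;
  const p = payload as Record<string, unknown>;
  return (
    typeof p.id === 'string' &&
    typeof p.branched_from === 'string' &&
    p.phase === 'complete' &&
    typeof p.row_counts === 'object' &&
    p.row_counts !== null
  );
}

// Made-up payload for illustration; the id is a placeholder.
const samplePayload: unknown = {
  id: 'doc-branch-example',
  branched_from: 'doc-abc123',
  phase: 'complete',
  row_counts: { nodes: 380, edges: 379 },
};

if (isReadyBranch(samplePayload)) {
  console.log(samplePayload.row_counts.nodes); // 380
}
```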

When to use this

  • Eval corrections: fix extraction errors on specific pages to measure impact on downstream completions
  • A/B testing vendors: branch, re-ingest with a different vendor’s output, compare answers
  • Human-in-the-loop: reviewer corrects a table, ingests the fix on a branch, promotes if better
  • Safe experimentation: try schema changes or re-extraction without risking production data
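
For the A/B and eval cases above, a small harness can ask the same questions of the original and the branch and report where the answers differ. This is a sketch under assumptions: `compareBranch` and the `Ask` type are hypothetical, with `Ask` modeled on how `client.generate` is called in the full example (document ID and question in, an object with an `answer` string out).

```typescript
// `Ask` mirrors how client.generate is used in the full example.
// The harness itself is hypothetical, not part of the okrapdf SDK.
type Ask = (docId: string, question: string) => Promise<{ answer: string }>;

interface Comparison {
  question: string;
  original: string;
  branch: string;
  changed: boolean;
}

// Ask every question of both documents and record whether the answers differ.
async function compareBranch(
  ask: Ask,
  originalId: string,
  branchId: string,
  questions: string[],
): Promise<Comparison[]> {
  return Promise.all(
    questions.map(async (question) => {
      const [original, branch] = await Promise.all([
        ask(originalId, question),
        ask(branchId, question),
      ]);
      return {
        question,
        original: original.answer,
        branch: branch.answer,
        // Plain string comparison: a crude signal, since phrasing can vary.
        changed: original.answer.trim() !== branch.answer.trim(),
      };
    }),
  );
}
```

With the client from the full example, this could be called as `compareBranch((id, q) => client.generate(id, q), docId, branchId, questions)`. Passing `ask` in as a parameter also lets you run the harness against a stub in tests.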

Live demo

See the interactive version at ingest-branch-demo.pages.dev.