Same document, three URLs

OkraPDF’s redaction engine runs server-side at the edge. PII is removed before the response leaves the Worker — it never reaches the browser.
/s/{admin-token}/fw9.md   → full text
/s/{viewer-token}/fw9.md  → SSN: ***-**-****, [EMAIL], [PHONE]
/s/{public-token}/fw9.md  → allowlisted sections only
Each URL is an HMAC-signed capability token. Readers need no API keys, no sessions, no cookies. The token IS the auth.
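
To make the capability-token idea concrete, here is a minimal sketch of HMAC signing and verification. The token layout, the `Claims` fields, and the `signToken` / `verifyToken` helpers are assumptions for illustration, not OkraPDF's actual format, and the sketch uses Node's `node:crypto` for brevity.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

type Role = 'admin' | 'viewer' | 'public';

// Hypothetical claims carried inside the token.
interface Claims {
  docId: string;
  role: Role;
}

// Assumed layout: base64url(JSON claims) + "." + base64url(HMAC-SHA256 of that payload).
function signToken(claims: Claims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims), 'utf8').toString('base64url');
  const sig = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

// Returns the claims when the signature matches, otherwise null.
// Only the secret is needed to verify; nothing is looked up in a database.
function verifyToken(token: string, secret: string): Claims | null {
  const parts = token.split('.');
  if (parts.length !== 2) return null;
  const [payload, sig] = parts;
  const expected = createHmac('sha256', secret).update(payload).digest();
  const given = Buffer.from(sig, 'base64url');
  // Constant-time comparison avoids leaking how many bytes matched.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8')) as Claims;
}
```

Because the role travels inside the signed payload, a viewer token cannot be edited into an admin token without invalidating the signature.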

How it works

  1. Parse your PDF with any vendor (LlamaParse, Docling, Unstructured, Azure Doc Intel)
  2. Deploy with @okrapdf/edge-kit — pass pages + redaction config
  3. Get back 3 URLs — admin, viewer, public
import { deploy } from '@okrapdf/edge-kit';

const pii = {
  preset: 'hipaa',
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};

const result = await deploy({
  pages,  // from any PDF parser
  meta: { title: 'W-9', filename: 'fw9.pdf' },
  redact: {
    pii,
    publicFieldAllowlist: ['Form W-9', 'General Instructions'],
  },
  apiKey: process.env.OKRA_API_KEY!,
});

result.urls.admin   // full text
result.urls.viewer  // PII redacted
result.urls.public  // allowlist + redacted

PII detection with OpenRedaction

Pass a pii config object and the SDK uses OpenRedaction under the hood, with compliance presets, name/address detection, context-aware matching, and 400+ pattern types out of the box. If you omit the pii field, OpenRedaction defaults apply (all patterns enabled).
const pii = {
  preset: 'hipaa',                                   // or 'gdpr', 'ccpa'
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};
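
For intuition about what pattern-based redaction does to a page, the sketch below applies three simplified regex rules and produces the viewer-style output shown earlier (`***-**-****`, `[EMAIL]`, `[PHONE]`). It is a stand-in, not OpenRedaction's API: the `RULES` table and `redactText` are hypothetical, and the patterns are far less thorough than a real detector.

```typescript
type Rule = { type: string; regex: RegExp; replacement: string };

// Deliberately simplified patterns; real detectors cover many more formats.
const RULES: Rule[] = [
  { type: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: '***-**-****' },
  { type: 'EMAIL', regex: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, replacement: '[EMAIL]' },
  { type: 'PHONE_US', regex: /\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b/g, replacement: '[PHONE]' },
];

// Applies only the rules whose type is listed in `enabled`, in table order.
function redactText(text: string, enabled: string[]): string {
  return RULES.filter((r) => enabled.includes(r.type)).reduce(
    (out, r) => out.replace(r.regex, r.replacement),
    text,
  );
}

const sample = 'SSN 123-45-6789, mail jane@example.com, call 555-123-4567';
console.log(redactText(sample, ['SSN', 'EMAIL', 'PHONE_US']));
// prints "SSN ***-**-****, mail [EMAIL], call [PHONE]"
```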

Custom patterns

For domain-specific patterns, pass customPatterns — raw regex alongside presets:
const pii = {
  preset: 'hipaa',
  customPatterns: [
    { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
    { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
  ],
};
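
The sketch below shows one plausible reading of the `priority` and `placeholder` fields. Two things here are assumptions, not documented behavior: that higher-priority patterns run first, and that `{n}` expands to a per-pattern counter starting at 1. `applyCustomPatterns` is a hypothetical helper, not part of the SDK.

```typescript
interface CustomPattern {
  type: string;
  regex: RegExp; // should carry the g flag so every match is replaced
  priority: number;
  placeholder: string;
  severity: 'low' | 'medium' | 'high';
}

function applyCustomPatterns(text: string, patterns: CustomPattern[]): string {
  // Assumption: higher priority is applied first.
  const ordered = [...patterns].sort((a, b) => b.priority - a.priority);
  let out = text;
  for (const p of ordered) {
    let n = 0; // assumption: {n} is a per-pattern counter starting at 1
    out = out.replace(p.regex, () => p.placeholder.replace('{n}', String(++n)));
  }
  return out;
}

const patterns: CustomPattern[] = [
  { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
  { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
];

console.log(applyCustomPatterns('Deal REF-ABC-1234 closed at $1,250,000.00.', patterns));
// prints "Deal [REF_1] closed at [AMOUNT_1]."
```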

Config-per-document

Each document gets its own redaction config. No global settings to manage.
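import { deploy } from '@okrapdf/edge-kit';

const apiKey = process.env.OKRA_API_KEY!;
// w9Pages and contractPages are the parsed pages from your PDF parser.

// Tax forms: HIPAA preset, names + addresses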
await deploy({
  pages: w9Pages,
  redact: {
    pii: { preset: 'hipaa', includeNames: true, includeAddresses: true },
    publicFieldAllowlist: ['Form W-9', 'Part I'],
  },
  apiKey,
});

// Contracts: custom patterns for deal values
await deploy({
  pages: contractPages,
  redact: {
    pii: {
      customPatterns: [
        { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
      ],
    },
    publicFieldAllowlist: ['Terms', 'Parties'],
  },
  apiKey,
});

Vendor-agnostic

The PageInput format works with any parser:
interface PageInput {
  pageNum: number;
  text: string;
  items?: Array<{ text: string; bbox?: { x: number; y: number; w: number; h: number } }>;
}
No vendor lock-in. Parse with LlamaParse today, switch to Docling tomorrow — redaction works the same.
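
Switching parsers then comes down to a small adapter. The sketch below maps a made-up parser result onto PageInput; the `ParsedPage` and `ParsedBlock` shapes are hypothetical and do not describe any real vendor's output, and the 1-based `pageNum` is an assumption.

```typescript
interface PageInput {
  pageNum: number;
  text: string;
  items?: Array<{ text: string; bbox?: { x: number; y: number; w: number; h: number } }>;
}

// Hypothetical parser output: zero-based pages, blocks with corner coordinates.
interface ParsedBlock {
  content: string;
  box?: [number, number, number, number]; // [x0, y0, x1, y1]
}
interface ParsedPage {
  index: number;
  blocks: ParsedBlock[];
}

function toPageInputs(parsed: ParsedPage[]): PageInput[] {
  return parsed.map((page) => ({
    pageNum: page.index + 1, // assumption: pageNum is 1-based
    text: page.blocks.map((b) => b.content).join('\n'),
    items: page.blocks.map((b) => ({
      text: b.content,
      // Convert corner coordinates to x/y/width/height when a box is present.
      bbox: b.box
        ? { x: b.box[0], y: b.box[1], w: b.box[2] - b.box[0], h: b.box[3] - b.box[1] }
        : undefined,
    })),
  }));
}
```

The resulting array is what you would pass as `pages` to `deploy`; only the adapter changes when the parser does.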

Every access path, not just URLs

Redaction isn’t just for static markdown URLs. The same lens applies to:
  • Completions endpoint — agent tool results are redacted before the LLM sees them. The model can’t leak PII it never received.
  • Agent SQL queries: query_sql results pass through the lens. A SELECT * FROM nodes returns [REDACTED] for PII fields.
  • Text search — search results are filtered through the active role. Searching for a raw SSN against a viewer token returns zero matches.
See the Redact & Deploy cookbook for implementation details.
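
One way to picture a single lens shared by every access path is a function that each path calls before returning data. The sketch below is an assumption about the shape of such a lens, not the engine's implementation: `lens`, `search`, and the `Section` type are hypothetical, and only an SSN rule is included to keep it short.

```typescript
type Role = 'admin' | 'viewer' | 'public';

interface Section {
  heading: string;
  text: string;
}

const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;

// admin sees everything, viewer sees redacted text,
// public sees redacted text restricted to allowlisted headings.
function lens(role: Role, sections: Section[], allowlist: string[]): Section[] {
  if (role === 'admin') return sections;
  const redacted = sections.map((s) => ({ ...s, text: s.text.replace(SSN, '***-**-****') }));
  return role === 'public' ? redacted.filter((s) => allowlist.includes(s.heading)) : redacted;
}

// Search runs over the lensed sections, so a raw SSN query
// cannot match anything a viewer is not allowed to see.
function search(role: Role, sections: Section[], allowlist: string[], query: string): Section[] {
  return lens(role, sections, allowlist).filter((s) => s.text.includes(query));
}

const sections: Section[] = [
  { heading: 'Part I', text: 'TIN 123-45-6789' },
  { heading: 'General Instructions', text: 'Use Form W-9 to provide your TIN.' },
];

console.log(search('viewer', sections, [], '123-45-6789').length); // prints 0
console.log(search('admin', sections, [], '123-45-6789').length); // prints 1
```

Routing completions, SQL results, and search through the same function means a new access path inherits the redaction rules instead of reimplementing them.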

Architecture

  • Redaction runs on Cloudflare Workers — sub-5ms, no cold starts
  • Pages stored in R2 — zero egress fees
  • HMAC-signed tokens — no database lookup needed to verify
  • Markdown output: Content-Type: text/markdown, no HTML rendering overhead
See the Redact & Deploy cookbook for a full working example.