Redaction

Same document, three URLs

OkraPDF’s redaction engine runs server-side at the edge. PII is removed before the response leaves the Worker — it never reaches the browser.

/s/{admin-token}/fw9.md   → full text
/s/{viewer-token}/fw9.md  → SSN: ***-**-****, [EMAIL], [PHONE]
/s/{public-token}/fw9.md  → allowlisted sections only
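
To make the three views concrete, here is a minimal sketch of a role-based lens. It is not OkraPDF's implementation: `applyLens`, `maskPii`, `keepAllowlisted`, and the two regexes are hypothetical names chosen for illustration, and the real engine relies on OpenRedaction rather than hand-written patterns. The sketch also assumes that sections are delimited by markdown `# ` headings.

```typescript
type Role = 'admin' | 'viewer' | 'public';

// Illustrative patterns only; real PII detection needs far more than two regexes.
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;
const EMAIL = /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g;

function maskPii(text: string): string {
  return text.replace(SSN, '***-**-****').replace(EMAIL, '[EMAIL]');
}

// Keep only sections whose heading text appears on the allowlist.
// Assumption: each section starts with a line of the form "# Heading".
function keepAllowlisted(markdown: string, allowlist: string[]): string {
  return markdown
    .split(/^(?=# )/m)
    .filter((section) => {
      const firstLine = section.split('\n', 1)[0] ?? '';
      return allowlist.includes(firstLine.replace(/^#\s+/, '').trim());
    })
    .join('');
}

function applyLens(markdown: string, role: Role, allowlist: string[]): string {
  if (role === 'admin') return markdown;               // full text
  if (role === 'viewer') return maskPii(markdown);     // PII masked
  return maskPii(keepAllowlisted(markdown, allowlist)); // allowlist + masked
}
```

Because the lens runs before the response is built, the viewer and public outputs never contain the original values.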

Each URL is an HMAC-signed capability token. Reading a document requires no API key, session, or cookie: the token is the auth.
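
As an illustration of how an HMAC-signed capability token can be verified without any stored state, here is a minimal sketch. The token layout (base64url JSON claims, a dot, then the signature), the `Claims` shape, and the `mintToken` / `verifyToken` helpers are assumptions made for this example; the actual OkraPDF token format is not documented here. It uses `node:crypto` for brevity, whereas a Worker would more likely use the Web Crypto API.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

type Role = 'admin' | 'viewer' | 'public';

// Hypothetical claims carried inside the token.
interface Claims {
  docId: string;
  role: Role;
}

function sign(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload).digest('base64url');
}

// Assumed layout: base64url(JSON claims) + "." + HMAC-SHA256 signature.
function mintToken(claims: Claims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString('base64url');
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the claims if the signature checks out, otherwise null.
function verifyToken(token: string, secret: string): Claims | null {
  const parts = token.split('.');
  if (parts.length !== 2) return null;
  const payload = parts[0] ?? '';
  const actual = Buffer.from(parts[1] ?? '');
  const expected = Buffer.from(sign(payload, secret));
  // Compare in constant time; the length check is required by timingSafeEqual.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    return null;
  }
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8')) as Claims;
}
```

Verification needs only the shared secret, which is why a design like this can skip a database lookup on every request.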

How it works

1. Parse your PDF with any vendor (LlamaParse, Docling, Unstructured, Azure Doc Intel).
2. Deploy with @okrapdf/edge-kit, passing pages and a redaction config.
3. Get back three URLs: admin, viewer, and public.

import { deploy } from '@okrapdf/edge-kit';

const pii = {
  preset: 'hipaa',
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};

const result = await deploy({
  pages,  // from any PDF parser
  meta: { title: 'W-9', filename: 'fw9.pdf' },
  redact: {
    pii,
    publicFieldAllowlist: ['Form W-9', 'General Instructions'],
  },
  apiKey: process.env.OKRA_API_KEY!,
});

result.urls.admin   // full text
result.urls.viewer  // PII redacted
result.urls.public  // allowlist + redacted

PII detection with OpenRedaction

Pass a pii config object and the SDK uses OpenRedaction under the hood: compliance presets, name/address detection, context-aware matching, and 400+ pattern types out of the box. If you omit the pii field, OpenRedaction defaults apply (all patterns enabled).

const pii = {
  preset: 'hipaa',                                   // or 'gdpr', 'ccpa'
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};

Custom patterns

For domain-specific data, pass customPatterns: raw regular expressions that run alongside the preset.

const pii = {
  preset: 'hipaa',
  customPatterns: [
    { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
    { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
  ],
};
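
The following sketch shows one plausible way such patterns could be applied to text. It rests on two assumptions that the page does not confirm: that `{n}` in a placeholder is a 1-based counter per pattern, and that a higher `priority` runs first. `applyCustomPatterns` is a hypothetical helper, not part of the SDK.

```typescript
interface CustomPattern {
  type: string;
  regex: RegExp; // should carry the g flag so every match is replaced
  priority: number;
  placeholder: string;
  severity: 'low' | 'medium' | 'high';
}

function applyCustomPatterns(text: string, patterns: CustomPattern[]): string {
  // Assumption: higher priority patterns are applied first.
  const ordered = [...patterns].sort((a, b) => b.priority - a.priority);
  let out = text;
  for (const pattern of ordered) {
    let n = 0;
    // Assumption: {n} is a running counter that restarts for each pattern.
    out = out.replace(pattern.regex, () =>
      pattern.placeholder.replace('{n}', String(++n)),
    );
  }
  return out;
}

const redacted = applyCustomPatterns(
  'Deal REF-ABC-1234 closed at $1,250,000.00 and $99.50.',
  [
    { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
    { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
  ],
);
// redacted === 'Deal [REF_1] closed at [AMOUNT_1] and [AMOUNT_2].'
```

Numbered placeholders keep distinct values distinguishable in the redacted output without revealing them.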

Config-per-document

Each document gets its own redaction config. No global settings to manage.

const apiKey = process.env.OKRA_API_KEY!;

// Tax forms: HIPAA preset, names + addresses
await deploy({
  pages: w9Pages,
  redact: {
    pii: { preset: 'hipaa', includeNames: true, includeAddresses: true },
    publicFieldAllowlist: ['Form W-9', 'Part I'],
  },
  apiKey,
});

// Contracts: custom patterns for deal values
await deploy({
  pages: contractPages,
  redact: {
    pii: {
      customPatterns: [
        { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
      ],
    },
    publicFieldAllowlist: ['Terms', 'Parties'],
  },
  apiKey,
});

Vendor-agnostic

The PageInput format works with any parser:

interface PageInput {
  pageNum: number;
  text: string;
  items?: Array<{ text: string; bbox?: { x: number; y: number; w: number; h: number } }>;
}
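
Switching parsers then comes down to writing a small adapter. The sketch below assumes a hypothetical parser that returns zero-based page indices and one markdown string per page; `ParsedPage` and `toPageInputs` are illustrative names, and treating `pageNum` as 1-based is also an assumption.

```typescript
interface PageInput {
  pageNum: number;
  text: string;
  items?: Array<{ text: string; bbox?: { x: number; y: number; w: number; h: number } }>;
}

// Hypothetical parser output; real vendors each use their own shape.
interface ParsedPage {
  index: number; // zero-based page index
  markdown: string;
}

function toPageInputs(parsed: ParsedPage[]): PageInput[] {
  return [...parsed]
    .sort((a, b) => a.index - b.index) // make sure pages arrive in document order
    .map((page) => ({
      pageNum: page.index + 1, // assumption: pageNum starts at 1
      text: page.markdown.trim(),
    }));
}
```

Only the adapter changes when the parser does; the `pages` array passed to `deploy` keeps the same shape.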

No vendor lock-in. Parse with LlamaParse today, switch to Docling tomorrow — redaction works the same.

Every access path, not just URLs

Redaction isn’t just for static markdown URLs. The same lens applies to:

Completions endpoint — agent tool results are redacted before the LLM sees them. The model can’t leak PII it never received.
Agent SQL queries — query_sql results pass through the lens. A SELECT * FROM nodes returns [REDACTED] for PII fields.
Text search — search results are filtered through the active role. Searching for a raw SSN against a viewer token returns zero matches.
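
The access paths above share one idea: the lens is applied to whatever leaves the store, whatever the query mechanism. The sketch below illustrates that idea for SQL rows and text search. `redactRows`, `searchAs`, and the notion of a set of PII field names are assumptions for this example, not the SDK's API.

```typescript
type Role = 'admin' | 'viewer' | 'public';
type Row = Record<string, unknown>;

// Replace PII fields with [REDACTED] for every role except admin.
function redactRows(rows: Row[], role: Role, piiFields: ReadonlySet<string>): Row[] {
  if (role === 'admin') return rows;
  return rows.map((row) => {
    const out: Row = {};
    for (const [key, value] of Object.entries(row)) {
      out[key] = piiFields.has(key) ? '[REDACTED]' : value;
    }
    return out;
  });
}

// Search over the text the role is allowed to see, so a query for a raw
// value cannot match content that has already been redacted.
function searchAs(
  role: Role,
  docs: string[],
  query: string,
  redact: (text: string) => string,
): string[] {
  const visible = role === 'admin' ? docs : docs.map(redact);
  return visible.filter((doc) => doc.includes(query));
}
```

Searching the redacted view, rather than redacting the hits afterwards, is what prevents a match count from confirming that a specific SSN exists in the document.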

See the Redact & Deploy cookbook for implementation details.

Architecture

Redaction runs on Cloudflare Workers — sub-5ms, no cold starts
Pages stored in R2 — zero egress fees
HMAC-signed tokens — no database lookup needed to verify
Markdown output — Content-Type: text/markdown, no HTML rendering overhead
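
Put together, the request path can be pictured roughly as follows. This is a runtime-agnostic sketch, not the actual Worker: `handle`, `Deps`, and `HandlerResult` are hypothetical, the storage read is synchronous for brevity (an R2 read would be asynchronous), and the `/s/{token}/{filename}` route is inferred from the URLs shown earlier.

```typescript
type Role = 'admin' | 'viewer' | 'public';

interface HandlerResult {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical dependencies, injected so the sketch runs without a Worker runtime.
interface Deps {
  verify(token: string): Role | null;          // e.g. an HMAC check, no database lookup
  load(filename: string): string | undefined;  // stands in for a storage read
  lens(text: string, role: Role): string;      // redaction for the given role
}

function handle(pathname: string, deps: Deps): HandlerResult {
  const match = /^\/s\/([^/]+)\/([^/]+)$/.exec(pathname);
  if (!match) return { status: 404, headers: {}, body: 'Not found' };

  const role = deps.verify(match[1] ?? '');
  if (!role) return { status: 403, headers: {}, body: 'Invalid token' };

  const text = deps.load(match[2] ?? '');
  if (text === undefined) return { status: 404, headers: {}, body: 'Not found' };

  // The lens runs before the response is built, so unredacted text never leaves.
  return {
    status: 200,
    headers: { 'Content-Type': 'text/markdown' },
    body: deps.lens(text, role),
  };
}
```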

See the Redact & Deploy cookbook for a full working example.
