Overview

Deploy a document with a redaction config and get back three URLs for the same underlying document, each with a different permission level. Redaction runs server-side, so for the viewer and public roles PII never reaches the browser.
view.okrapdf.com/s/{admin-token}/fw9.md   → full text, all PII visible
view.okrapdf.com/s/{viewer-token}/fw9.md  → PII replaced with [EMAIL], [PHONE], ***-**-****
view.okrapdf.com/s/{public-token}/fw9.md  → only allowlisted sections, PII redacted
Each URL carries an HMAC-signed token that encodes the document ID and role. The filename (fw9.md) is decorative; the token is what authorizes access.

Install

npm install @okrapdf/edge-kit

End-to-end example

Parse a PDF with LlamaParse, then deploy with redaction:
import { LlamaCloud } from '@llamaindex/llama-cloud';
import { deploy } from '@okrapdf/edge-kit';
import type { PageInput } from '@okrapdf/edge-kit';

// 1. Parse PDF via LlamaParse (or any vendor)
const client = new LlamaCloud({ apiKey: process.env.LLAMAINDEX_API_KEY });
const parseResult = await client.parsing.parse({
  source_url: 'https://www.irs.gov/pub/irs-pdf/fw9.pdf',
  tier: 'cost_effective',
  version: 'latest',
  expand: ['items', 'markdown'],
}, { verbose: true });

// 2. Convert vendor output → vendor-agnostic PageInput
const pages: PageInput[] = [];
for (const page of parseResult.markdown?.pages ?? []) {
  if (!('markdown' in page)) continue;
  pages.push({ pageNum: page.page_number, text: page.markdown });
}

// 3. Configure PII detection
const pii = {
  preset: 'hipaa',
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};

// 4. Deploy with redaction config → get 3 URLs
const result = await deploy({
  pages,
  meta: { title: 'IRS W-9 (Rev. 3-2024)', filename: 'fw9.pdf' },
  redact: {
    pii,
    publicFieldAllowlist: ['Form W-9', 'Part I', 'Part II', 'General Instructions'],
  },
  apiKey: process.env.OKRA_API_KEY!,
});

console.log(result.urls.admin);   // full text
console.log(result.urls.viewer);  // PII redacted
console.log(result.urls.public);  // allowlist only
console.log(result.stats);        // { totalMatches: 5, pagesAffected: 2, byRule: { SSN: 1, EMAIL: 2, PHONE_US: 2 } }

What gets redacted

The pii config uses OpenRedaction — compliance presets, name/address detection, and 400+ pattern types. Pick a preset or list specific patterns:
// Three alternatives; pass any one of them as redact.pii

// Preset-based (HIPAA, GDPR, CCPA)
const presetPii = { preset: 'hipaa', includeNames: true };

// Pattern-based
const patternPii = { patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'] };

// Combined
const combinedPii = { preset: 'hipaa', patterns: ['TAX_ID_US'], includeAddresses: true };
No pii field? Uses OpenRedaction defaults (all patterns enabled).

Three roles

Role    | What they see                           | Use case
admin   | Full text, all PII visible              | Internal review, compliance team
viewer  | PII replaced with placeholders          | External auditors, partners
public  | Only allowlisted sections, PII redacted | Public-facing links, embedding
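
To make the three roles concrete, here is a minimal sketch of how a role-based lens could be applied to one markdown document. It is not the edge-kit implementation: filterSections, redactPii, and render are hypothetical names, the two regexes stand in for OpenRedaction, and matching allowlist entries against markdown heading text is an assumption about how publicFieldAllowlist behaves.

```typescript
type Role = 'admin' | 'viewer' | 'public';

// Hypothetical: keep only sections whose heading text is on the allowlist.
// Assumes sections are delimited by markdown headings (#, ##, ...).
function filterSections(markdown: string, allowlist: string[]): string {
  const allowed = new Set(allowlist);
  const kept: string[] = [];
  let keep = false; // content before the first heading is dropped
  for (const line of markdown.split('\n')) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) keep = allowed.has(heading[1].trim());
    if (keep) kept.push(line);
  }
  return kept.join('\n');
}

// Stand-in redactor for this sketch only (SSN and email shapes).
function redactPii(text: string): string {
  return text
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '***-**-****')
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, '[EMAIL]');
}

// admin: untouched; viewer: redacted; public: redacted, then filtered.
function render(markdown: string, role: Role, allowlist: string[]): string {
  if (role === 'admin') return markdown;
  const redacted = redactPii(markdown);
  return role === 'viewer' ? redacted : filterSections(redacted, allowlist);
}

const sampleDoc = [
  '# Part I',
  'SSN: 123-45-6789',
  '# Internal Notes',
  'Contact: john@acme.com',
].join('\n');

console.log(render(sampleDoc, 'public', ['Part I']));
```

In this sketch redaction runs before section filtering, so the public output can never contain text that the viewer output would have hidden.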

Custom patterns

For domain-specific patterns, pass customPatterns with raw regex alongside presets:
const result = await deploy({
  pages,
  apiKey: process.env.OKRA_API_KEY!,
  redact: {
    pii: {
      preset: 'hipaa',
      customPatterns: [
        { type: 'ACCOUNT_NUM', regex: /ACC-\d{8}/g, priority: 10, placeholder: '[ACCOUNT_{n}]', severity: 'high' },
        { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
      ],
    },
    publicFieldAllowlist: ['Summary', 'Terms'],
  },
});
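
The sketch below illustrates how custom patterns like the ones above might be applied to a page of text. It is an illustration under assumptions, not the library's behavior: applyCustomPatterns is a hypothetical helper, higher priority is assumed to run first, and {n} is assumed to be a 1-based counter per pattern type.

```typescript
interface CustomPattern {
  type: string;
  regex: RegExp;
  priority: number;
  placeholder: string;
  severity: 'low' | 'medium' | 'high';
}

// Hypothetical helper, not part of @okrapdf/edge-kit.
function applyCustomPatterns(text: string, patterns: CustomPattern[]) {
  const byRule: Record<string, number> = {};
  // Assumption: patterns with a higher priority value are applied first.
  const ordered = [...patterns].sort((a, b) => b.priority - a.priority);
  let redacted = text;
  for (const p of ordered) {
    // Rebuild the regex with the global flag so every occurrence is replaced.
    const flags = p.regex.flags.includes('g') ? p.regex.flags : p.regex.flags + 'g';
    const re = new RegExp(p.regex.source, flags);
    let n = 0;
    redacted = redacted.replace(re, () => {
      n += 1; // assumption: {n} counts matches of this pattern, starting at 1
      return p.placeholder.replace('{n}', String(n));
    });
    if (n > 0) byRule[p.type] = n;
  }
  return { redacted, byRule };
}

const customPatterns: CustomPattern[] = [
  { type: 'ACCOUNT_NUM', regex: /ACC-\d{8}/g, priority: 10, placeholder: '[ACCOUNT_{n}]', severity: 'high' },
  { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
];

const customOut = applyCustomPatterns(
  'Accounts ACC-12345678 and ACC-87654321, see REF-ABC-1234.',
  customPatterns,
);
console.log(customOut.redacted);
// prints "Accounts [ACCOUNT_1] and [ACCOUNT_2], see [REF_1]."
```

Running a local dry run like this against sample text is a cheap way to check that a regex matches what you expect before deploying a document with it.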

URL anatomy

view.okrapdf.com / s / {token} / {filename}.md
                   │    │          │
                   │    │          └─ decorative (human-readable, not used for lookup)
                   │    └─ HMAC-signed: base64(docId:role).signature
                   └─ "shared/governed" route prefix
The token is verified server-side with HMAC-SHA256. Tampering with the role or document ID invalidates the signature.
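
As a rough illustration of the scheme described above, the following sketch signs and verifies a token of the form base64(docId:role).signature using Node's built-in crypto module. The function names, the use of base64url (chosen here because standard base64 can contain characters that are awkward in a URL path), and the secret handling are all assumptions; the actual token layout used by OkraPDF may differ.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

type Role = 'admin' | 'viewer' | 'public';
const ROLES: readonly Role[] = ['admin', 'viewer', 'public'];

// Hypothetical helper: HMAC-SHA256 over the encoded payload, base64url-encoded.
function sign(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload).digest('base64url');
}

// Build a token shaped like base64url(docId:role).signature
function createToken(docId: string, role: Role, secret: string): string {
  const payload = Buffer.from(`${docId}:${role}`, 'utf8').toString('base64url');
  return `${payload}.${sign(payload, secret)}`;
}

// Return the decoded claims, or null if the token is malformed or tampered with.
function verifyToken(token: string, secret: string): { docId: string; role: Role } | null {
  const parts = token.split('.');
  if (parts.length !== 2) return null;
  const [payload, signature] = parts;

  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;

  const decoded = Buffer.from(payload, 'base64url').toString('utf8');
  const sep = decoded.lastIndexOf(':');
  if (sep === -1) return null;
  const docId = decoded.slice(0, sep);
  const role = decoded.slice(sep + 1) as Role;
  return ROLES.includes(role) ? { docId, role } : null;
}

const demoSecret = 'demo-secret'; // placeholder; a real secret would come from server config
const viewerToken = createToken('fw9-a3f8b2', 'viewer', demoSecret);
console.log(verifyToken(viewerToken, demoSecret)); // { docId: 'fw9-a3f8b2', role: 'viewer' }

// Swapping the payload to "admin" while reusing the viewer signature fails verification.
const forgedPayload = Buffer.from('fw9-a3f8b2:admin', 'utf8').toString('base64url');
const forgedToken = `${forgedPayload}.${viewerToken.split('.')[1]}`;
console.log(verifyToken(forgedToken, demoSecret)); // null
```

Because the signature covers the whole payload, a viewer cannot upgrade to admin by editing the role: any change to the payload produces a different expected signature.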

Response format

URLs return Content-Type: text/markdown; charset=utf-8. The response is the document’s markdown with redaction already applied — no client-side processing needed.
curl https://view.okrapdf.com/s/{viewer-token}/fw9.md

# Form W-9
# Request for Taxpayer Identification Number
Name: John Doe
SSN: ***-**-****
Email: [EMAIL]
Phone: [PHONE]

Redaction applies everywhere

Static URLs are just the beginning. The same redaction lens applies to every access path — completions, agent SQL queries, and text search. The LLM never sees raw PII.

Completions endpoint

When a consumer hits the public /completion endpoint, the agent’s tool results are redacted before the LLM sees them:
POST /v1/documents/fw9-a3f8b2/completion
{ "prompt": "Who filed this W-9?" }
The response only contains redacted content — the model literally cannot leak PII because it never received it.
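
For callers working in TypeScript, the request above could be assembled roughly as follows. This is a sketch with assumptions: the source gives only the path, so the base URL is a caller-supplied placeholder, the Content-Type header is assumed, and any authentication the endpoint requires is not shown. buildCompletionRequest is a hypothetical helper, not part of @okrapdf/edge-kit.

```typescript
interface CompletionRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and fetch options for the completion endpoint.
function buildCompletionRequest(baseUrl: string, documentId: string, prompt: string): CompletionRequest {
  const path = `/v1/documents/${encodeURIComponent(documentId)}/completion`;
  return {
    url: baseUrl.replace(/\/+$/, '') + path, // avoid a double slash if baseUrl ends with "/"
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' }, // assumed; not stated in the source
      body: JSON.stringify({ prompt }),
    },
  };
}

// "https://api.example.com" is a placeholder host, not a documented endpoint.
const completionReq = buildCompletionRequest('https://api.example.com', 'fw9-a3f8b2', 'Who filed this W-9?');
console.log(completionReq.url);
// To send it: const res = await fetch(completionReq.url, completionReq.init);
```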

Agent SQL queries

The DocumentAgent has a query_sql tool that runs SELECT queries against the document’s local SQLite. Redaction intercepts the tool result before it’s fed back to the LLM:
// 1. The LLM decides to call query_sql
const toolCall = llmResponse.tool_calls[0];
// Assuming OpenAI-style tool calls, `arguments` is a JSON string and must be parsed first
const { query: sqlQuery } = JSON.parse(toolCall.function.arguments);

// 2. The DO runs the query against the RAW data
const rawResult = await this.state.storage.sql(sqlQuery);
// => [{ name: "John Doe", ssn: "123-45-6789", email: "john@acme.com" }]

// 3. Apply the redaction lens to the tool result (admin sees rows unchanged)
const safeResult = rawResult.map(row => {
  const safe = { ...row };
  if (activeLens !== 'admin') {
    if (safe.ssn) safe.ssn = '[REDACTED]';
    if (safe.email) safe.email = '[EMAIL]';
    if (safe.phone) safe.phone = '[PHONE]';
  }
  return safe;
});

// 4. Return the SAFE result back to the LLM as a tool message
messages.push({
  role: 'tool',
  tool_call_id: toolCall.id, // must match the id of the call being answered
  content: JSON.stringify(safeResult),
  // The LLM only sees: [{ name: "John Doe", ssn: "[REDACTED]", email: "[EMAIL]" }]
});
The LLM reasons over redacted data. It can still answer “who filed this?” (the name is not redacted in this example) but cannot surface the SSN or email.

Full-text search results go through the same lens. A search for “123-45” against a viewer-role token returns zero matches, because the redacted content doesn’t contain the raw pattern.
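
The snippet above redacts by column name. A pattern-based variant, closer to how the static URLs are described, could instead walk every string in the tool result. The sketch below is illustrative only: RULES, redactValue, and search are hypothetical, and the three regexes are simple stand-ins for the OpenRedaction patterns. It also shows why a search for a raw fragment finds nothing once it runs over redacted content.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Illustrative rules only; a real deployment would use the configured PII patterns.
const RULES: Array<{ pattern: RegExp; placeholder: string }> = [
  { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, placeholder: '***-**-****' },
  { pattern: /[\w.+-]+@[\w-]+\.[\w.-]+/g, placeholder: '[EMAIL]' },
  { pattern: /\b\d{3}[-.]\d{3}[-.]\d{4}\b/g, placeholder: '[PHONE]' },
];

function redactString(value: string): string {
  return RULES.reduce((text, rule) => text.replace(rule.pattern, rule.placeholder), value);
}

// Recursively redact every string inside a JSON-like value, whatever the key is called.
function redactValue(value: Json): Json {
  if (typeof value === 'string') return redactString(value);
  if (Array.isArray(value)) return value.map(redactValue);
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) out[key] = redactValue(value[key]);
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}

// Naive substring search: returns the indexes of pages that contain the term.
function search(pages: string[], term: string): number[] {
  const hits: number[] = [];
  pages.forEach((page, i) => { if (page.includes(term)) hits.push(i); });
  return hits;
}

const rawRows: Json = [
  { name: 'John Doe', ssn: '123-45-6789', email: 'john@acme.com', note: 'call 555-867-5309' },
];
const safeRows = redactValue(rawRows);

console.log(JSON.stringify(safeRows));
console.log(search([JSON.stringify(rawRows)], '123-45'));  // [ 0 ]
console.log(search([JSON.stringify(safeRows)], '123-45')); // []
```

Matching on content rather than column names means PII that lands in an unexpected field, such as a free-text note, is still caught, at the cost of depending on how well the patterns cover the data.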

Why this matters

Most redaction tools only protect static exports. OkraPDF redacts at the data access layer — every SELECT, every completion, every search result passes through the lens. The blast radius of a leaked token is bounded by its role, not by which endpoint was called.

Local extraction with Docling

Want to parse PDFs locally so your document bytes never touch a third-party cloud? See the Local Extraction + Redaction (Docling) cookbook.