Same document, three URLs
OkraPDF’s redaction engine runs server-side at the edge. PII is removed before the response leaves the Worker, so it never reaches the browser.
How it works
- Parse your PDF with any vendor (LlamaParse, Docling, Unstructured, Azure Doc Intel)
- Deploy with @okrapdf/edge-kit: pass pages + redaction config
- Get back 3 URLs: admin, viewer, public
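The three URLs differ only in the role they grant, so they can be minted without storing anything per link. Below is a minimal sketch that assumes a hypothetical URL scheme (`<baseUrl>/d/<docId>?token=<token>`) and a hypothetical token format `docId.role.signature`, signed with HMAC-SHA256 via node:crypto. It is an illustration of the idea, not the edge-kit API.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical role names matching the three URLs.
type Role = "admin" | "viewer" | "public";

// Sign "docId.role" with HMAC-SHA256 so the role cannot be changed without the secret.
// Assumes docId contains no "." characters (the token format uses "." as a separator).
function signToken(docId: string, role: Role, secret: string): string {
  const payload = `${docId}.${role}`;
  const sig = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${sig}`;
}

// Mint one URL per role for the same stored document (hypothetical URL scheme).
function mintRoleUrls(baseUrl: string, docId: string, secret: string): Record<Role, string> {
  const roles: Role[] = ["admin", "viewer", "public"];
  const urls = {} as Record<Role, string>;
  for (const role of roles) {
    const token = encodeURIComponent(signToken(docId, role, secret));
    urls[role] = `${baseUrl}/d/${encodeURIComponent(docId)}?token=${token}`;
  }
  return urls;
}
```

Each token carries its own signature, so the links can be shared independently and checked later without a lookup. Rotating the secret invalidates all of them at once.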
PII detection with OpenRedaction
Pass a pii config object and the SDK uses OpenRedaction under the hood: compliance presets, name/address detection, context-aware matching, and 400+ pattern types out of the box. If you omit the pii field, OpenRedaction's defaults apply (all patterns enabled).
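The fallback rule (no pii field means every pattern is on) can be sketched as a small resolver. The PiiConfig shape, the preset names, and the five-entry pattern catalogue below are hypothetical placeholders, not OpenRedaction's or edge-kit's actual configuration.

```typescript
// Hypothetical config shape; the real pii object may differ.
interface PiiConfig {
  preset?: string; // compliance preset name (placeholder values below)
  patterns?: string[]; // explicit pattern ids to enable
}

// Tiny stand-in catalogue; OpenRedaction ships far more pattern types.
const ALL_PATTERNS = ["email", "ssn", "phone", "credit_card", "iban"];

// Placeholder presets mapping a name to a subset of pattern ids.
const PRESETS: Record<string, string[]> = {
  hipaa: ["email", "ssn", "phone"],
  pci: ["credit_card"],
};

function resolvePatterns(pii?: PiiConfig): string[] {
  // No pii field at all: fall back to defaults, i.e. every pattern enabled.
  if (pii === undefined) return [...ALL_PATTERNS];
  // An explicit config enables only what it names (an assumption of this sketch).
  const enabled = new Set<string>(pii.patterns ?? []);
  if (pii.preset !== undefined) {
    for (const id of PRESETS[pii.preset] ?? []) enabled.add(id);
  }
  // Keep catalogue order so results are stable.
  return ALL_PATTERNS.filter((id) => enabled.has(id));
}
```

Whether an explicit but empty pii object should mean "nothing" or "defaults" is a design choice; this sketch reserves the all-patterns fallback for a missing field only.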
Custom patterns
For domain-specific patterns, pass customPatterns: raw regex that runs alongside the presets.
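Running custom regex alongside built-in detectors amounts to applying every pattern to the text and replacing matches. Here is a minimal sketch using a plain regex pass. The CustomPattern shape and the employee-ID and matter-number rules are hypothetical, and the real SDK's context-aware matching goes beyond plain regex.

```typescript
// Hypothetical shape for one custom rule: a name plus a raw regex source string.
interface CustomPattern {
  name: string;
  regex: string;
}

// Stand-in for a preset detector (US SSN in ddd-dd-dddd form).
const PRESET_PATTERNS: CustomPattern[] = [
  { name: "ssn", regex: "\\b\\d{3}-\\d{2}-\\d{4}\\b" },
];

// Run preset patterns first, then custom ones, replacing every match with [REDACTED].
function redact(text: string, customPatterns: CustomPattern[] = []): string {
  let out = text;
  for (const p of [...PRESET_PATTERNS, ...customPatterns]) {
    // The "g" flag makes replace() substitute all matches, not just the first.
    out = out.replace(new RegExp(p.regex, "g"), "[REDACTED]");
  }
  return out;
}

// Example domain-specific rules (hypothetical): employee IDs and matter numbers.
const custom: CustomPattern[] = [
  { name: "employee_id", regex: "\\bEMP-\\d{6}\\b" },
  { name: "matter_no", regex: "\\bMTR-\\d{4}-\\d{3}\\b" },
];
```

For example, `redact("Employee EMP-004211, SSN 123-45-6789, matter MTR-2024-017.", custom)` returns `"Employee [REDACTED], SSN [REDACTED], matter [REDACTED]."`, while the same call without `custom` redacts only the SSN.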
Config-per-document
Each document gets its own redaction config, so there are no global settings to manage.
Vendor-agnostic
The PageInput format works with any parser.
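Because parsers disagree on output shape, the vendor-agnostic step reduces to normalizing whatever a parser returns into one page list. The PageInput fields below (pageNumber, markdown) are an assumed shape for illustration, not the SDK's published type. The adapter takes generic per-page markdown rather than any specific vendor's response format.

```typescript
// Assumed minimal page shape; the real PageInput type may carry more fields.
interface PageInput {
  pageNumber: number; // 1-based
  markdown: string;
}

// Normalize an array of per-page markdown strings, whichever parser produced them.
function toPageInputs(pages: string[]): PageInput[] {
  return pages.map((markdown, i) => ({ pageNumber: i + 1, markdown }));
}

// Some parsers return one document string with page separators; split it first.
// The separator is a parameter because each vendor uses its own (default is hypothetical).
function splitDocument(doc: string, separator = "\n---\n"): PageInput[] {
  return toPageInputs(doc.split(separator));
}
```

Keeping vendor-specific knowledge in small adapters like these means the redaction config never needs to know which parser ran.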
Every access path, not just URLs
Redaction isn’t just for static markdown URLs. The same lens applies to:
- Completions endpoint: agent tool results are redacted before the LLM sees them. The model can’t leak PII it never received.
- Agent SQL queries: query_sql results pass through the lens. A SELECT * FROM nodes returns [REDACTED] for PII fields.
- Text search: search results are filtered through the active role. Searching for a raw SSN against a viewer token returns zero matches.
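One way to picture the lens is as a single function that every access path (tool results, SQL rows, search) calls before returning data. Here is a minimal sketch built on several assumptions: a hypothetical Row type, a hardcoded set of PII column names, and only the admin role seeing raw values. Search runs over the lensed rows, so a raw SSN cannot match for a viewer.

```typescript
type Role = "admin" | "viewer" | "public";
type Row = Record<string, string>;

// Hypothetical: which columns hold PII. A real lens would derive this from detection.
const PII_FIELDS = new Set(["name", "ssn", "email"]);

// Apply the lens to one SQL result row: PII columns become [REDACTED] for non-admin roles.
function lensRow(row: Row, role: Role): Row {
  if (role === "admin") return row;
  const out: Row = {};
  for (const [key, value] of Object.entries(row)) {
    out[key] = PII_FIELDS.has(key) ? "[REDACTED]" : value;
  }
  return out;
}

// Search through the lens: match only against what this role is allowed to see.
function searchRows(rows: Row[], query: string, role: Role): Row[] {
  return rows
    .map((r) => lensRow(r, role))
    .filter((r) => Object.values(r).some((v) => v.includes(query)));
}
```

Filtering after the lens, rather than before, is what makes the zero-match behavior fall out naturally: the viewer's index simply never contains the raw value.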
Architecture
- Redaction runs on Cloudflare Workers — sub-5ms, no cold starts
- Pages stored in R2 — zero egress fees
- HMAC-signed tokens — no database lookup needed to verify
- Markdown output (Content-Type: text/markdown), no HTML rendering overhead
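Stateless verification is what removes the database lookup: the Worker recomputes the HMAC from the token's own payload and compares. Below is a minimal sketch. It assumes a hypothetical `docId.role.signature` token format, assumes admin sees unredacted markdown, and uses node:crypto, which Workers expose under the nodejs_compat flag (crypto.subtle is the Web Crypto alternative). It returns a plain status/headers/body object; in a Worker you would wrap that in `new Response(body, { status, headers })`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Role = "admin" | "viewer" | "public";

// Recompute the signature from the payload carried in the token itself;
// no database is consulted. Token format (hypothetical): docId.role.signature
function verifyToken(token: string, secret: string): { docId: string; role: Role } | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [docId, role, sig] = parts;
  if (role !== "admin" && role !== "viewer" && role !== "public") return null;
  const expected = createHmac("sha256", secret).update(`${docId}.${role}`).digest("base64url");
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  return { docId, role };
}

interface MarkdownResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Serve markdown; non-admin bodies are redacted before they leave the Worker.
function serveMarkdown(
  token: string,
  secret: string,
  markdown: string,
  redact: (md: string) => string,
): MarkdownResponse {
  const claims = verifyToken(token, secret);
  if (claims === null) {
    return { status: 403, headers: { "Content-Type": "text/plain" }, body: "invalid token" };
  }
  const body = claims.role === "admin" ? markdown : redact(markdown);
  return { status: 200, headers: { "Content-Type": "text/markdown; charset=utf-8" }, body };
}
```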