Goal

Run a reproducible curl-first test that shows:
  • admin completion sees raw sensitive values
  • viewer completion sees redacted placeholders
  • both use the exact same query prompt
This validates that redaction is applied to the context the LLM receives, not just to statically rendered output.

Prerequisites

export OKRA_API_KEY="okra_..."
export OKRA_BASE_URL="https://api.okrapdf.com" # optional

# tools: curl, jq, rg (ripgrep), python3
# python package for PDF generation:
python3 -m pip install reportlab
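
Before running the steps, it can help to confirm that the listed tools and the API key are actually available. The following is a minimal sketch, not part of any official tooling; the helper name missing_tools is hypothetical, and it relies only on the POSIX command -v builtin.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for _tool in "$@"; do
    command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
  done
}

# Check the tools this cookbook relies on and warn about any that are absent.
MISSING="$(missing_tools curl jq rg python3)"
if [ -n "${MISSING}" ]; then
  echo "missing tools: $(printf '%s' "${MISSING}" | tr '\n' ' ')" >&2
fi

# OKRA_API_KEY must be set before any request is made.
if [ -z "${OKRA_API_KEY:-}" ]; then
  echo "OKRA_API_KEY is not set" >&2
fi
```

The sketch only warns; add an exit 1 in each branch if the run should stop on a missing prerequisite.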

Step 1) Create a synthetic PII PDF

set -euo pipefail

BASE="${OKRA_BASE_URL:-https://api.okrapdf.com}"
DOC_ID="ocr-e2e-pii-$(date +%s)-$RANDOM"
PDF_PATH="/tmp/${DOC_ID}.pdf"

python3 - "$PDF_PATH" <<'PY'
import sys
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

pdf_path = sys.argv[1]
c = canvas.Canvas(pdf_path, pagesize=letter)
text = c.beginText(72, 720)
text.setFont("Helvetica", 12)
lines = [
    "PII Demo Intake Form",
    "Employee Name: Alice Example",
    "SSN: 111-22-3333",
    "Email: alice@example.com",
    "Phone: (415) 555-1212",
    "Department: Finance",
    "Notes: Contains sensitive identity fields for redaction testing.",
]
for line in lines:
    text.textLine(line)
c.drawText(text)
c.showPage()
c.save()
print(pdf_path)
PY

Step 2) Upload with redaction policy (capture workflow run)

UPLOAD_JSON="$(curl -sS -X POST "${BASE}/document/${DOC_ID}/upload" \
  -H "Authorization: Bearer ${OKRA_API_KEY}" \
  -H "Content-Type: application/pdf" \
  -H "X-File-Name: pii-demo.pdf" \
  -H "X-Redact: {\"pii\":{\"preset\":\"hipaa\",\"patterns\":[\"SSN\",\"EMAIL\",\"PHONE_US\"],\"includeNames\":true,\"includeAddresses\":true}}" \
  --data-binary "@${PDF_PATH}")"

echo "${UPLOAD_JSON}" | jq '{documentId, phase, status, workflowId, rootNodeId, urls}'
WORKFLOW_ID="$(echo "${UPLOAD_JSON}" | jq -r '.workflowId // empty')"
echo "WORKFLOW_ID=${WORKFLOW_ID}"

Document ID vs workflow run ID

  • DOC_ID is the durable document identity (/document/:id/...).
  • workflowId is the processing run identity for that upload/reparse.
  • One document can have multiple workflow runs over time; polling /status is document-scoped and reflects the current/latest phase.
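
To keep the two identifiers apart in logs, one option is to record them side by side after each upload. The sketch below runs on a made-up sample response: the ID values are placeholders, the helper name extract_ids is hypothetical, and only the field names documentId and workflowId come from the jq filter in Step 2.

```shell
# Made-up sample with the two ID fields selected in Step 2 (values are placeholders).
SAMPLE_UPLOAD_JSON='{"documentId":"doc-123","workflowId":"wf-abc"}'

# Hypothetical helper: print "documentId<TAB>workflowId" from an upload response.
# A missing field is printed as an empty string.
extract_ids() {
  printf '%s' "$1" | jq -r '[.documentId // "", .workflowId // ""] | @tsv'
}

extract_ids "${SAMPLE_UPLOAD_JSON}"
```

In the real flow, pass "${UPLOAD_JSON}" instead of the sample and append the line to a log, so that a later reparse of the same DOC_ID shows up as a new workflowId.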

Step 3) Wait until extraction is complete

for i in $(seq 1 240); do
  PHASE="$(curl -sS "${BASE}/document/${DOC_ID}/status" \
    -H "Authorization: Bearer ${OKRA_API_KEY}" | jq -r '.phase // empty')"

  if [ "${PHASE}" = "complete" ] || [ "${PHASE}" = "awaiting_review" ]; then
    break
  fi
  if [ "${PHASE}" = "error" ] || [ "${PHASE}" = "failed" ]; then
    echo "processing failed: ${PHASE}" >&2
    exit 1
  fi
  sleep 1
done

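echo "DOC_ID=${DOC_ID}"
echo "PHASE=${PHASE}"

# Fail fast if the loop timed out before reaching a terminal phase.
if [ "${PHASE}" != "complete" ] && [ "${PHASE}" != "awaiting_review" ]; then
  echo "timed out waiting for extraction (last phase: ${PHASE:-unknown})" >&2
  exit 1
fi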
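
The polling logic in this step can also be packaged as a reusable function with distinct exit codes for success, failure, and timeout. This is a sketch under stated assumptions: the helper names classify_phase and wait_for_phase are hypothetical, and the phase values are the ones the loop in this step already checks. The phase is read through a caller-supplied command, so the function can be exercised without network access.

```shell
# Map a phase string to ready, failed, or pending (same phases as the Step 3 loop).
classify_phase() {
  case "$1" in
    complete|awaiting_review) echo ready ;;
    error|failed)             echo failed ;;
    *)                        echo pending ;;
  esac
}

# Hypothetical helper. $1 = max tries, $2 = delay in seconds,
# remaining args = a command that prints the current phase.
# Returns 0 when ready, 1 when failed, 2 when the tries run out.
wait_for_phase() {
  _max="$1"; _delay="$2"; shift 2
  _i=0
  while [ "${_i}" -lt "${_max}" ]; do
    _phase="$("$@")" || _phase=""
    case "$(classify_phase "${_phase}")" in
      ready)  return 0 ;;
      failed) return 1 ;;
    esac
    sleep "${_delay}"
    _i=$((_i + 1))
  done
  return 2
}

# Usage sketch against the status endpoint from Step 3:
#   fetch_phase() {
#     curl -sS "${BASE}/document/${DOC_ID}/status" \
#       -H "Authorization: Bearer ${OKRA_API_KEY}" | jq -r '.phase // empty'
#   }
#   wait_for_phase 240 1 fetch_phase
```

Separating the timeout case (exit code 2) from a reported failure (exit code 1) makes it easier to tell a slow extraction from a broken one.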

Step 4) Query admin and viewer with the exact same prompt

PROMPT='Extract and return EXACT values for these fields from the document: name, ssn, email, phone. Return strict JSON with keys name, ssn, email, phone. Do not summarize.'

ADMIN_ANSWER="$(curl -sS -X POST "${BASE}/document/${DOC_ID}/completion" \
  -H "Authorization: Bearer ${OKRA_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "X-Redaction-Role: admin" \
  -d "{\"prompt\":$(jq -Rn --arg p "$PROMPT" '$p')}" \
  | jq -r '.answer // ""')"

VIEWER_ANSWER="$(curl -sS -X POST "${BASE}/document/${DOC_ID}/completion" \
  -H "Authorization: Bearer ${OKRA_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "X-Redaction-Role: viewer" \
  -d "{\"prompt\":$(jq -Rn --arg p "$PROMPT" '$p')}" \
  | jq -r '.answer // ""')"

echo "--- ADMIN ANSWER ---"
printf '%s\n' "${ADMIN_ANSWER}"
echo "--- VIEWER ANSWER ---"
printf '%s\n' "${VIEWER_ANSWER}"

Step 5) Assert completion context redaction

ADMIN_HAS_SENSITIVE=0
VIEWER_HAS_SENSITIVE=0
VIEWER_HAS_REDACTED=0

printf '%s' "${ADMIN_ANSWER}" | rg -q '111-22-3333|alice@example\.com|\(415\) 555-1212' && ADMIN_HAS_SENSITIVE=1 || true
printf '%s' "${VIEWER_ANSWER}" | rg -q '111-22-3333|alice@example\.com|\(415\) 555-1212' && VIEWER_HAS_SENSITIVE=1 || true
printf '%s' "${VIEWER_ANSWER}" | rg -q '\[SSN_|\[EMAIL_|\[PHONE_US_|REDACTED' && VIEWER_HAS_REDACTED=1 || true

echo "ADMIN_HAS_SENSITIVE=${ADMIN_HAS_SENSITIVE}"
echo "VIEWER_HAS_SENSITIVE=${VIEWER_HAS_SENSITIVE}"
echo "VIEWER_HAS_REDACTED=${VIEWER_HAS_REDACTED}"

if [ "${ADMIN_HAS_SENSITIVE}" -eq 1 ] && [ "${VIEWER_HAS_SENSITIVE}" -eq 0 ] && [ "${VIEWER_HAS_REDACTED}" -eq 1 ]; then
  echo "ASSERT_PASS: completion uses role-based redaction context (admin full, viewer redacted)"
else
  echo "ASSERT_FAIL: unexpected completion redaction behavior" >&2
  exit 1
fi

Step 6) Optional: generate viewer and admin share links

VIEWER_MD="$(curl -sS -X POST "${BASE}/document/${DOC_ID}/share-link" \
  -H "Authorization: Bearer ${OKRA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"role":"viewer","expiresInMs":3600000}' \
  | jq -r '.links.markdown // .markdownUrl // ""')"

ADMIN_MD="$(curl -sS -X POST "${BASE}/document/${DOC_ID}/share-link" \
  -H "Authorization: Bearer ${OKRA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"role":"admin","expiresInMs":3600000}' \
  | jq -r '.links.markdown // .markdownUrl // ""')"

echo "VIEWER_MD=${VIEWER_MD}"
echo "ADMIN_MD=${ADMIN_MD}"
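
To compare the static view with the completion results, the markdown behind each link can be scanned for the same raw values. This is a sketch, assuming the returned URLs can be fetched with a plain GET (not confirmed here). The helper name raw_pii_flag is hypothetical, and grep is used so the check also runs where ripgrep is not installed.

```shell
# Hypothetical helper: read text on stdin and print RAW_PII=1 if any raw
# synthetic value from Step 1 appears, otherwise RAW_PII=0.
raw_pii_flag() {
  if grep -qE '111-22-3333|alice@example\.com|\(415\) 555-1212'; then
    echo "RAW_PII=1"
  else
    echo "RAW_PII=0"
  fi
}

# Offline demonstration on two sample strings.
printf '%s\n' 'SSN: 111-22-3333' | raw_pii_flag
printf '%s\n' 'SSN: REDACTED' | raw_pii_flag
```

Against the real links, something like curl -sS "${VIEWER_MD}" | raw_pii_flag would be expected to print RAW_PII=0 if viewer rendering is redacted, and the same call with "${ADMIN_MD}" would be expected to print RAW_PII=1.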

Notes

  • This cookbook uses authenticated completion requests with the X-Redaction-Role header.
  • Sending the same prompt under both roles is the most direct way to verify role-aware redaction in the LLM's context.
  • The name field may remain visible depending on the active policy; this test asserts only SSN, email, and phone behavior.
  • The upload call returns a document-scoped response (documentId, phase, workflowId, urls) rather than a separate long-lived job object.
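
Because the name may or may not be redacted, a non-failing report is one way to track it alongside the hard assertions in Step 5. The following is a sketch with a hypothetical helper name, and the sample answers are made up for illustration.

```shell
# Hypothetical helper: report whether the synthetic name from Step 1
# appears in an answer. It only reports and never fails the run.
report_name_visibility() {
  if printf '%s' "$1" | grep -qF 'Alice Example'; then
    echo "NAME_VISIBLE=1"
  else
    echo "NAME_VISIBLE=0"
  fi
}

# Made-up sample answers for illustration.
report_name_visibility '{"name":"Alice Example","ssn":"REDACTED"}'
report_name_visibility '{"name":"REDACTED","ssn":"REDACTED"}'
```

In the real flow, call report_name_visibility "${VIEWER_ANSWER}" after Step 5 and log the result without changing the pass/fail outcome.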