Overview

Instead of polling for job completion, you can register a webhook URL to receive a POST request when your job finishes.

Setup

Pass webhook_url when creating an extraction job:
curl -X POST https://app.okrapdf.com/api/v1/extract \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/report.pdf",
    "webhook_url": "https://your-app.com/webhooks/okrapdf"
  }'
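
For reference, the same request can be built from Python. This is a minimal sketch using only the standard library; the helper name build_extract_request is hypothetical, and the request is constructed but not sent so the shape of the body is easy to inspect.

```python
import json
import urllib.request


def build_extract_request(api_key: str, pdf_url: str, webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"url": pdf_url, "webhook_url": webhook_url}).encode("utf-8")
    return urllib.request.Request(
        "https://app.okrapdf.com/api/v1/extract",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_extract_request(
    "okra_YOUR_KEY",
    "https://example.com/report.pdf",
    "https://your-app.com/webhooks/okrapdf",
)
# To actually submit the job, pass the request to urllib.request.urlopen(req).
```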

Webhook payload

When the job completes, OkraPDF sends a POST request to your URL:
{
  "event": "document.completed",
  "job_id": "ocr-abc123",
  "status": "completed",
  "filename": "report.pdf",
  "total_pages": 12,
  "results_url": "https://app.okrapdf.com/api/v1/jobs/ocr-abc123/results",
  "timestamp": "2025-01-15T10:30:00Z"
}
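
A handler may want to turn this JSON into a typed object before acting on it. The sketch below assumes every field in the sample payload is always present, which the sample alone does not guarantee; WebhookEvent and parse_event are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WebhookEvent:
    event: str
    job_id: str
    status: str
    filename: str
    total_pages: int
    results_url: str
    timestamp: datetime


def parse_event(payload: dict) -> WebhookEvent:
    """Convert a decoded webhook payload into a WebhookEvent.

    Raises KeyError if a field from the sample payload is missing.
    """
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    raw_ts = payload["timestamp"].replace("Z", "+00:00")
    return WebhookEvent(
        event=payload["event"],
        job_id=payload["job_id"],
        status=payload["status"],
        filename=payload["filename"],
        total_pages=int(payload["total_pages"]),
        results_url=payload["results_url"],
        timestamp=datetime.fromisoformat(raw_ts),
    )
```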

Webhook handler example

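import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Read the API key from the environment rather than hard-coding it.
# The variable name OKRAPDF_API_KEY is an example; use whatever your deployment defines.
API_KEY = os.environ["OKRAPDF_API_KEY"]


def process_table(table):
    # Placeholder: replace with your own table handling.
    print(table)


@app.route("/webhooks/okrapdf", methods=["POST"])
def handle_webhook():
    payload = request.get_json()
    job_id = payload["job_id"]

    if payload["status"] == "completed":
        # Fetch the results. This runs inline for brevity; in production,
        # hand the work to a background worker so the 200 is returned quickly.
        response = requests.get(
            f"https://app.okrapdf.com/api/v1/jobs/{job_id}/results",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=10,
        )
        response.raise_for_status()
        results = response.json()

        # Process tables...
        for table in results["results"]["tables"]:
            process_table(table)

    return jsonify({"received": True}), 200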

Retry behavior

  • OkraPDF retries failed webhook deliveries up to 3 times
  • Retries use exponential backoff (5s, 30s, 5min)
  • Your endpoint must return a 2xx status to acknowledge receipt
  • A non-2xx response or a timeout (30s) triggers a retry
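
To decide how long to keep deduplication records, it can help to estimate the longest window over which a single event might still arrive. The sketch below rests on assumptions the list above does not confirm: that there is one initial attempt plus three retries, that each backoff delay starts when the previous attempt fails, and that every attempt runs to the full 30s timeout.

```python
# Values taken from the retry rules above.
RETRY_DELAYS_S = [5, 30, 300]  # 5s, 30s, 5min
ATTEMPT_TIMEOUT_S = 30


def worst_case_window_s(delays=RETRY_DELAYS_S, timeout=ATTEMPT_TIMEOUT_S):
    """Upper bound, in seconds, from the first attempt to the end of the last one.

    Assumes 1 initial attempt + len(delays) retries, each hitting the timeout.
    """
    attempts = 1 + len(delays)
    return attempts * timeout + sum(delays)


window = worst_case_window_s()
```

Under these assumptions the window comes to a little under eight minutes, so retaining seen job IDs for at least that long would cover every retry.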

Best practices

  1. Return 200 quickly - process the webhook payload asynchronously
  2. Use HTTPS - webhook URLs must use HTTPS in production
  3. Verify the payload - check that the job_id exists in your system
  4. Handle duplicates - webhooks may be delivered more than once
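
The practices above can be combined into a small, framework-independent sketch: validate the job ID, ignore duplicates, queue the payload for a background worker, and acknowledge immediately. The function accept_webhook and its arguments are hypothetical, and the in-memory set and queue stand in for whatever durable store and task queue your application uses.

```python
import queue


def accept_webhook(payload, known_jobs, seen_jobs, work_queue):
    """Decide how to answer a webhook delivery without doing the heavy work.

    Returns a (status_code, body) pair for the web framework to send back.
    """
    job_id = payload.get("job_id")

    # Verify the payload: only accept jobs this system actually created.
    if job_id not in known_jobs:
        return 404, {"error": "unknown job"}

    # Handle duplicates: acknowledge again, but do not process twice.
    if job_id in seen_jobs:
        return 200, {"received": True, "duplicate": True}

    # Return quickly: record the job and defer processing to a worker.
    seen_jobs.add(job_id)
    work_queue.put(payload)
    return 200, {"received": True}


jobs_to_process = queue.Queue()
known = {"ocr-abc123"}
seen = set()
first = accept_webhook({"job_id": "ocr-abc123"}, known, seen, jobs_to_process)
second = accept_webhook({"job_id": "ocr-abc123"}, known, seen, jobs_to_process)
unknown = accept_webhook({"job_id": "ocr-zzz"}, known, seen, jobs_to_process)
```

Answering an unknown job ID with 404 is a judgment call: a non-2xx response triggers retries, so some applications may prefer to acknowledge with 200 and log the event instead.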