Upload, process, and extract from a PDF in one command.
# From a local file
okra extract ./report.pdf

# From a URL
okra extract https://example.com/report.pdf

# Text only
okra extract ./report.pdf --text-only

# Tables only
okra extract ./report.pdf --tables-only

# JSON output (for piping)
okra extract ./report.pdf -o json

# Use an extraction template
okra extract ./report.pdf --template invoice

# Write agentic workspace to a directory
okra extract ./report.pdf -d ./workspace

# Include page images and figure crops
okra extract ./report.pdf -d ./workspace --images
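
The single-file commands above can be wrapped in a loop to process a whole directory. The following is a minimal sketch rather than an official recipe: it relies only on the `-o json` and `-q` flags listed on this page and assumes that JSON is written to standard output. The function name `extract_dir` and the `OKRA_BIN` variable are hypothetical helpers introduced here for illustration; they are not part of okra.

```shell
# Sketch: extract every PDF in a directory to one JSON file per document.
# OKRA_BIN is a hypothetical override (not an okra feature) that lets the
# binary be swapped or stubbed; it falls back to "okra" on the PATH.
extract_dir() {
  src_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1
  for pdf in "$src_dir"/*.pdf; do
    [ -e "$pdf" ] || continue            # skip when the glob matches nothing
    name="$(basename "$pdf" .pdf)"
    # -o json and -q are the documented flags; stdout is assumed to carry the JSON
    "${OKRA_BIN:-okra}" extract "$pdf" -o json -q > "$out_dir/$name.json" \
      || echo "extract failed: $pdf" >&2
  done
}
```

Calling `extract_dir ./reports ./json` would then leave one `.json` file per PDF in `./json`, reporting any failed file on standard error without stopping the loop.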

Flags

Flag                       Description
-o, --output <format>      Output format: table, json, markdown
-d, --output-dir <path>    Write agentic workspace to directory
--images                   Include page images and figure crops in workspace
--scale <n>                Image scale factor (1–4)
-t, --template <name>      Template: invoice, receipt, financial-statement
--processor <name>         Processor hint: docai, gemini, qwen, llamaparse
--ocr <engine>             OCR engine: docai, tesseract, textract, azure-read
--vlm <model>              VLM model (e.g. google/gemini-2.5-flash-preview-09-2025)
--tables-only              Only return extracted tables
--text-only                Only return extracted text
--timeout <seconds>        Job timeout (default: 600)
--wait-for <stage>         Wait for pipeline stage: ocr, entities, index
-p, --prompt <task>        Add task context for piping to agent CLIs
-q, --quiet                Minimal output (ideal for piping)
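
Several of these flags are naturally used together, for example `-d`, `--images`, and `--scale` when building a workspace with page images. The sketch below wraps that combination and rejects an out-of-range scale before the job is submitted. It assumes `--scale` takes whole numbers (the table only gives the range 1–4) and that the flags can be combined in one invocation. `extract_images` and `OKRA_BIN` are hypothetical names used for illustration only.

```shell
# Sketch: write a workspace with page images at a validated scale factor.
# Assumption: --scale accepts the integers 1 through 4.
# OKRA_BIN is a hypothetical override (not an okra feature) for the binary.
extract_images() {
  pdf="$1"
  workspace="$2"
  scale="${3:-2}"                        # the default of 2 is this sketch's choice
  case "$scale" in
    1|2|3|4) ;;
    *) echo "scale must be between 1 and 4, got: $scale" >&2; return 2 ;;
  esac
  "${OKRA_BIN:-okra}" extract "$pdf" -d "$workspace" --images --scale "$scale"
}
```

For instance, `extract_images ./report.pdf ./workspace 3` would run `okra extract ./report.pdf -d ./workspace --images --scale 3`, while a scale of 9 returns early with an error instead of starting a job.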