Extract

Upload a PDF, process it, and extract its contents in one command.

# From a local file
okra extract ./report.pdf

# From a URL
okra extract https://example.com/report.pdf

# Text only
okra extract ./report.pdf --text-only

# Tables only
okra extract ./report.pdf --tables-only

# JSON output (for piping)
okra extract ./report.pdf -o json

# Use an extraction template
okra extract ./report.pdf --template invoice

# Write agentic workspace to a directory
okra extract ./report.pdf -d ./workspace

# Include page images and figure crops
okra extract ./report.pdf -d ./workspace --images

Flags

Flag	Description
`-o, --output <format>`	Output format: `table`, `json`, `markdown`
`-d, --output-dir <path>`	Write agentic workspace to directory
`--images`	Include page images and figure crops in workspace
`--scale <n>`	Image scale factor (1–4)
`-t, --template <name>`	Template: `invoice`, `receipt`, `financial-statement`
`--processor <name>`	Processor hint: `docai`, `gemini`, `qwen`, `llamaparse`
`--ocr <engine>`	OCR engine: `docai`, `tesseract`, `textract`, `azure-read`
`--vlm <model>`	Vision-language model (e.g. `google/gemini-2.5-flash-preview-09-2025`)
`--tables-only`	Only return extracted tables
`--text-only`	Only return extracted text
`--timeout <seconds>`	Job timeout (default: 600)
`--wait-for <stage>`	Wait for pipeline stage: `ocr`, `entities`, `index`
`-p, --prompt <task>`	Add task context for piping to agent CLIs
`-q, --quiet`	Minimal output (ideal for piping)
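
The flags above can be combined for batch jobs. The following is a minimal sketch, not an official recipe: it loops over every PDF in a directory and saves one JSON file per document using `-o json` and `-q`. The function name `extract_all` and the directory arguments are hypothetical, and the sketch assumes that `okra extract` writes its JSON result to stdout and exits non-zero on failure, which this page does not state explicitly.

```shell
#!/usr/bin/env bash
# Sketch: batch-extract every PDF in a directory to one JSON file each.
# Assumption: `okra extract <file> -o json -q` prints the JSON result to stdout
# and returns a non-zero exit code when the job fails.
# `extract_all` is a hypothetical helper name, not part of the okra CLI.
extract_all() {
  local src_dir="$1" out_dir="$2"
  local pdf name

  mkdir -p "$out_dir"

  for pdf in "$src_dir"/*.pdf; do
    # Skip the literal pattern when the directory contains no PDFs.
    [ -e "$pdf" ] || continue

    name="$(basename "$pdf" .pdf)"

    # -o json selects JSON output; -q keeps the output minimal for redirection.
    okra extract "$pdf" -o json -q > "$out_dir/$name.json" \
      || echo "failed: $pdf" >&2
  done
}
```

Usage, with hypothetical paths: `extract_all ./reports ./extracted`. Adding `--tables-only` or `--template invoice` to the `okra extract` line narrows what each JSON file contains.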
