NVIDIA Parakeet/Canary ONNX transcription

Summarize can now run local transcription through NVIDIA's Parakeet-TDT 0.6B-v3 or Canary 1B-v2 ONNX exports by shelling out to a user-provided CLI. Auto selection tries Groq first when configured, then ONNX before whisper.cpp and the remaining cloud fallbacks. You can still select whisper.cpp or a specific ONNX model for the local stage.

#How to enable

Install a CLI capable of running the ONNX models (e.g. sherpa-onnx or a custom wrapper). Homebrew may not have a formula; use upstream binaries or build from source if needed. The CLI must emit the transcribed text on stdout and accept a single WAV input path. Summarize now downloads the Hugging Face model files automatically on first use into the cache (see below), so your command template can reference the provided paths.
Set one (or both) command templates:

Recommended (no shell): provide a JSON array (command + args):
SUMMARIZE_ONNX_PARAKEET_CMD='["sherpa-onnx", "...", "--tokens", "{vocab}", "--offline-ctc-model", "{model}", "--input-wav", "{input}"]'
SUMMARIZE_ONNX_CANARY_CMD='["my-canary-wrapper", "{model_dir}", "{input}"]'
Shell string (advanced): SUMMARIZE_ONNX_PARAKEET_CMD="sherpa-onnx ... --tokens {vocab} --offline-ctc-model {model} --input-wav {input}"

Notes:

If you use the shell string form, do not quote placeholders (Summarize shell-escapes substituted paths so spaces work and injection risk is reduced).

Placeholders:

{input} — audio path (added to the end if not present)
{model} — downloaded model.onnx path
{vocab} — downloaded vocab.txt path
{model_dir} — parent directory containing the downloaded files

Pick the ONNX model via CLI or env:

Auto (default): leave SUMMARIZE_TRANSCRIBER unset or set SUMMARIZE_TRANSCRIBER=auto
CLI: --transcriber parakeet or --transcriber canary
Env: SUMMARIZE_TRANSCRIBER=parakeet (or canary)

For the Chrome extension, you can pick a permanent default under Settings → Model → Advanced Overrides → Transcriber. The selection is sent with every request. Make sure the daemon environment still has your ONNX CLI commands configured (env vars above) so the override can take effect. Alternatively, export the env vars before running summarize daemon install --token <TOKEN> so the daemon inherits your ONNX command templates and default transcriber.

#Cache + download details

Artifacts are stored under ${SUMMARIZE_ONNX_CACHE_DIR || $XDG_CACHE_HOME || ~/.cache}/summarize/onnx/<model>/.
Set SUMMARIZE_ONNX_MODEL_BASE_URL to point at a mirror (defaults to the Hugging Face repo for the chosen model).
The first run downloads model.onnx and vocab.txt; subsequent runs reuse cached files.

#Behavior

Input audio is transcoded to 16kHz mono WAV via ffmpeg when available; otherwise the original file is passed to the CLI.
Onnx errors (missing command, non-zero exit, empty output) fall back to the existing Whisper flow with a note recorded in the transcript metadata.
Progress UI shows "ONNX (Parakeet/Canary)" while the external transcriber runs.

#Notes

The ONNX inference binary itself is not bundled; users must install or provide it separately.
This flow remains CPU-only and compatible with existing transcript providers.