NVIDIA Parakeet/Canary ONNX transcription
Summarize can now run local transcription through NVIDIA's Parakeet-TDT 0.6B-v3 or Canary 1B-v2 ONNX exports by shelling out to a user-provided CLI. Auto selection prefers ONNX when configured; you can still force Whisper or a specific ONNX model.
# How to enable
- Install a CLI capable of running the ONNX models (e.g. `sherpa-onnx` or a custom wrapper). Homebrew may not have a formula; use upstream binaries or build from source if needed. The CLI must emit the transcribed text on stdout and accept a single WAV input path. Summarize now downloads the Hugging Face model files automatically on first use into the cache (see below), so your command template can reference the provided paths.
- Set one (or both) command templates:
  - Recommended (no shell): provide a JSON array (command + args):
    `SUMMARIZE_ONNX_PARAKEET_CMD='["sherpa-onnx", "...", "--tokens", "{vocab}", "--offline-ctc-model", "{model}", "--input-wav", "{input}"]'`
    `SUMMARIZE_ONNX_CANARY_CMD='["my-canary-wrapper", "{model_dir}", "{input}"]'`
  - Shell string (advanced):
    `SUMMARIZE_ONNX_PARAKEET_CMD="sherpa-onnx ... --tokens {vocab} --offline-ctc-model {model} --input-wav {input}"`

  Notes:
  - If you use the shell string form, do not quote placeholders (Summarize shell-escapes substituted paths so spaces work and injection risk is reduced).

  Placeholders:
  - `{input}`: audio path (added to the end if not present)
  - `{model}`: downloaded `model.onnx` path
  - `{vocab}`: downloaded `vocab.txt` path
  - `{model_dir}`: parent directory containing the downloaded files
- Pick the ONNX model via CLI or env:
  - Auto (default): leave `SUMMARIZE_TRANSCRIBER` unset or set `SUMMARIZE_TRANSCRIBER=auto`
  - CLI: `--transcriber parakeet` or `--transcriber canary`
  - Env: `SUMMARIZE_TRANSCRIBER=parakeet` (or `canary`)
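To make the template rules concrete, here is a minimal Python sketch of how a command template might be expanded. It is not Summarize's actual implementation: the function names `expand_template` and `_substitute` are hypothetical, and detecting the JSON form by a leading `[` is an assumption. It illustrates the three behaviors described above: placeholder substitution, appending the input path when `{input}` is absent, and shell-escaping only in the shell string form.

```python
import json
import re
import shlex

# The four placeholders documented above.
_PLACEHOLDER = re.compile(r"\{(input|model|vocab|model_dir)\}")


def _substitute(text, paths, quote):
    """Replace each placeholder with its path, shell-escaping when asked."""
    def repl(match):
        value = paths[match.group(1)]
        return shlex.quote(value) if quote else value
    return _PLACEHOLDER.sub(repl, text)


def expand_template(template, paths):
    """Return (command, use_shell) for a JSON-array or shell-string template.

    `paths` maps placeholder names (input, model, vocab, model_dir) to paths.
    """
    stripped = template.strip()
    if stripped.startswith("["):
        # JSON array form: each element is one argv entry, so no escaping.
        raw_args = json.loads(stripped)
        argv = [_substitute(arg, paths, quote=False) for arg in raw_args]
        if not any("{input}" in arg for arg in raw_args):
            argv.append(paths["input"])  # input is added to the end
        return argv, False
    # Shell string form: escape substituted paths so spaces survive.
    command = _substitute(stripped, paths, quote=True)
    if "{input}" not in stripped:
        command += " " + shlex.quote(paths["input"])
    return command, True
```

The returned pair can be handed to `subprocess.run(command, shell=use_shell, capture_output=True, text=True)`. Because the shell form is escaped by the expander, wrapping a placeholder in your own quotes would double-quote the path, which is why the note above says not to quote placeholders.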
For the Chrome extension, you can pick a permanent default under Settings → Model → Advanced Overrides → Transcriber. The selection is sent with every request. Make sure the daemon environment still has your ONNX CLI commands configured (env vars above) so the override can take effect. Alternatively, export the env vars before running `summarize daemon install --token <TOKEN>` so the daemon inherits your ONNX command templates and default transcriber.
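The selection rules can be sketched as a small resolver. This is an illustration under stated assumptions, not the tool's real logic: the name `resolve_transcriber`, the rule that an explicit flag or override wins over the env var, the `"whisper"` return value, and trying Parakeet before Canary in auto mode are all guesses. Only the env var names and the "auto prefers ONNX when configured" behavior come from this document.

```python
import os

# Env vars that hold the ONNX command templates (documented above).
ONNX_COMMAND_VARS = {
    "parakeet": "SUMMARIZE_ONNX_PARAKEET_CMD",
    "canary": "SUMMARIZE_ONNX_CANARY_CMD",
}


def resolve_transcriber(cli_flag=None, env=None):
    """Pick a transcriber name from an explicit flag, the env, or auto mode."""
    env = os.environ if env is None else env
    # Assumption: an explicit flag or override beats SUMMARIZE_TRANSCRIBER.
    choice = (cli_flag or env.get("SUMMARIZE_TRANSCRIBER") or "auto").strip().lower()
    if choice != "auto":
        return choice
    # Auto: prefer ONNX when a command template is configured.
    # Assumption: Parakeet is checked before Canary.
    for name, var in ONNX_COMMAND_VARS.items():
        if env.get(var, "").strip():
            return name
    return "whisper"
```

The sketch also shows why the daemon environment matters: in auto mode, an ONNX model is only chosen when the process doing the transcription can see a command template for it.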
# Cache + download details
- Artifacts are stored under `${SUMMARIZE_ONNX_CACHE_DIR || $XDG_CACHE_HOME || ~/.cache}/summarize/onnx/<model>/`.
- Set `SUMMARIZE_ONNX_MODEL_BASE_URL` to point at a mirror (defaults to the Hugging Face repo for the chosen model).
- The first run downloads `model.onnx` and `vocab.txt`; subsequent runs reuse cached files.
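If you want to check where the files should land, the lookup order above can be reproduced in a few lines of Python. This is a sketch that reads the path expression literally; the helper names `onnx_cache_dir` and `missing_artifacts` are hypothetical, and using `parakeet` or `canary` as the `<model>` directory name is an assumption.

```python
import os
from pathlib import Path


def onnx_cache_dir(model, env=None):
    """Resolve <base>/summarize/onnx/<model> using the documented fallbacks."""
    env = os.environ if env is None else env
    base = (
        env.get("SUMMARIZE_ONNX_CACHE_DIR")
        or env.get("XDG_CACHE_HOME")
        or str(Path.home() / ".cache")
    )
    return Path(base) / "summarize" / "onnx" / model


def missing_artifacts(model, env=None):
    """List which of the two downloaded files are not cached yet."""
    directory = onnx_cache_dir(model, env)
    return [
        name
        for name in ("model.onnx", "vocab.txt")
        if not (directory / name).is_file()
    ]
```

An empty list from `missing_artifacts("parakeet")` would suggest the next run can reuse the cache instead of downloading again.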
# Behavior
- Input audio is transcoded to 16 kHz mono WAV via `ffmpeg` when available; otherwise the original file is passed to the CLI.
- ONNX errors (missing command, non-zero exit, empty output) fall back to the existing Whisper flow, with a note recorded in the transcript metadata.
- Progress UI shows "ONNX (Parakeet/Canary)" while the external transcriber runs.
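The transcode-then-fall-back behavior can be approximated as follows. This is a rough Python sketch, not the actual code path: the function names, the shape of the returned dictionary, and the wording of the note are hypothetical. The `ffmpeg` options used (`-y`, `-i`, `-ar 16000`, `-ac 1`) are standard flags for overwriting the output, naming the input, setting the sample rate, and forcing mono.

```python
import shutil
import subprocess


def ffmpeg_args(src, dst):
    """Build an ffmpeg command that writes a 16 kHz mono WAV to dst."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def prepare_input(src, dst):
    """Transcode when ffmpeg is on PATH; otherwise pass the original file."""
    if shutil.which("ffmpeg") is None:
        return src
    subprocess.run(ffmpeg_args(src, dst), check=True, capture_output=True)
    return dst


def transcribe_with_fallback(run_onnx, run_whisper, audio_path):
    """Try the ONNX CLI first; on any listed failure, fall back to Whisper."""
    try:
        text = run_onnx(audio_path).strip()
        if not text:
            raise RuntimeError("ONNX CLI produced empty output")
        return {"text": text, "provider": "onnx", "notes": []}
    except (OSError, subprocess.CalledProcessError, RuntimeError) as exc:
        # OSError covers a missing command; CalledProcessError a non-zero exit.
        note = f"ONNX failed, fell back to Whisper: {exc}"
        return {"text": run_whisper(audio_path), "provider": "whisper", "notes": [note]}
```

Here `run_onnx` and `run_whisper` stand in for whatever callables actually invoke the external CLI and the Whisper flow; passing them in keeps the fallback logic testable without either tool installed.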
# Notes
- The ONNX inference binary itself is not bundled; users must install or provide it separately.
- This flow remains CPU-only and compatible with existing transcript providers.