LLM / summarization mode
By default `summarize` will call an LLM using direct provider API keys. When CLI tools are installed, auto mode can use local CLI models via `cli.enabled` or implicit auto CLI fallback (`cli.autoFallback`; see `docs/cli.md`).
#Defaults
- Default model: `auto`
- Override with `SUMMARIZE_MODEL`, config file (`model`), or `--model`.
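As a rough illustration of how the three override sources could combine, here is a minimal sketch. The precedence order (flag, then env var, then config file, then the `auto` default) is an assumption, not something this page states, and `resolveModel` and `ModelSources` are hypothetical names.

```typescript
// Hypothetical helper. ASSUMPTION: --model beats SUMMARIZE_MODEL, which beats
// the config file's `model` key. The docs only say all three can override `auto`.
interface ModelSources {
  flag?: string; // value passed via --model
  envVar?: string; // value of SUMMARIZE_MODEL
  configModel?: string; // `model` key from the config file
}

function resolveModel(sources: ModelSources): string {
  const candidates = [sources.flag, sources.envVar, sources.configModel];
  for (const candidate of candidates) {
    // Skip sources that are unset or blank.
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate.trim();
    }
  }
  return "auto"; // documented default model
}

const fromDefault = resolveModel({});
const fromFlag = resolveModel({
  flag: "openai/gpt-5.4",
  envVar: "google/gemini-3-flash"
});
```

Under these assumptions `fromDefault` is `auto` and `fromFlag` is `openai/gpt-5.4`.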
#Env
- `.env` (optional): when running the CLI, `summarize` also reads `.env` in the current working directory and merges it into the environment (real env vars win).
- `~/.summarize/config.json` `env` (optional): fallback env defaults when process env is missing/blank.
- `XAI_API_KEY` (required for `xai/...` models)
- `XAI_BASE_URL` (optional; override xAI API endpoint)
- `OPENAI_API_KEY` (required for `openai/...` models)
- `OPENAI_BASE_URL` (optional; OpenAI-compatible API endpoint, e.g. OpenRouter)
- `OPENAI_USE_CHAT_COMPLETIONS` (optional; force OpenAI chat completions)
- `NVIDIA_API_KEY` (required for `nvidia/...` models; alias: `NGC_API_KEY`)
- `NVIDIA_BASE_URL` (optional; override NVIDIA OpenAI-compatible API endpoint; default: `https://integrate.api.nvidia.com/v1`)
- `OPENROUTER_API_KEY` (optional; required for `openrouter/...` models; also used when `OPENAI_BASE_URL` points to OpenRouter)
- `GITHUB_TOKEN` / `GH_TOKEN` (required for `github-copilot/...` models via GitHub Models)
- `Z_AI_API_KEY` (required for `zai/...` models; supports `ZAI_API_KEY` alias)
- `Z_AI_BASE_URL` (optional; override default Z.AI base URL)
- `OLLAMA_BASE_URL` (optional; override default Ollama OpenAI-compatible base URL; default: `http://localhost:11434/v1`. Set this to enable Ollama auto-discovery in the extension model picker.)
- `GEMINI_API_KEY` (required for `google/...` models; also accepts `GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY`)
- `GOOGLE_BASE_URL` / `GEMINI_BASE_URL` (optional; override Google API endpoint)
- `ANTHROPIC_API_KEY` (required for `anthropic/...` models)
- `ANTHROPIC_BASE_URL` (optional; override Anthropic API endpoint)
- `SUMMARIZE_MODEL` (optional; overrides default model selection)
- `CLAUDE_PATH` / `CODEX_PATH` / `GEMINI_PATH` / `AGENT_PATH` / `OPENCLAW_PATH` / `OPENCODE_PATH` / `COPILOT_PATH` (optional; override CLI binary paths)
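The layering described in the first two entries (real env over `.env`, with the config file's `env` block as a fallback for missing or blank values) can be sketched as below. This is an illustration only: `layerEnv`, `firstNonBlank`, and `isBlank` are hypothetical helpers, and whether a blank real env var still beats a `.env` value is an assumption.

```typescript
type EnvMap = Record<string, string | undefined>;

const isBlank = (value: string | undefined): boolean =>
  value === undefined || value.trim() === "";

// Hypothetical sketch of the documented layering, not the tool's actual code.
function layerEnv(
  processEnv: EnvMap,
  dotEnv: EnvMap,
  configEnv: EnvMap
): Record<string, string> {
  const merged: Record<string, string> = {};
  // 1. Start from .env, then let real env vars overwrite (real env vars win).
  //    ASSUMPTION: any defined real value wins here, even a blank one.
  for (const key of Object.keys(dotEnv)) {
    const value = dotEnv[key];
    if (value !== undefined) merged[key] = value;
  }
  for (const key of Object.keys(processEnv)) {
    const value = processEnv[key];
    if (value !== undefined) merged[key] = value;
  }
  // 2. Config `env` only fills keys that are still missing or blank.
  for (const key of Object.keys(configEnv)) {
    const value = configEnv[key];
    if (value !== undefined && isBlank(merged[key])) merged[key] = value;
  }
  return merged;
}

// Resolve a key that has aliases, e.g. NVIDIA_API_KEY / NGC_API_KEY.
function firstNonBlank(
  env: Record<string, string>,
  names: string[]
): string | undefined {
  for (const name of names) {
    const value = env[name];
    if (!isBlank(value)) return value;
  }
  return undefined;
}

const env = layerEnv(
  { OPENAI_API_KEY: "from-shell", XAI_API_KEY: "" },
  { OPENAI_API_KEY: "from-dotenv", NGC_API_KEY: "from-dotenv" },
  { XAI_API_KEY: "from-config" }
);
const nvidiaKey = firstNonBlank(env, ["NVIDIA_API_KEY", "NGC_API_KEY"]);
```

In this example `OPENAI_API_KEY` keeps the shell value, the blank `XAI_API_KEY` is filled from the config fallback, and `nvidiaKey` resolves through the `NGC_API_KEY` alias.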
#Flags
- `--model <model>`
  - Examples:
    - `cli/codex/gpt-5.2`
    - `openai/gpt-5.5`
    - `codex-fast` (explicit Codex CLI GPT Fast preset)
    - `cli/claude/sonnet`
    - `cli/gemini/flash`
    - `cli/agent/auto`
    - `cli/openclaw/main`
    - `cli/opencode/openai/gpt-5.4`
    - `cli/copilot/gpt-5.2`
    - `openai/gpt-5.4`
    - `openai/gpt-5.4-mini`
    - `openai/gpt-5.4-nano`
    - `google/gemini-3-flash`
    - `openai/gpt-5-mini`
    - `openai/gpt-5-nano`
    - `github-copilot/gpt-5.4`
    - `nvidia/z-ai/glm5`
    - `zai/glm-4.7`
    - `ollama/qwen3:14b`
    - `ollama/llama3.1:8b`
    - `xai/grok-4-fast-non-reasoning`
    - `google/gemini-2.0-flash`
    - `anthropic/claude-sonnet-4-5`
    - `openrouter/meta-llama/llama-3.3-70b-instruct:free` (force OpenRouter)
- `--cli [provider]`
  - Examples: `--cli claude`, `--cli gemini`, `--cli codex`, `--cli agent`, `--cli openclaw`, `--cli opencode`, `--cli copilot` (equivalent to `--model cli/<provider>`); `--cli` alone uses auto selection with CLI enabled.
- `--model auto`
  - See `docs/model-auto.md`
- `--model <preset>`
  - Uses a built-in or config-defined preset (see `docs/config.md` → “Presets”).
- `--prompt <text>` / `--prompt-file <path>`
  - Overrides the built-in summary instructions (prompt becomes the instruction prefix).
  - Prompts are wrapped in `<instructions>`, `<context>`, `<content>` tags.
  - When `--length` is numeric, we add `Output is X characters.` When `--language` is explicitly set, we add `Output should be <language>.`
- `--no-cache`
  - Bypass summary cache reads and writes only (LLM output). Extract/transcript caches still apply.
- `--cache-stats`
  - Print cache stats and exit.
- `--clear-cache`
  - Delete the cache database and exit. Must be used alone.
- `--video-mode auto|transcript|understand`
  - Only relevant for video inputs / video-only pages.
- `--length short|medium|long|xl|xxl|<chars>`
  - This is soft guidance to the model (no hard truncation).
  - Minimum numeric value: 50 chars.
  - Built-in default: `xl`.
  - Config default: `output.length` in `~/.summarize/config.json`.
  - Output format is Markdown; use short paragraphs and only add bullets when they improve scanability.
- `--force-summary`
  - Always run the LLM even when extracted content is shorter than the requested length.
- `--max-output-tokens <count>`
  - Hard cap for output tokens (optional).
  - If omitted, no max token parameter is sent (provider default).
  - Minimum numeric value: 16.
  - Recommendation: prefer `--length` unless you need a hard cap (some providers count “reasoning” into the cap).
- `--thinking none|low|medium|high|xhigh`
  - Sets OpenAI reasoning effort for `openai/...` GPT-5-family models.
  - Short aliases: `off`, `min` (low), `mid`/`med`, `x-high`, `extra-high`.
- `--fast`
  - Shorthand for `--service-tier fast` on OpenAI models.
- `--service-tier default|fast|priority|flex`
  - OpenAI service tier override. `fast` is the summarize-facing alias for OpenAI `priority`; `default` sends no service tier.
- `--retries <count>`
  - LLM retry attempts on timeout (default: 1).
- `--json` (includes prompt + summary in one JSON object)
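Several of the flags above accept aliases or a mix of presets and numbers. The sketch below shows one way those values could be normalized. The function names are hypothetical, mapping `off` to `none` is inferred from the alias list rather than stated, and numeric `--length` parsing is assumed to accept plain digits only.

```typescript
const LENGTH_PRESETS = ["short", "medium", "long", "xl", "xxl"];

type LengthSetting =
  | { kind: "preset"; preset: string }
  | { kind: "chars"; targetCharacters: number };

// --length: preset name or a character count (soft guidance, minimum 50).
function parseLength(raw: string = "xl"): LengthSetting {
  const value = raw.trim().toLowerCase();
  if (LENGTH_PRESETS.indexOf(value) !== -1) {
    return { kind: "preset", preset: value };
  }
  // ASSUMPTION: only plain digit strings count as numeric values here.
  if (!/^\d+$/.test(value)) throw new Error(`Unknown --length value: ${raw}`);
  const chars = parseInt(value, 10);
  if (chars < 50) throw new Error("--length must be at least 50 characters");
  return { kind: "chars", targetCharacters: chars };
}

type ThinkingLevel = "none" | "low" | "medium" | "high" | "xhigh";

// --thinking: canonical values plus the documented short aliases.
const THINKING_ALIASES: Record<string, ThinkingLevel> = {
  none: "none",
  off: "none", // ASSUMPTION: `off` is the alias for `none`
  low: "low",
  min: "low",
  medium: "medium",
  mid: "medium",
  med: "medium",
  high: "high",
  xhigh: "xhigh",
  "x-high": "xhigh",
  "extra-high": "xhigh"
};

function normalizeThinking(raw: string): ThinkingLevel {
  const key = raw.trim().toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(THINKING_ALIASES, key)) {
    throw new Error(`Unknown --thinking value: ${raw}`);
  }
  return THINKING_ALIASES[key];
}

// --service-tier: `fast` maps to OpenAI `priority`; `default` sends nothing.
function resolveServiceTier(raw: string): "priority" | "flex" | undefined {
  switch (raw) {
    case "default":
      return undefined; // no service tier parameter is sent
    case "fast":
    case "priority":
      return "priority";
    case "flex":
      return "flex";
    default:
      throw new Error(`Unknown --service-tier value: ${raw}`);
  }
}
```

For example, `parseLength()` yields the `xl` preset, `normalizeThinking("med")` yields `medium`, and `resolveServiceTier("fast")` yields `priority`.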
#Prompt rules
- Video and podcast summaries omit sponsor/ads/promotional segments; do not include them in the summary.
- Do not mention or acknowledge sponsors/ads, and do not say you skipped or ignored anything.
- If a standout line is present, include 1-2 short exact excerpts formatted as Markdown italics with single asterisks. Do not use quotation marks of any kind (straight or curly). If a title or excerpt would normally use quotes, remove them and optionally italicize the text instead. Apostrophes in contractions are OK. Never include ad/sponsor/boilerplate excerpts and do not mention them. Avoid sponsor/ad/promo language, brand names like Squarespace, or CTA phrases like discount code.
- Final check: remove sponsor/ad references or mentions of skipping/ignoring content. Remove any quotation marks. Ensure standout excerpts are italicized; otherwise omit them.
- Hard rules: never mention sponsor/ads; never output quotation marks of any kind (straight or curly), even for titles.
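To show how the prompt pieces fit together (the tag wrapping and the optional length and language lines described under Flags), here is a minimal sketch. `buildPrompt` and `PromptParts` are hypothetical names, and the tag order, the placement of the extra lines inside `<instructions>`, and the contents of `<context>` are assumptions.

```typescript
// Hypothetical prompt builder. Only the tag names and the two added
// sentences come from the docs; the layout is an assumption.
interface PromptParts {
  instructions: string; // built-in rules or the --prompt override
  context: string; // e.g. source metadata (assumed)
  content: string; // extracted text to summarize
  lengthChars?: number; // set only when --length is numeric
  language?: string; // set only when --language is explicit
}

function buildPrompt(parts: PromptParts): string {
  const instructionLines = [parts.instructions.trim()];
  if (parts.lengthChars !== undefined) {
    instructionLines.push(`Output is ${parts.lengthChars} characters.`);
  }
  if (parts.language !== undefined) {
    instructionLines.push(`Output should be ${parts.language}.`);
  }
  return [
    `<instructions>\n${instructionLines.join("\n")}\n</instructions>`,
    `<context>\n${parts.context.trim()}\n</context>`,
    `<content>\n${parts.content.trim()}\n</content>`
  ].join("\n\n");
}

const prompt = buildPrompt({
  instructions: "Summarize the page.",
  context: "Title: Example",
  content: "Body text",
  lengthChars: 200,
  language: "German"
});
```

With a preset length and no explicit language, neither extra sentence is appended and the `<instructions>` block contains only the instruction text.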
#Z.AI
Use `--model zai/<model>` (e.g. `zai/glm-4.7`). Defaults to Z.AI’s base URL and uses chat completions.
#Ollama
Use `--model ollama/<model>` (e.g. `ollama/qwen3:14b`, `ollama/gemma3:12b`) to talk to a local Ollama instance over its OpenAI-compatible endpoint. See `docs/ollama.md` for full setup, model recommendations, and limitations.
#GitHub Copilot / GitHub Models
Use `--model github-copilot/<model>` for explicit GitHub-hosted model calls.
- Examples:
  - `github-copilot/gpt-5.4`
  - `github-copilot/gpt-5.4-mini`
  - `github-copilot/gpt-5.4-nano`
  - `github-copilot/gpt-5-mini`
  - `github-copilot/gpt-5-nano`
  - `github-copilot/anthropic/claude-haiku-4.5`
- Auth: `GITHUB_TOKEN` or `GH_TOKEN`
- Transport: GitHub Models chat completions (`https://models.github.ai/inference`)
- Notes:
  - bare shorthand like `github-copilot/gpt-5.4` or `github-copilot/claude-opus-4.6` auto-expands to the provider-qualified backend id
  - document attachments stay unsupported in this mode
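The shorthand expansion mentioned in the notes might look roughly like the following. The prefix-to-publisher table (`gpt-` to `openai`, `claude-` to `anthropic`) is an assumption made for illustration, and `expandCopilotModelId` is a hypothetical name; the real mapping may cover more families or work differently.

```typescript
// ASSUMPTION: illustrative mapping from model-name prefix to publisher.
const PUBLISHER_BY_PREFIX: Array<[string, string]> = [
  ["gpt-", "openai"],
  ["claude-", "anthropic"]
];

// Hypothetical sketch: turn `github-copilot/<model>` into a
// provider-qualified backend id such as `openai/gpt-5.4`.
function expandCopilotModelId(modelId: string): string {
  const prefix = "github-copilot/";
  if (modelId.indexOf(prefix) !== 0) {
    throw new Error(`Not a github-copilot model id: ${modelId}`);
  }
  const rest = modelId.slice(prefix.length);
  // Already provider-qualified (contains a publisher segment): keep as-is.
  if (rest.indexOf("/") !== -1) return rest;
  for (const [modelPrefix, publisher] of PUBLISHER_BY_PREFIX) {
    if (rest.indexOf(modelPrefix) === 0) return `${publisher}/${rest}`;
  }
  throw new Error(`Cannot infer publisher for: ${rest}`);
}

const expandedGpt = expandCopilotModelId("github-copilot/gpt-5.4");
const expandedClaude = expandCopilotModelId("github-copilot/claude-opus-4.6");
const alreadyQualified = expandCopilotModelId(
  "github-copilot/anthropic/claude-haiku-4.5"
);
```

Under the assumed table, the bare ids become `openai/gpt-5.4` and `anthropic/claude-opus-4.6`, while the already-qualified id passes through as `anthropic/claude-haiku-4.5`.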
#Input limits
- Text prompts are checked against the model’s max input tokens (LiteLLM catalog) using a GPT tokenizer.
- Text files over 10 MB are rejected before tokenization.
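The ordering of these two checks (file size first, token count second) can be sketched as follows. `checkInputLimits` is a hypothetical helper; treating 10 MB as 10 × 1024 × 1024 bytes and skipping the token check when the catalog has no limit for the model are both assumptions.

```typescript
// ASSUMPTION: "10 MB" is interpreted as 10 MiB.
const MAX_TEXT_FILE_BYTES = 10 * 1024 * 1024;

type LimitCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical sketch. `countTokens` is passed lazily so that oversized
// files are rejected before any tokenization work happens.
function checkInputLimits(
  fileBytes: number,
  countTokens: () => number,
  maxInputTokens: number | undefined
): LimitCheck {
  if (fileBytes > MAX_TEXT_FILE_BYTES) {
    return { ok: false, reason: "text file exceeds 10 MB" };
  }
  // ASSUMPTION: without a known model limit, the token check is skipped.
  if (maxInputTokens === undefined) return { ok: true };
  const tokens = countTokens();
  if (tokens > maxInputTokens) {
    return {
      ok: false,
      reason: `prompt is ${tokens} tokens; model accepts ${maxInputTokens}`
    };
  }
  return { ok: true };
}

const withinLimits = checkInputLimits(2048, () => 500, 128000);
const overTokenLimit = checkInputLimits(2048, () => 200000, 128000);
```

Here `withinLimits.ok` is `true` and `overTokenLimit.ok` is `false`.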
#PDF attachments
- For PDF inputs, `--preprocess auto` will send the PDF directly to Anthropic/OpenAI/Gemini when a fixed model supports documents; otherwise we fall back to markitdown.
- `--preprocess always` forces markitdown (no direct attachments).
- When markitdown returns only page headers for an image-only PDF, summarize can retry with OpenAI vision OCR if `OPENAI_API_KEY` is available. The OCR model defaults to `gpt-4o-mini`; set `MARKITDOWN_OCR_MODEL` to override it, or `MARKITDOWN_OCR_DPI` to tune rendered page size. This sends rendered PDF pages to the OpenAI API and may incur per-page vision costs.
- Streaming is disabled for document attachments.
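The routing rules above can be summarized in a small decision function. This is a sketch under stated assumptions: `planPdfHandling` and `pickOcrModel` are hypothetical names, only the `auto` and `always` modes mentioned here are modeled, and `streamingAllowed: true` means only that this rule does not disable streaming.

```typescript
type PreprocessMode = "auto" | "always";

interface PdfPlan {
  route: "attachment" | "markitdown";
  streamingAllowed: boolean;
}

// Hypothetical sketch of the documented PDF routing.
function planPdfHandling(
  mode: PreprocessMode,
  fixedModelSupportsDocuments: boolean
): PdfPlan {
  if (mode === "auto" && fixedModelSupportsDocuments) {
    // Direct document attachment: streaming is disabled.
    return { route: "attachment", streamingAllowed: false };
  }
  // `always`, or `auto` without document support: convert via markitdown.
  return { route: "markitdown", streamingAllowed: true };
}

// OCR retry model: MARKITDOWN_OCR_MODEL if set, else the documented default.
function pickOcrModel(env: Record<string, string | undefined>): string {
  const override = env["MARKITDOWN_OCR_MODEL"];
  return override !== undefined && override.trim() !== ""
    ? override
    : "gpt-4o-mini";
}

const direct = planPdfHandling("auto", true);
const forced = planPdfHandling("always", true);
const fallback = planPdfHandling("auto", false);
```

Only `direct` takes the attachment route; `forced` and `fallback` both go through markitdown.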