Automation Agent (Side Panel + Daemon)

Summarize can run as a website automation agent inside the Chrome side panel. This is optional and gated by a checkbox in Options.

Scope:

Off (default): chat is Q&A only (no tools).
On: chat runs a tool-capable agent for web automation.

Explicit exclusions (per product direction): no update checker, no tutorial/welcome flow, no proxy config, no API key dialog. All model calls go through the local daemon.

#Architecture (High Level)

Side panel maintains the agent loop + chat UI.
Background handles tab data, extraction, and tool execution.
Content scripts handle element picking and native-input bridge.
Daemon provides /v1/agent (SSE stream of chunks + final assistant message).

#Data Flow (Agent Loop)

User sends a message in the side panel.
Panel compacts history and sends panel:agent to background with:
messages (pi-ai Message[])
tools (names)
summary (optional current summary markdown)
Background builds pageContent using the latest extract (summary + transcript/text + metadata).
Background calls daemon POST /v1/agent (SSE).
Daemon streams chunk events, then an assistant event (may include tool calls).
Panel executes tool calls locally, appends toolResult messages, and repeats /v1/agent until no tool calls remain.

The daemon never executes tools. It only returns the next assistant message.

#Settings + Permissions

#Settings

automationEnabled (boolean) lives in apps/chrome-extension/src/lib/settings.ts.

Options UI provides the toggle.
When disabled, the tool list is empty and the daemon uses the chat-only prompt.

#Automation Permissions

Defined in apps/chrome-extension/wxt.config.ts:

userScripts – optional; requested via Options for browserjs() main-world script execution.
debugger – required only in build:automation for native input and the debugger tool.

Neither permission is needed for summarization. The standard Chrome build omits debugger because Chrome does not allow it to be optional, and it hides debugger-backed tools. The separate automation build declares it as required.

#Chrome: enable User Scripts (if needed)

chrome://extensions
Open extension details
Enable Allow User Scripts
Reload the tab

#Daemon Endpoint

#`POST /v1/agent` (SSE)

Headers

Authorization: Bearer <token>
Content-Type: application/json

Body

{
  "url": "https://...",
  "title": "Page title",
  "pageContent": "<summary + transcript + metadata>",
  "cacheContent": "<transcript/text used for cache key>",
  "messages": [/* pi-ai Message[] */],
  "model": "auto" | "openai/..." | "anthropic/..." | ...,
  "length": "short" | "xl" | "20k" | ...,
  "language": "auto" | "en" | "de" | ...,
  "tools": ["navigate", "repl", "ask_user_which_element", "skill", "debugger"],
  "automationEnabled": true
}

Response (SSE, default)

event: chunk
data: { "text": "..." }

event: assistant
data: { /* AssistantMessage */ }

event: done
data: {}

event: error
data: { "message": "..." }

Response (JSON) Use Accept: application/json or ?format=json.

{ "ok": true, "assistant": { /* AssistantMessage */ } }

#`POST /v1/agent/history`

Returns cached chat history for the same cache key as /v1/agent.

Body

{
  "url": "https://...",
  "pageContent": "<summary + transcript + metadata>",
  "cacheContent": "<transcript/text used for cache key>",
  "model": "auto" | "openai/..." | "anthropic/..." | ...,
  "length": "short" | "xl" | "20k" | ...,
  "language": "auto" | "en" | "de" | ...,
  "automationEnabled": true
}

Response

{ "ok": true, "messages": [/* Message[] */] }

#Model Resolution (Daemon)

Fixed model (explicit model): parsed as <provider>/<model>. Provider base URL overrides come from config/env (OpenAI, Anthropic, Google, xAI, ZAI). OpenRouter uses OpenAI-compatible completions.
Auto model: uses existing auto-selection logic (buildAutoModelAttempts), preferring API-key transports and then CLI fallback when available.
Synthetic models: created for OpenAI-compatible base URLs (local/openrouter).
maxOutputTokens defaults to 2048 or maxOutputTokens override.
CLI models are supported as auto fallback and via explicit cli/<provider> or cli/<provider>/<model> overrides.
If the daemon still says no model is available after key/install changes, restart or reinstall it so the saved environment snapshot refreshes.

#Page Content Payload

pageContent is built from the latest extract and includes:

Summary (optional): current summary markdown (truncated to settings cap).
Transcript/text: timed transcript when available, otherwise extracted text.
Metadata: URL/title, source (page vs url), extraction strategy, markdown provider, Firecrawl usage, transcript provider + cache status, media duration, word counts, truncation flags.

See buildChatPageContent() usage in apps/chrome-extension/src/entrypoints/background.ts.

#Tools

#1) `navigate`

Navigate the active tab.

Params:

{ "url": "https://...", "newTab": false }

Result:

{ "finalUrl": "https://...", "title": "...", "tabId": 123 }

Notes:

Uses chrome.tabs.update or chrome.tabs.create.
Waits for tab status complete (15s timeout).

#2) `repl`

Execute JavaScript in a sandbox, with browserjs() to run in the page context.

Params:

{ "title": "...", "code": "..." }

Result:

{ "output": "console/return output", "files": [{ "fileName": "...", "mimeType": "...", "contentBase64": "..." }] }

REPL environment:

Runs in a sandboxed iframe (no DOM access to the panel).
browserjs(fn, ...args) runs the function in the page context.
Uses chrome.userScripts.execute in the main world.
navigate({ url }) available inside the REPL (always use for navigation).
sleep(ms) helper.
Console output is captured and returned; return values are appended as => value.
returnFile(name, content, mimeType) or returnFile({ fileName, content, mimeType }) attaches files to the tool result.

Safety:

Navigation inside REPL code (window.location, history, etc.) is rejected. Use navigate() instead.

Page-context helpers (via browserjs()):

Skills libraries are auto-injected when domain patterns match the active URL.
If debugger permission is granted, native helpers are exposed:
nativeClick(selector)
nativeType(selector, text)
nativePress(key)
nativeKeyDown(key) / nativeKeyUp(key)

#3) `ask_user_which_element`

Shows a click-to-select overlay and returns element metadata.

Params:

{ "message": "Optional guidance" }

Result (example):

{
  "selector": "#submit",
  "xpath": "//button[1]",
  "text": "Send",
  "tagName": "button",
  "attributes": { "type": "submit" },
  "html": "<button ...>",
  "boundingBox": { "x": 20, "y": 120, "width": 180, "height": 40 }
}

Overlay UX:

Hover highlights element under cursor.
Click selects.
↑ / ↓ moves up or down the DOM tree.
Esc cancels.

#4) `skill`

CRUD for domain-specific libraries (stored in chrome.storage.local).

Storage keys:

automation.skills (map of name → skill)
automation.skillsSeeded (one-time default seed)

Actions:

list (optionally filtered by URL)
get (optionally includes library code)
create / rewrite
update (string replacement)
delete

Default skills seed from apps/chrome-extension/src/automation/default-skills.json.

Matching:

Glob-like domain patterns (supports * and **).
Match is done against hostname + path (e.g. github.com/*/issues).

Notes:

update is intended for in-place string replacements; use rewrite to rename.
Skills libraries run inside browserjs() and must avoid navigation.

#5) `debugger` (optional)

Runs JavaScript in the main world via the Chrome debugger. Last resort (shows Chrome debug banner).

Params:

{ "action": "eval", "code": "..." }

Result:

{ "text": "...", "details": { ... } }

#6) `summarize`

Run the Summarize pipeline against a URL (summary or extract-only).

Params:

{
  "url": "https://...",
  "extractOnly": false,
  "format": "text",
  "markdownMode": "readability",
  "model": "openai/gpt-5-mini",
  "length": "short",
  "language": "auto",
  "prompt": "Optional override",
  "timeout": "2m",
  "maxOutputTokens": "2k",
  "noCache": false,
  "firecrawl": "auto",
  "preprocess": "auto",
  "youtube": "auto",
  "videoMode": "auto",
  "timestamps": false,
  "maxCharacters": 120000
}

Result (summary):

{ "output": "..." }

Result (extract-only):

{ "output": "...", "details": { "content": "...", "title": "...", "wordCount": 123, ... } }

Notes:

URL-only (uses daemon URL pipeline).
Use extractOnly: true + format: "markdown" to return extracted Markdown.
When format: "markdown", markdownMode defaults to readability.

#Native Input Bridge (Debugger Permission)

Native input events use the Chrome debugger protocol:

browserjs() posts a message to the content script.
Content script forwards to background (automation:native-input).
Background attaches debugger and dispatches:
Input.dispatchMouseEvent (click)
Input.insertText (type)
Input.dispatchKeyEvent (press/keydown/keyup)

If permission is missing, the call fails and the tool reports the error.

#UX Notes

Automation is opt-in via Options checkbox.
Regular summarize flows remain unchanged.
Tool results are treated as data, not instructions. The system prompt still asks the assistant to repeat important tool-derived info in plain text.
Tool results are currently rendered in the chat UI for debugging; they may be hidden in the future.
When the active tab URL changes during a conversation, the panel appends a navigation tool result with the new URL, title, and matching skills. This keeps the agent aware of user-driven navigation.
REPL executions that call browserjs() show a small overlay with an Abort action; aborting stops the current agent loop (best-effort).

#Where Things Live

Daemon: src/daemon/server.ts, src/daemon/agent.ts
Automation tools: apps/chrome-extension/src/automation/
Element picker + native input bridge: apps/chrome-extension/src/entrypoints/automation.content.ts
Background agent proxy: apps/chrome-extension/src/entrypoints/background.ts
Side panel agent loop: apps/chrome-extension/src/entrypoints/sidepanel/
Options toggle: apps/chrome-extension/src/entrypoints/options/

Automation Agent (Side Panel + Daemon)

#Architecture (High Level)

#Data Flow (Agent Loop)

#Settings + Permissions

#Settings

#Automation Permissions

#Chrome: enable User Scripts (if needed)

#Daemon Endpoint

#POST /v1/agent (SSE)

#POST /v1/agent/history

#Model Resolution (Daemon)

#Page Content Payload

#Tools

#1) navigate

#2) repl

#3) ask_user_which_element

#4) skill

#5) debugger (optional)

#6) summarize

#Native Input Bridge (Debugger Permission)

#UX Notes

#Where Things Live

#`POST /v1/agent` (SSE)

#`POST /v1/agent/history`

#1) `navigate`

#2) `repl`

#3) `ask_user_which_element`

#4) `skill`

#5) `debugger` (optional)

#6) `summarize`