YouTube mode
YouTube URLs use transcript-first extraction.
--youtube auto|web|no-auto|apify|yt-dlp
- auto (default): try youtubei → captionTracks → yt-dlp (if configured) → Apify (if token exists)
- web: try youtubei → captionTracks only
- no-auto: try creator captions only (skip auto-generated/ASR) → yt-dlp (if configured)
- apify: Apify only
- yt-dlp: download audio + transcribe (local whisper.cpp preferred; OpenAI/FAL fallback)
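Each mode can be read as an ordered plan of extractors, tried until one yields a transcript. The sketch below is illustrative only: planSteps, Capabilities, and the step labels are hypothetical names, not taken from the tool's source.

```typescript
// Hypothetical sketch: map a --youtube mode to the ordered list of extractors to try.
type YoutubeMode = "auto" | "web" | "no-auto" | "apify" | "yt-dlp";
type Step = "youtubei" | "captionTracks" | "creatorCaptions" | "yt-dlp" | "apify";

interface Capabilities {
  ytDlpConfigured: boolean; // assumption: yt-dlp binary plus a transcription backend are available
  apifyTokenPresent: boolean; // assumption: APIFY_API_TOKEN is set
}

function planSteps(mode: YoutubeMode, caps: Capabilities): Step[] {
  const steps: Step[] = [];
  switch (mode) {
    case "auto":
      steps.push("youtubei", "captionTracks");
      if (caps.ytDlpConfigured) steps.push("yt-dlp");
      if (caps.apifyTokenPresent) steps.push("apify");
      break;
    case "web":
      steps.push("youtubei", "captionTracks");
      break;
    case "no-auto":
      // Creator-provided captions only; auto-generated (ASR) tracks are skipped.
      steps.push("creatorCaptions");
      if (caps.ytDlpConfigured) steps.push("yt-dlp");
      break;
    case "apify":
      steps.push("apify");
      break;
    case "yt-dlp":
      steps.push("yt-dlp");
      break;
  }
  return steps;
}
```

A caller would walk the returned list in order and stop at the first step that returns a non-empty transcript.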
youtubei vs captionTracks
youtubei:
- Calls YouTube’s internal transcript endpoint (/youtubei/v1/get_transcript).
- Needs a bootstrapped INNERTUBE_API_KEY, context, and getTranscriptEndpoint.params from the watch page HTML.
- When it works, you get a nice list of transcript segments.
captionTracks:
- Downloads caption tracks listed in ytInitialPlayerResponse.captions.playerCaptionsTracklistRenderer.captionTracks.
- Fetches fmt=json3 first and falls back to XML-like caption payloads if needed.
- Often works even when the transcript endpoint doesn’t.
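To make the captionTracks path more concrete, here is a minimal sketch of requesting the json3 variant of a track and flattening its payload into segments. The field names (baseUrl, kind, events, segs, utf8, tStartMs) reflect what YouTube's player response and json3 payloads have commonly looked like; they are undocumented and should be treated as assumptions, and the function names are hypothetical.

```typescript
// Assumed shape of one entry in captionTracks (undocumented by YouTube).
interface CaptionTrack {
  baseUrl: string;
  languageCode: string;
  kind?: string; // assumption: "asr" marks auto-generated tracks
}

// Assumed shape of a json3 caption payload.
interface Json3Event {
  tStartMs?: number;
  segs?: { utf8?: string }[];
}
interface Json3Payload {
  events?: Json3Event[];
}

interface Segment {
  startMs: number;
  text: string;
}

// Ask for the JSON variant of a caption track by setting fmt=json3 on its URL.
function json3Url(track: CaptionTrack): string {
  const url = new URL(track.baseUrl);
  url.searchParams.set("fmt", "json3");
  return url.toString();
}

// Keep only creator-provided tracks (what a "no-auto" mode would want).
function creatorTracks(tracks: CaptionTrack[]): CaptionTrack[] {
  return tracks.filter((t) => t.kind !== "asr");
}

// Flatten json3 events into transcript segments, dropping whitespace-only events.
function json3ToSegments(payload: Json3Payload): Segment[] {
  const out: Segment[] = [];
  for (const ev of payload.events ?? []) {
    const text = (ev.segs ?? [])
      .map((s) => s.utf8 ?? "")
      .join("")
      .replace(/\s+/g, " ")
      .trim();
    if (text) out.push({ startMs: ev.tStartMs ?? 0, text });
  }
  return out;
}
```

If the json3 request fails or returns no events, the same track URL can be retried without the fmt parameter and parsed as the XML-like payload instead.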
Fallbacks
- If no transcript is available, we still extract ytInitialPlayerResponse.videoDetails.shortDescription so YouTube links can still summarize meaningfully.
- Apify is an optional fallback (needs APIFY_API_TOKEN).
  - By default, we use the actor id faVsWy9VTSNVIhWpR (Pinto Studio’s “Youtube Transcript Scraper”).
- yt-dlp requires the yt-dlp binary (either set YT_DLP_PATH or have it on PATH) and either local whisper.cpp (preferred) or OPENAI_API_KEY / FAL_KEY.
  - If OpenAI transcription fails and FAL_KEY is set, we fall back to FAL automatically.
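The yt-dlp requirements above amount to two small decisions: which binary to run, and which transcription backends to try in which order. The following sketch is hypothetical (resolveYtDlpBinary, transcriptionBackends, and the hasLocalWhisper flag are illustrative names) and assumes local whisper.cpp is tried first, then OpenAI, then FAL.

```typescript
type Env = Record<string, string | undefined>;
type Backend = "whisper.cpp" | "openai" | "fal";

// Prefer an explicit YT_DLP_PATH; otherwise rely on "yt-dlp" being found on PATH.
function resolveYtDlpBinary(env: Env): string {
  const explicit = env.YT_DLP_PATH?.trim();
  return explicit ? explicit : "yt-dlp";
}

// Ordered list of transcription backends to try.
// Assumption: hasLocalWhisper is detected elsewhere (for example by probing for a whisper.cpp binary).
function transcriptionBackends(env: Env, hasLocalWhisper: boolean): Backend[] {
  const order: Backend[] = [];
  if (hasLocalWhisper) order.push("whisper.cpp");
  if (env.OPENAI_API_KEY) order.push("openai");
  if (env.FAL_KEY) order.push("fal"); // used when OpenAI fails or is not configured
  return order;
}
```

An empty list from transcriptionBackends would mean the yt-dlp path is not usable, which is what "if configured" refers to in the mode list.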
Example
pnpm summarize -- --extract "https://www.youtube.com/watch?v=I845O57ZSy4&t=11s"
Slides
Use --slides to extract slide screenshots for YouTube videos (requires ffmpeg and yt-dlp).
Scene detection auto-tunes the threshold using sampled frame hashes:
summarize "https://www.youtube.com/watch?v=..." --slides
summarize "https://www.youtube.com/watch?v=..." --slides --slides-ocr
Slides are written to ./slides/<videoId>/ by default (override with --slides-dir). OCR results
are stored in slides.json and included in JSON output (--json).
If yt-dlp gets a 403 from YouTube, set SUMMARIZE_YT_DLP_COOKIES_FROM_BROWSER=chrome (or
chrome:Profile 1) to pass cookies through to yt-dlp.
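As an illustration of how that variable could be forwarded, the sketch below turns it into yt-dlp's --cookies-from-browser option. The option itself is a real yt-dlp flag that accepts a browser name with an optional :PROFILE suffix; the cookieArgs helper is hypothetical and not necessarily how the tool wires it up.

```typescript
// Hypothetical helper: build extra yt-dlp arguments from the environment.
function cookieArgs(env: Record<string, string | undefined>): string[] {
  const spec = env.SUMMARIZE_YT_DLP_COOKIES_FROM_BROWSER?.trim();
  // Pass the value through unchanged, e.g. "chrome" or "chrome:Profile 1".
  return spec ? ["--cookies-from-browser", spec] : [];
}
```

Passing the value as a single array element (rather than splicing it into a shell string) keeps profile names with spaces, such as "chrome:Profile 1", intact.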
Relevant flags:
--slides-scene-threshold <value>: starting threshold for scene detection (auto-tuned as needed)