NotebookLM Batch

A batch content generation tool that uses NotebookLM to automatically create podcasts, slides, reports, quizzes, and more from YouTube videos and website URLs.

TL;DR

What it does

Batch-generates content from YouTube videos, web pages, and other sources using NotebookLM. Write a YAML instruction file, run it, and walk away.

How it works

Automatically creates a notebook per source → generates content → downloads → deletes the notebook
- NotebookLM is used as a generation engine; no notebooks are left behind on the service
Output files are the source of truth: if a file exists locally, it is skipped (supports rerun and resume)
Auto-resume: re-running the same YAML resumes from where it left off
Idempotent: same source → same output directory and filename (stable hashing)

Let the AI do it all (Claude Code Skill)

The .claude/skills/notebooklm-batch/ directory includes a Claude Code Skill. With Claude Code, you can skip writing YAML entirely — just say what you want:

"Make a podcast from this YouTube video: https://..."

Claude will handle YAML creation, dry-run verification, and background execution automatically.

Recommended use cases

Batch-convert multiple YouTube videos into podcasts, slides, or reports
Bulk-process news articles or tech blogs into reports or flashcards
"Fire and forget" with background execution + GitHub Issue notifications

Supported IN / OUT

IN (source)	OUT (content)
YouTube URL	Podcast (MP3)
Website URL	Infographic (PNG)
Text file	Slide Deck (PDF)
Inline text	Video (MP4)
	Quiz (JSON)
	Flashcards (JSON)
	Report (Markdown)
	Data Table (CSV)

Requirements

Python 3.11+
pipx
A Google account with access to NotebookLM

Installation

See INSTALL.md for full setup instructions.

git clone https://github.com/KunihiroS/notebooklm-batch.git
cd notebooklm-batch
pip install -r requirements.txt
pipx install "notebooklm-py[browser]"
notebooklm login

Quick Start

Prerequisites: Complete installation and authentication first. First-time setup includes the NotebookLM login and a few other steps; see INSTALL.md for details.

Create a YAML instruction file in instructions/:

# instructions/my_task.yaml
settings:
  language: en

tasks:
  - source: "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
    title: "My Notebook"
    contents:
      - type: podcast
        prompt: "Summarize the video in an engaging podcast format."

Run a dry-run to verify:

python3 run_batch.py ./instructions/my_task.yaml --dry-run

Execute in the background:

nohup python3 run_batch.py ./instructions/my_task.yaml > log/nohup_output.log 2>&1 &

Generated files are saved to ./files/<title>__<hash>/.

Supported Sources

Source	Example
YouTube URL	`https://www.youtube.com/watch?v=...`
Website URL	`https://example.com/article`
Text file	`./path/to/document.txt`
Inline text	Plain text string

Supported Output Types

`type`	Output	Format
`podcast`	Audio podcast	MP3
`slide`	Slide deck	PDF
`image`	Infographic	PNG
`video`	Explainer video	MP4
`report`	Written report	Markdown
`quiz`	Quiz questions	JSON
`flashcards`	Flashcards	JSON
`data-table`	Data table	CSV

Contributing

Contributions are welcome. Please open an issue or pull request on GitHub.

Directory Layout

notebooklm-batch/
├── README.md                   # This document
├── AGENTS.md                   # Operational guide for AI agents
├── run_batch.py                # Batch execution script
├── instructions/               # YAML instruction files
│   └── <any-name>.yaml
├── files/                      # Generated outputs
│   └── <safe_title>__<source_hash>/
│       ├── <type>_<hash>.png
│       ├── <type>_<hash>.mp3
│       └── ...
└── log/                        # Run logs and progress files
    └── run_YYYYMMDDHHmmss.json

Directory / File	Description
`instructions/`	Where users place YAML instruction files
`files/`	Where generated outputs (images, audio, video, PDF) are saved
`log/`	Execution logs and progress JSON (used for auto-resume)
`run_batch.py`	Batch processing script
`AGENTS.md`	Operational guide for AI agents running the batch

Execution (Runbook)

Place a YAML instruction file under ./instructions/ and ask an AI agent (or run directly) to execute it with run_batch.py. Initial authentication requires browser-based login by the user.

The AI follows AGENTS.md to execute processing based on the YAML file.
The user can also ask the AI to create the instruction YAML itself.

When asking an AI to create an instruction file

Items the AI will confirm with the user:

Item	Required	Description	Example
source	✅	Target URL(s) (multiple allowed)	`https://www.youtube.com/watch?v=XXXXX`
content type	✅	What to generate	`podcast` / `image` / `slide` / `video`
title	Optional	Notebook name (required for non-YouTube sources)	`AI Weekly Digest`
prompt	Recommended	Generation instructions	`Summarize in a clear, engaging tone`
options	Optional	See "Options" section below	`format: deep-dive`, `length: long`
language	Optional	Default: `ja`	`en`, `ja`

If the user gives a brief request such as "make a podcast from this video", the AI will confirm the desired prompt.

Current behavior of run_batch.py:

The notebook title on NotebookLM uses tasks[].title from the YAML as-is
If tasks[].title is omitted, video_id is used as the title (YouTube only)
No suffix is appended

Instruction File Format

Instruction files are written in YAML.

Location: ./instructions/
Naming: <any-name>.yml or <any-name>.yaml
- e.g. instruction.yaml, ai_digest_20260207.yaml
- Use a name that identifies the purpose or date

Note: Instruction files are YAML (.yml / .yaml), not Markdown (.md).

Basic Structure

# NotebookLM batch instruction
settings:
  language: en                  # Output language
  notify:                       # GitHub Issue notification (omit to disable)
    github_issue: 1             # Issue number to post to

tasks:
  - source: "https://www.youtube.com/watch?v=XXXXX"
    title: "Notebook Title"
    contents:
      - type: podcast
        prompt: "Summarize the video in an engaging way."
        options:
          format: deep-dive
          length: default

      - type: image
        prompt: "Visually summarize the key points."
        options:
          orientation: landscape
          detail: standard

  - source: "https://example.com/article"
    title: "Article Summary"
    contents:
      - type: report
        prompt: "Summarize the key points of the article."

Field Reference

settings (global configuration)

Field	Required	Default	Description
`language`	No	`ja`	Language for generation (passed to `notebooklm generate --language`)
`notify`	No	(disabled)	GitHub Issue notification settings (see below)

notify (GitHub Issue notification)

Posts batch progress, completion, and errors as comments on a GitHub Issue. Omit to disable notification (default behavior).

settings:
  notify:
    github_issue: 1                            # Issue number to post to (required)
    github_repo: "YourName/notebooklm-batch"   # Omit to auto-resolve from git origin

Field	Required	Description
`github_issue`	Yes (when notify is used)	Target Issue number
`github_repo`	No	Target repository (`owner/repo`). Omit to auto-resolve via `gh` CLI from origin remote

Prerequisites:

gh CLI must be installed and authenticated (gh auth status)
The target Issue must exist in the repository

Notification events:

Icon	Timing	Example
🔄	Batch start	`🔄 Batch started: 3 tasks / 7 contents`
⏭️	Skipped (output already exists)	`⏭️ [1/3] image skipped (exists)`
📦	Content completed	`📦 [1/3] image generated`
🚫	RATE_LIMITED / AUTH_REQUIRED	`🚫 Stopped: RATE_LIMITED`
❌	Error	`❌ Error: GENERATE_FAILED`
✅	Batch completed	`✅ Done: NEW:3 SKIP:2 / 05:30 elapsed`
⏹️	Ctrl+C interrupted	`⏹️ Interrupted: 2/5 completed`

Note: Notifications are best-effort. Notification failures do not affect batch processing.

Note on language: settings.language corresponds to the --language option of the notebooklm CLI.

If not specified in YAML, run_batch.py defaults to ja

The CLI's own default is en, so for Japanese output either specify ja explicitly or omit the field (run_batch.py fills in ja)

tasks[] (task definition)

Field	Required	Description
`source`	Yes	Source (YouTube URL / Website URL / text file path / inline text)
`title`	Yes*	Notebook name on NotebookLM & base for output directory name. *YouTube URLs may omit (falls back to `video_id`)
`contents`	Yes (1+)	List of content to generate

Backward compatibility: the url: field still works but emits a warning. Use source: in new files.
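
A minimal sketch of how the legacy `url:` key could be normalized, assuming a task has already been loaded from YAML into a dict. `normalize_task` is a hypothetical helper and the warning text is invented for illustration.

```python
import warnings


def normalize_task(task: dict) -> dict:
    """Return a copy of the task that always uses the `source` key."""
    normalized = dict(task)
    if "source" not in normalized and "url" in normalized:
        warnings.warn("`url:` is deprecated; use `source:`", DeprecationWarning, stacklevel=2)
        normalized["source"] = normalized.pop("url")
    if "source" not in normalized:
        raise ValueError("task is missing `source`")
    if not normalized.get("contents"):
        raise ValueError("task needs at least one entry in `contents`")
    return normalized
```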

tasks[].contents[] (content definition)

Field	Required	Description
`type`	Yes	`podcast` / `image` / `slide` / `video` / `quiz` / `flashcards` / `report` / `data-table`
`prompt`	No	Custom instructions for generation
`options`	No	Per-type options (see below)

Output filename is automatically determined as <type>_<hash>.<ext> (stable hash)

Supported Content Types

Type	CLI command	Output format	Key options
Podcast	`generate audio`	MP3	format, length
Infographic	`generate infographic`	PNG	orientation, detail
Slide Deck	`generate slide-deck`	PDF	format, length
Video	`generate video`	MP4	format, style
Quiz	`generate quiz`	JSON	quantity, difficulty, download_format
Flashcards	`generate flashcards`	JSON	quantity, difficulty, download_format
Report	`generate report`	Markdown	format
Data Table	`generate data-table`	CSV	(none)

Options

When options are omitted, NotebookLM's defaults are used. Specifying options explicitly is recommended for predictable results.

Audio (Podcast)

Option	Values
`--format`	`deep-dive` / `brief` / `critique` / `debate`
`--length`	`short` / `default` / `long`

Infographic

Option	Values
`--orientation`	`landscape` / `portrait` / `square`
`--detail`	`concise` / `standard` / `detailed`

Slide Deck

Option	Values
`--format`	`detailed` / `presenter`
`--length`	`default` / `short`
`download_format`	`pdf` (default) / `pptx` (editable PowerPoint)

Video

Option	Values
`--format`	`explainer` / `brief`
`--style`	`auto-select` / `classic` / `whiteboard` / `kawaii` / `anime` / `watercolor` / `retro-print` / `heritage` / `paper-craft`

Quiz

Option	Values
`--quantity`	`fewer` / `standard` / `more`
`--difficulty`	`easy` / `medium` / `hard`
`download_format`	`json` / `markdown` / `html` (default: `json`)

Note: --quantity more is equivalent to standard internally (known limitation). --language is not supported. Changing download_format changes the output file extension accordingly (markdown → .md, html → .html).

Flashcards

Option	Values
`--quantity`	`fewer` / `standard` / `more`
`--difficulty`	`easy` / `medium` / `hard`
`download_format`	`json` / `markdown` / `html` (default: `json`)

Note: --language is not supported. Changing download_format changes the output file extension accordingly.

Report

Option	Values
`--format`	`briefing-doc` / `study-guide` / `blog-post` / `custom`

Note: Specifying a prompt automatically switches to custom format. Use --append to add custom instructions to an existing format (briefing-doc / study-guide / blog-post).

Data Table

No generate options. Prompt is required (empty prompt causes a CLI error). Output is CSV (UTF-8-BOM).
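
Because the CSV starts with a byte order mark, reading it with plain `utf-8` leaves `\ufeff` attached to the first header name. A small sketch of reading it in Python (`read_data_table` is a hypothetical helper):

```python
import csv
from pathlib import Path


def read_data_table(path: Path) -> list[dict[str, str]]:
    """Read a generated CSV; utf-8-sig strips the BOM from the first header."""
    with path.open(newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))
```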

Example Instruction Files

Minimal (YouTube, single content)

tasks:
  - source: "https://www.youtube.com/watch?v=XXXXX"
    title: "Video Title"
    contents:
      - type: podcast
        prompt: "Summarize the video in an engaging way."

title is optional for YouTube URLs only (falls back to video_id). prompt is also optional but recommended.

Full (multiple sources, multiple content types)

settings:
  language: en
  notify:
    github_issue: 1

tasks:
  - source: "https://www.youtube.com/watch?v=XXXXX"
    title: "AI Trends Digest"
    contents:
      - type: podcast
        prompt: "Cover the overall picture first, then dive into individual topics."
        options:
          format: deep-dive
          length: long
      - type: image
        prompt: "Summarize the 3 key points visually."
        options:
          orientation: portrait
          detail: concise
      - type: slide
        prompt: "Create a presentation suitable for internal sharing."
        options:
          format: presenter
          length: default

  - source: "https://example.com/tech-article"
    title: "Tech Article Summary"
    contents:
      - type: report
        prompt: "Summarize from a technology selection perspective."
        options:
          format: briefing-doc
      - type: quiz
        prompt: "Quiz on the key points of the article."
        options:
          quantity: standard
          difficulty: medium
      - type: data-table
        prompt: "Create a comparison table of the tools mentioned."

Identifiers

run_id
- Identifier assigned to each batch execution
- Format: YYYYMMDDHHmmss (e.g. 20260208200022)
- Used in: progress filename (./log/run_<run_id>.json) and temp file collision avoidance (.__tmp__<run_id>)
source_hash (output directory name)
- Output directory is automatically determined as {safe_title}__{sha256(source)[:8]}
- Same source always maps to the same directory (idempotency)
type_<hash> (filename)
- Output filename is automatically determined as <type>_<hash>.<ext>
- Hash is computed from source/type/prompt/options using a stable hash (same input → same filename)
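
The directory naming rule `{safe_title}__{sha256(source)[:8]}` can be sketched as below. The sanitization rules in `safe_title` and the UTF-8 encoding of the source string are assumptions made for illustration; the actual rules in run_batch.py may differ, so the resulting names are not guaranteed to match.

```python
import hashlib
import re


def safe_title(title: str) -> str:
    """Hypothetical sanitizer: keep word characters and hyphens, replace the rest with '_'."""
    cleaned = re.sub(r"[^\w\-]+", "_", title.strip())
    return cleaned.strip("_") or "untitled"


def output_dir_name(title: str, source: str) -> str:
    """Build <safe_title>__<first 8 hex chars of sha256(source)>."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:8]
    return f"{safe_title(title)}__{digest}"
```

The same title and source always yield the same directory name, which is what makes re-runs land in the existing directory.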

Design Principles

The execution engine wraps the notebooklm CLI in Python, parsing --json output to update progress
Resume/completion decisions are based on local output files, not NotebookLM server state
- If ./files/<dir>/<type>_<hash>.<ext> exists, that content is always skipped
- Filenames are determined by stable hashing (type_<hash>)
  - e.g. image_7f3a2c1b
  - Hash is computed by JSON-serializing url/type/prompt/options (keys sorted) and taking the first 8 characters of SHA1
  - Purpose: resilient to YAML reordering, ensuring stable skip/resume behavior
On RATE_LIMITED and similar failures, the content is recorded as blocked; re-running with a different account skips already-completed outputs
Output filenames are deterministic and writes go through a temp file (<type>_<hash>.<ext>.__tmp__<run_id>) before a rename commit (resilient to interruption and parallelism)
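
The temp-file-then-rename commit in the last principle can be sketched as follows. This version writes bytes that are already in memory, which is a simplification; `commit_output` is a hypothetical name, and the real script presumably downloads to the temp path through the CLI before renaming.

```python
import os
from pathlib import Path


def commit_output(final_path: Path, data: bytes, run_id: str) -> Path:
    """Write to <final>.__tmp__<run_id>, then rename onto the final path."""
    tmp_path = final_path.with_name(f"{final_path.name}.__tmp__{run_id}")
    final_path.parent.mkdir(parents=True, exist_ok=True)
    with tmp_path.open("wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic when both paths are on the same filesystem,
    # so the final path never holds a partially written file.
    os.replace(tmp_path, final_path)
    return final_path
```

If the process dies before the rename, only the temp file is left behind, so the skip check on the final path never sees an incomplete output.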

Operational Requirements (minimum)

The user specifies one YAML file per execution (processing multiple YAMLs simultaneously is not supported)
- A single YAML may contain multiple sources in tasks[]
Invocation:
- python3 ./run_batch.py ./instructions/<INSTRUCTION>.yaml
Dry-run mode (pre-execution verification):
- python3 ./run_batch.py ./instructions/<INSTRUCTION>.yaml --dry-run
- Shows targets, output paths, and skip decisions without generating anything

`run_batch.py` Operational Spec

Base directory is the directory containing this README (and run_batch.py)
- Outputs are saved under ./files/
- Progress is saved to ./log/run_YYYYMMDDHHmmss.json
Re-running the same YAML loads the most recent run_*.json and resumes from where it left off
- Resume condition: instruction_file (canonical path) in run_*.json matches the canonical path of the specified YAML
  - Purpose: prevents resume failures due to path representation differences (e.g. ./instructions/x.yaml vs /abs/path/.../instructions/x.yaml)

Progress is displayed with a spinner and progress bar:

⠹ 🔄 [████████░░░░░░░░░░░░]  40% (2/5) | NEW:1 SKIP:1 | TIME:01:23 | LOG:run_20260211150030.json

Status icons: 🔄 running / ✅ completed / 🚫 blocked / ⏹️ aborted / ❌ error
(2/5) completed / total
NEW:n newly generated / SKIP:n skipped (file already exists)
TIME:MM:SS elapsed (switches to HH:MM:SS after one hour)
LOG:run_*.json progress filename
On block, reason is appended: [AUTH_REQUIRED] / [RATE_LIMITED]
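
The TIME field rule (MM:SS, switching to HH:MM:SS after one hour) can be sketched like this; `format_elapsed` is a hypothetical name.

```python
def format_elapsed(seconds: float) -> str:
    """Format elapsed time as MM:SS, or HH:MM:SS once an hour has passed."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"
```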

Examples:

# Running
⠹ 🔄 [████████░░░░░░░░░░░░]  40% (2/5) | NEW:1 SKIP:1 | TIME:01:23 | LOG:run_xxx.json

# Completed normally
  ✅ [████████████████████] 100% (5/5) | NEW:3 SKIP:2 | TIME:05:30 | LOG:run_xxx.json

# All skipped (outputs already exist)
  ✅ [████████████████████] 100% (3/3) | NEW:0 SKIP:3 | TIME:00:04 | LOG:run_xxx.json

# Blocked (auth error)
  🚫 [██████████░░░░░░░░░░]  50% (2/4) | NEW:1 SKIP:1 | TIME:02:15 | LOG:run_xxx.json [AUTH_REQUIRED]

# Blocked (rate limit)
  🚫 [██████████████░░░░░░]  66% (2/3) | NEW:2 SKIP:0 | TIME:03:45 | LOG:run_xxx.json [RATE_LIMITED]

Ctrl+C records status=aborted (next run auto-resumes)
RATE_LIMITED / AUTH_REQUIRED records status=blocked and stops
Exit codes:
- 0: completed
- 3: blocked
- 130: aborted (Ctrl+C)
- 1: error (source wait failure / generation failure / download failure)
- 2: usage error (YAML not found / invalid format)
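
A wrapper script can branch on these exit codes. The sketch below is one way to do it; `describe_exit` and `run_and_describe` are hypothetical helpers, and the script path is whatever your checkout uses.

```python
import subprocess
import sys

# Exit codes as listed above.
MEANING_BY_EXIT_CODE = {
    0: "completed",
    1: "error",
    2: "usage error",
    3: "blocked",
    130: "aborted",
}


def describe_exit(code: int) -> str:
    """Translate a run_batch.py exit code into its documented meaning."""
    return MEANING_BY_EXIT_CODE.get(code, f"unknown ({code})")


def run_and_describe(instruction: str, script: str = "run_batch.py") -> str:
    """Run the batch in the foreground and report how it ended."""
    result = subprocess.run([sys.executable, script, instruction])
    return describe_exit(result.returncode)
```

A scheduler could, for instance, retry later on `blocked` and alert on `error`.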

Implementation Status

This README is the authoritative specification. Divergences between the spec and implementation are recorded here.

Target spec (this README):
- instruction_file is stored and compared as a canonical path (realpath equivalent) for representation-agnostic auto-resume
- Filenames are determined by stable hashing (type_<hash>) for resilience to reordering
Current implementation (run_batch.py):
- Both of the above are implemented (as of 2026-02-08)

`run_batch.py` Detailed Spec

This section documents batch execution behavior as steps, states, and decision criteria so that README serves as the authoritative operational reference.

Glossary

Instruction: the YAML file specified by the user (e.g. ./instructions/test_news.yaml)
Task: one element of tasks[] in the YAML (one source)
Content: one element of tasks[].contents[] (podcast / image / slide / etc.)
Output: ./files/<dir>/<type>_<hash>.<ext>
Run state: ./log/run_*.json

Core Invariants (priority order)

Output files are the source of truth (for skip/resume decisions)
- If an output file exists, that Content is always skipped (never regenerated)
Notebooks are disposable (aggressive deletion policy)
- Notebooks are automatically deleted on task completion/failure/interruption
- Re-runs always start fresh notebook creation (no notebook_id/source_id reuse)
Never depend on NotebookLM server state (minimize external dependency)

Filename Determination (Stable Hash)

Filenames are automatically determined as <type>_<hash>.<ext>.

Hash = first 8 characters of SHA1 of url/type/prompt/options JSON-serialized (keys sorted)
Same input always produces the same filename (deterministic)
YAML reordering, addition, or deletion does not change filenames (stable skip/resume)

Auto-Resume (no `--resume` flag needed)

On re-running the same YAML, run_batch.py searches log/run_*.json for the most recent file matching the same instruction_file and resumes under these conditions:

If that run is not completed:
- Load the existing run, set status=running / resumed_at, and continue
Otherwise:
- Create a new run_*.json and start fresh

Note: instruction_file is stored and compared as a canonical path, so representation differences do not break auto-resume.

Canonical path is resolved via Python's Path.resolve() (symlinks are resolved)

Execution Flow (sequential)

Tasks are processed top to bottom
- If all outputs already exist: skip the entire task (no notebook creation)
- notebooklm create --json -- <title> creates the notebook
- notebooklm source add --json <url> → source wait adds the source
Contents are processed top to bottom
- (A) Output file exists: skipped
- (B) Otherwise: generate → identify artifact_id → artifact wait → download → completed
Notebook is deleted at task end (success/failure/interruption)
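
The per-task flow above can be sketched with the CLI calls abstracted away. The `create`, `generate`, and `delete` callables are hypothetical stand-ins for the notebooklm CLI steps (source add and wait are folded into `create` here for brevity), and `run_task` is an illustrative name.

```python
from pathlib import Path
from typing import Callable


def run_task(
    outputs: list[Path],
    create: Callable[[], str],
    generate: Callable[[str, Path], None],
    delete: Callable[[str], None],
) -> dict[str, str]:
    """Process one task and return a status per output filename."""
    if all(path.exists() for path in outputs):
        # Every output exists: skip the task without creating a notebook.
        return {path.name: "skipped" for path in outputs}
    statuses: dict[str, str] = {}
    notebook_id = create()
    try:
        for path in outputs:  # contents are processed top to bottom
            if path.exists():
                statuses[path.name] = "skipped"
            else:
                generate(notebook_id, path)
                statuses[path.name] = "completed"
    finally:
        # Runs on success, failure, and Ctrl+C alike.
        delete(notebook_id)
    return statuses
```

Putting the deletion in `finally` is one way to get the "deleted at task end" guarantee regardless of how the content loop exits.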

Error and Stop Handling

AUTH_REQUIRED (session expired) or RATE_LIMITED → blocked
- Records status=blocked and error in progress, stops processing
- After re-authentication (notebooklm login) or account switch, re-running the same YAML auto-resumes
Ctrl+C → aborted
- Records status=aborted and exits
- Re-running the same YAML auto-resumes
Non-Ctrl+C exits:
- SIGTERM (e.g. terminal/app force-close, host shutdown) is also treated as aborted (status=aborted)
- SIGKILL (e.g. kill -9) and power loss cannot be caught — run_*.json may remain with status=running
  - Even then, re-running the same YAML picks up the existing run and continues (output files are the truth; existing files are skipped)
source wait failure (timeout/failure/not ready) or download failure → error
- Records status=error and error.code in progress, exits (exit code 1)
- Re-running restarts from notebook creation; already-completed outputs are skipped

Forcing Regeneration (bypassing skip)

If an output file already exists, re-running the same instruction file skips that content. To regenerate with the same settings:

Method	Operation	Notes
Back up the file	`mv ./files/<dir>/<type>_<hash>.<ext> ./files/<dir>/<type>_<hash>.<ext>.bak`	Keeps the original for comparison
Delete the file	`rm ./files/<dir>/<type>_<hash>.<ext>`	Simple regeneration
Back up the directory	`mv ./files/<dir> ./files/<dir>_old`	Regenerates all content for that source

Note: Changing prompt or options produces a different hash, creating a new file (the existing file remains).

# Example: force regeneration of an image
mv ./files/dQw4w9WgXcQ/image_a1b2c3d4.png ./files/dQw4w9WgXcQ/image_a1b2c3d4.png.bak

# Re-run (the image will be regenerated instead of skipped)
python3 run_batch.py ./instructions/my_instruction.yaml

Mermaid: Execution Flow

sequenceDiagram
  autonumber
  participant U as User
  participant RB as run_batch.py
  participant LOG as ./log/run_*.json
  participant NLM as notebooklm CLI
  participant FS as ./files/

  U->>RB: run_batch.py <instruction.yaml>
  RB->>LOG: Search for existing run (canonical instruction_file match)
  alt Existing run found & status != completed
    RB->>LOG: Load existing run (status=running, resumed_at)
  else Otherwise
    RB->>LOG: Create new run (status=running)
  end

  loop tasks[] (per source)
    RB->>FS: All outputs exist?
    alt yes
      RB->>LOG: task.status=skipped
    else no
      RB->>NLM: create --json -- <title>
      NLM-->>RB: notebook_id
      RB->>LOG: Save notebook_id

      RB->>NLM: source add --json <url>
      NLM-->>RB: source_id
      RB->>NLM: source wait --timeout ... --json
      alt ready
        RB->>LOG: Save source_id
      else auth required
        RB->>NLM: delete -n <notebook_id> -y
        RB->>LOG: status=blocked (AUTH_REQUIRED)
        RB-->>U: exit 3
      else timeout/fail/not_ready
        RB->>NLM: delete -n <notebook_id> -y
        RB->>LOG: status=error (SOURCE_*)
        RB-->>U: exit 1
      end

      loop contents[] (per content)
        RB->>FS: Output exists?
        alt yes
          RB->>LOG: content.status=skipped
        else no
          RB->>NLM: generate <type> --json -- <prompt>
          alt RATE_LIMITED/AUTH_REQUIRED
            RB->>NLM: delete -n <notebook_id> -y
            RB->>LOG: status=blocked
            RB-->>U: exit 3
          else ok
            RB->>NLM: artifact wait --json
            RB->>NLM: download <type>
            RB->>FS: tmp → rename commit
            RB->>LOG: content.status=completed
          end
        end
      end

      RB->>NLM: delete -n <notebook_id> -y
      RB->>LOG: task.status=completed
    end
  end

  RB->>LOG: status=completed, finished_at
  RB-->>U: exit 0

Mermaid: Run State Transitions

stateDiagram-v2
  [*] --> running
  [*] --> skipped: All outputs already exist
  running --> completed
  running --> skipped: All outputs already exist
  running --> blocked
  running --> aborted
  running --> error
  blocked --> running: (re-auth / account switch, then re-run)
  aborted --> running: (re-run)
  error --> running: (re-run)

Status Values

Level	Status	Description
run / task	`running`	In progress
run / task	`completed`	Completed successfully
run / task	`skipped`	Skipped because all outputs already exist
run / task	`blocked`	Stopped due to auth expiry / rate limit
run / task	`aborted`	Interrupted by Ctrl+C / SIGTERM
run / task	`error`	Processing failed (source wait / generate / download failure)
content	`pending`	Not yet processed
content	`running`	Being processed
content	`completed`	Completed successfully
content	`skipped`	Skipped because output file already exists
content	`blocked`	Auth expiry / rate limit
content	`error`	Processing failed

Operational Runbook

Minimum steps for the user: specify one YAML file and execute it.

1. Initial / Re-authentication (only when needed)

If the session has expired or you want to run with a different account:

notebooklm login

Log in to Google in the browser, then press ENTER in the terminal once the NotebookLM home page is visible.

2. Execute

Generation takes minutes to tens of minutes — background execution is recommended. Pair with settings.notify for "fire and forget → GitHub notification" UX.

# Background execution (recommended)
nohup python3 ./run_batch.py ./instructions/<INSTRUCTION>.yaml > log/nohup_output.log 2>&1 &

# Foreground execution (shows progress spinner, Ctrl+C to abort)
python3 ./run_batch.py ./instructions/<INSTRUCTION>.yaml

3. Check results

Progress file (run state): ./log/run_YYYYMMDDHHmmss.json
Outputs: ./files/<dir>/<type>_<hash>.<ext>

4. Interruption (Ctrl+C) and resume

Press Ctrl+C to interrupt
- run_*.json records status=aborted
To resume, re-run the same YAML
- The most recent run_*.json with a matching instruction_file (canonical path) is automatically loaded and processing continues

5. Handling blocked status (AUTH_REQUIRED / RATE_LIMITED)

If run_*.json shows status=blocked, address the cause and re-run:

AUTH_REQUIRED
- notebooklm login to re-authenticate → re-run the same YAML
RATE_LIMITED
- Wait and retry, or switch to a different account
- To switch accounts: back up the existing storage_state.json, then notebooklm login

6. Skip behavior (output already exists)

On re-run, any content whose output file already exists is always skipped.

Filenames are determined as <type>_<hash>.<ext>
Same URL/type/prompt/options → same filename
Changing the prompt produces a different filename (treated as a different output)

7. settings.language

settings.language is a global setting only (per-content language override is not currently supported by run_batch.py).

8. Concurrent execution

Do not run the same YAML with multiple processes simultaneously (no mutual exclusion is implemented).

9. Temp files (during download)

Progress JSON temp file: run_*.json.tmp (for atomic update)
Output temp file: <type>_<hash>.<ext>.__tmp__<run_id> (renamed to final path after successful download)

Temp files may remain after interruption or abnormal exit. They are safe to delete (they are not final output files).

10. Exit codes

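0: completed
1: error (source wait failure / generation failure / download failure)
2: usage error (YAML not found / invalid format)
3: blocked
130: aborted (Ctrl+C)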

11. Background execution (nohup)

Use nohup to run batch processing without occupying the terminal. Combine with settings.notify for "fire and forget → GitHub notification" UX.

nohup python3 ./run_batch.py ./instructions/<INSTRUCTION>.yaml > log/nohup_output.log 2>&1 &

> log/nohup_output.log 2>&1 — redirect stdout and stderr to a log file
Trailing & — run in background without blocking the shell
Spinner output is recorded in the log file (use GitHub Issue notifications for progress monitoring)

Check and stop the process:

# Check if running
jobs            # within the same shell session
ps aux | grep run_batch  # from another terminal

# Stop
kill %1         # job number from jobs
kill <PID>      # PID from ps

Note: Closing the terminal does not stop a nohup process. Shutting down the host machine does.

Intended Workflow

User creates an instruction file (YAML)
- List of sources (YouTube URLs, website URLs, etc.)
- Per-source content types, options, and prompts
AI reads the instruction file and runs the batch automatically
- Creates a new notebook on NotebookLM per source
- Adds the source
- Generates and saves the specified content sequentially
AI reports results after processing
- Summary of results per source (success/failure)
- Notebooks are automatically deleted after processing (nothing remains on NotebookLM)

Operational Spec (YAML instruction workflow)

The user only needs to place an instruction YAML and request execution. The AI handles execution, progress management, and recovery.

User responsibilities

Place instruction files (.yml / .yaml) under ./instructions/
Request execution
Complete browser-based notebooklm login authentication on first use

AI responsibilities

Read the instruction file and execute: create notebook → add source → generate → download → save per source
Write execution logs to ./log/
Persist progress for long-running jobs and abnormal exits; on restart, auto-skip completed outputs and regenerate only remaining ones

Execution model (sequential)

Sequential execution is used for stability.

tasks are processed top to bottom
contents within a task are also processed top to bottom

Long-running jobs and recovery

Since notebooks are automatically deleted after processing, recovery after interruption is based on output file existence. On re-run, notebook creation starts fresh, but Contents whose outputs already exist are automatically skipped.

Note: YouTube source ingestion (transcription) is very fast (seconds to tens of seconds), so re-creation overhead is negligible.

Progress persistence (minimum requirements)

Progress is saved under ./log/ per batch (= per instruction file execution).

e.g. ./log/run_YYYYMMDDHHmmss.json
At minimum, the following are recorded per content generation:
- source
- notebook_id
- content type
- output_path
- artifact_id (when available)

Since the same type may be generated multiple times within a notebook, recovery uses artifact_id rather than type alone (regenerate if unavailable).

Progress file example (run_YYYYMMDDHHmmss.json)

{
  "run_id": "20260208200022",
  "instruction_file": "/abs/path/to/.../instructions/example.yaml",
  "started_at": "2026-02-08T20:00:22+09:00",
  "resumed_at": "2026-02-08T21:30:00+09:00",
  "status": "running",
  "tasks": [
    {
      "url": "https://www.youtube.com/watch?v=BDl3X9GqhVc",
      "video_id": "BDl3X9GqhVc",
      "title": "Test News",
      "notebook_id": "2ae0f4ee-0164-4ad4-b2aa-48ac7986a72b",
      "source_id": "ac317bde-4746-4e70-998f-e05a924a8163",
      "status": "running",
      "contents": [
        {
          "content_id": "image_7f3a2c1b",
          "type": "image",
          "prompt": "Visually summarize the key points.",
          "options": {
            "orientation": "landscape",
            "detail": "standard"
          },
          "task_id": "<generation job ID (obtained at generate time)>",
          "artifact_id": "<artifact ID (obtained after generation completes)>",
          "status": "running",
          "output_path": "files/BDl3X9GqhVc/image_7f3a2c1b.png",
          "error": null
        }
      ]
    }
  ],
  "finished_at": "<set when completed/blocked/aborted/error>"
}

Error object structure (on error)

{
  "error": {
    "code": "GENERATE_FAILED",
    "at": "2026-02-08T20:15:30+09:00",
    "stdout": "<CLI stdout>",
    "stderr": "<CLI stderr>",
    "detail": { "<CLI JSON output if available>" }
  }
}

Common error codes: CREATE_FAILED, SOURCE_ADD_FAILED, SOURCE_WAIT_FAILED, SOURCE_NOT_READY, GENERATE_FAILED, ARTIFACT_WAIT_FAILED, ARTIFACT_FAILED, ARTIFACT_NOT_FOUND, DOWNLOAD_FAILED, AUTH_REQUIRED, RATE_LIMITED

Notes:

instruction_file is stored and compared as a canonical path (equivalent to realpath)
resumed_at is set only on resume runs
task_id is stored only when available from generate
artifact_id is stored only when available after generation (regenerate if unavailable)
content_id is an internal identifier (auto-determined as type_<hash>)

Status polling / waiting

Wait for source ingestion to complete:
- notebooklm source wait <SOURCE_ID> --timeout <sec> [--json]
Check generation job status:
- notebooklm artifact poll <TASK_ID> -n <NOTEBOOK_ID>
- Note: artifact poll does not support --json
- If TASK_ID is unavailable, use artifact wait --json after artifact_id is confirmed (run_batch.py saves artifact_id to progress)
Wait for generation to complete (blocking):
- notebooklm artifact wait <ARTIFACT_ID> -n <NOTEBOOK_ID> --timeout <sec> --interval <sec> [--json]
- run_batch.py defaults to --timeout 86400 --interval 5 (24-hour timeout, effectively unlimited)

Use --json with generate to get machine-readable output and reliably capture identifiers such as artifact_id.
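The blocking wait above could be driven from Python roughly as follows. This is a sketch, not the run_batch.py implementation: the helper names build_wait_command and wait_for_artifact are hypothetical, and the shape of the --json output is not assumed beyond being parseable JSON.

```python
import json
import subprocess


def build_wait_command(artifact_id, notebook_id, timeout=86400, interval=5):
    """Assemble the documented `artifact wait` invocation as an argument list."""
    return [
        "notebooklm", "artifact", "wait", artifact_id,
        "-n", notebook_id,
        "--timeout", str(timeout),
        "--interval", str(interval),
        "--json",
    ]


def wait_for_artifact(artifact_id, notebook_id, runner=subprocess.run, **kwargs):
    """Run the wait command and return (exit_code, parsed_json_or_None).

    `runner` is injectable so the function can be exercised without the CLI.
    """
    cmd = build_wait_command(artifact_id, notebook_id, **kwargs)
    proc = runner(cmd, capture_output=True, text=True)
    try:
        detail = json.loads(proc.stdout) if proc.stdout.strip() else None
    except json.JSONDecodeError:
        detail = None  # keep going; raw stdout can still be logged by the caller
    return proc.returncode, detail
```

Passing the command as a list (rather than a shell string) avoids quoting problems with IDs and keeps the defaults (--timeout 86400 --interval 5) in one place.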

Cleanup (notebook deletion)

Aggressive deletion policy: notebooks are automatically deleted after processing (success/failure/interruption). Manual notebook deletion is not normally required.

If a notebook remains for any reason, delete it from the Web UI or via CLI:

e.g. notebooklm delete -n <NOTEBOOK_ID> -y (partial ID is accepted)

notebooklm-py Usage Guide

Guide for automatically generating podcasts, images, slides, and videos from YouTube videos.

Environment Setup (using pipx)

Initial setup (once only)

# Install via pipx
pipx install "notebooklm-py[browser]"

# Install Playwright's Chromium
pipx run --spec "notebooklm-py[browser]" python -m playwright install chromium

# Initial authentication
notebooklm login
# → Browser opens for Google login → press ENTER to save

# Verify
notebooklm --version

Credentials are saved to ~/.notebooklm/storage_state.json.

Tips

To try a different account, back up ~/.notebooklm/storage_state.json and re-run notebooklm login
If a notebook title starts with -, the CLI interprets it as an option — use -- to separate it (e.g. notebooklm create --json -- "-LpMZyDZI8k")
Infographic (image) generation can take several minutes (source ingestion wait + generation and download)
Generation may hit RATE_LIMITED (Google-side rate limiting). If the limit does not clear after waiting, consider logging in again with a different account
Running multiple test cycles can accumulate notebooks on NotebookLM; delete unneeded ones with notebooklm delete -n <NOTEBOOK_ID> -y
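The tip about titles that start with - can be handled once in code rather than remembered at every call site. A minimal sketch, assuming the create command is assembled as an argument list; build_create_command is a hypothetical helper name.

```python
def build_create_command(title):
    """Build `notebooklm create --json` arguments for a notebook title.

    When the title begins with "-", insert "--" so the CLI treats it as a
    positional argument instead of an option (e.g. the video ID "-LpMZyDZI8k").
    """
    cmd = ["notebooklm", "create", "--json"]
    if title.startswith("-"):
        cmd.append("--")
    cmd.append(title)
    return cmd


print(build_create_command("-LpMZyDZI8k"))
```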

Development History

2026-02-12

GitHub Issue Notification Feature

Background: Batch processing takes a long time (image ~3 min, slide ~6 min, podcast/video 10+ min) and ties up the terminal while it runs.

Implementation:

Added settings.notify option (github_issue, github_repo)
Posts GitHub Issue comments at each batch step to notify progress
Notifications are best-effort (failures do not affect batch processing)
Combines with nohup for "fire and forget → notification arrives" UX

Notification events: start, skip, content complete, RATE_LIMITED, AUTH_REQUIRED, error, batch complete, interrupted
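The best-effort behavior can be sketched as a wrapper that swallows any notification failure. How run_batch.py actually posts comments is not stated here; using the GitHub CLI (gh issue comment) is an assumption, and notify and post_issue_comment are hypothetical names.

```python
import subprocess


def post_issue_comment(repo, issue, body):
    """Post one comment with the GitHub CLI.

    Assumption: `gh` is installed and authenticated on the machine.
    """
    subprocess.run(
        ["gh", "issue", "comment", str(issue), "--repo", repo, "--body", body],
        check=True,
        capture_output=True,
        text=True,
        timeout=30,
    )


def notify(event, message, repo, issue, poster=post_issue_comment):
    """Send a progress notification; never let a failure reach the batch.

    Returns True if the comment was posted and False otherwise.
    """
    try:
        poster(repo, issue, f"[{event}] {message}")
        return True
    except Exception:
        # Best-effort: a missing CLI, network error, or timeout is ignored.
        return False
```

The caller can ignore the return value entirely; it exists only so that a failed notification can be logged if desired.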

Issue #2: Output directory name customization

Background: Output directories were fixed to video_id, which was hard to read.

Implementation:

Added settings.output_dir_mode option
Three modes:
- video_id: video_id only (default, backward compatible)
- title: title only (readability focused)
- title_with_video_id: title + video_id (recommended)
slugify() function converts to filesystem-safe strings
Handles Windows reserved names, unsafe characters, and length limits

Output examples:

video_id:              files/dQw4w9WgXcQ/
title:                 files/Rick_Astley_-_Never_Gonna_Give_You_Up/
title_with_video_id:   files/Rick_Astley_-_Never_Gonna_Give_You_Up__dQw4w9WgXcQ/
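The three modes and the slugify() step can be approximated as below. This is an illustrative sketch rather than the function in run_batch.py: the replacement character, the 80-character limit, and the way reserved names are escaped are assumptions, as is the helper name output_dir_name.

```python
import re

# Device names that Windows refuses as file or directory names.
WINDOWS_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def slugify(title, max_len=80):
    """Convert a title into a filesystem-safe directory name."""
    s = re.sub(r"\s+", "_", title.strip())           # whitespace runs -> underscore
    s = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", s)     # characters unsafe on Windows/POSIX
    s = s[:max_len].rstrip(".")                      # length limit; no trailing dots
    if s.split(".")[0].upper() in WINDOWS_RESERVED:
        s = "_" + s                                  # escape reserved device names
    return s or "untitled"


def output_dir_name(mode, video_id, title):
    """Pick the directory name for a given settings.output_dir_mode value."""
    if mode == "video_id":
        return video_id
    if mode == "title":
        return slugify(title)
    if mode == "title_with_video_id":
        return f"{slugify(title)}__{video_id}"
    raise ValueError(f"unknown output_dir_mode: {mode}")


print(output_dir_name("title_with_video_id", "dQw4w9WgXcQ",
                      "Rick Astley - Never Gonna Give You Up"))
```

Keeping the video ID in the name (title_with_video_id) avoids collisions when two videos share a title, which is presumably why that mode is the recommended one.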

Issue #1: Aggressive deletion policy

Background: Even when output files already existed, notebooks and sources were created on NotebookLM, causing large numbers of duplicate notebooks to accumulate during test runs.

Decision:

Option A (keep notebooks and reuse): complex state management
Option B (generate → download → delete): simple implementation, eliminates duplication problem → adopted

Implementation:

Notebooks are automatically deleted on task completion/failure/interruption
If all outputs already exist, skip notebook creation entirely
Enforce "output files are the source of truth" principle (no dependency on NotebookLM server state)
try/finally pattern prevents deletion leaks
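The implementation points above fit a small try/finally shape. The sketch below uses injected callables in place of the real CLI calls, so every name in it (process_task, create_notebook, generate_and_download, delete_notebook, outputs_exist) is hypothetical.

```python
def process_task(outputs_exist, create_notebook, generate_and_download, delete_notebook):
    """Run one task under the aggressive deletion policy.

    - If every output file already exists, no notebook is created at all.
    - Otherwise the notebook is deleted on success, failure, or interruption.
    """
    if outputs_exist():
        return "skipped"

    notebook_id = create_notebook()
    try:
        generate_and_download(notebook_id)
        return "completed"
    finally:
        # Runs even on exceptions and KeyboardInterrupt (Ctrl+C),
        # so no notebook is left behind on the service.
        delete_notebook(notebook_id)
```

Because the existence check happens before create_notebook(), a rerun over finished outputs makes no calls to NotebookLM, consistent with treating the output files as the source of truth.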

Accepted trade-offs:

Cannot inspect results in NotebookLM Web UI → deemed unnecessary
Notebook creation on every regeneration → source addition is fast (seconds to tens of seconds), so acceptable

