
Scientific Book Reading — Standard Operating Procedure

Purpose: General guide for reading scientific/technical books and producing structured notes.
Scope: Applicable to any technical monograph, textbook, or edited volume in computational mechanics, applied mathematics, biology, physics, or engineering.
Output: Per-chapter notes + book summary + this reusable instruction.

Revision history:

  • 2026-06-02: Initial Phase 8 sync workflow + progress.md lifecycle.
  • 2026-06-06: §8.2/§8.3/§8.9 added. docs/books/ and site/ in the publish repo are 100% Cloudflare-generated, and the main agent must never commit them. See §8.9 for the full rule and recovery procedure.
  • 2026-06-06: §2.1.2 added. Chapter note title format 第XX章:中文标题(英文标题), with a book-language rule (Chinese books → no English title; English books → use the original English title verbatim).


Phase 0 — Before You Start

0.1 Acquire the Book

  • Confirm the exact title, author(s), edition, ISBN, and publication year.
  • Check whether a DOI or open-access PDF is available — prefer the publisher's PDF for layout integrity.
  • Note any companion resources (solutions manual, code repository, dataset).

0.2 Assess the Book

Answer these questions before reading:

  • Who is the intended audience? (grad students / researchers / practitioners)
  • What prerequisites does it assume?
  • Is it a monograph (single author's research perspective) or a multi-author edited volume?
  • What is the book's main thesis or unifying theme?
  • How is it structured? (by topic, by dimensionality, by method, by application?)
  • What is the expected outcome of your reading? (survey for lit review, deep-dive for implementation, teaching preparation, etc.)

0.3 Set Up the Workspace

Folder naming convention (strict):

[Year]-[Book-Title]-[Author]
  • Year: 4-digit publication year
  • Book-Title: title with spaces replaced by - (hyphens)
  • Author: surname of the first author only (no co-authors)
  • Example: 2017-The-Mathematics-and-Mechanics-of-Biological-Growth-Goriely

Full-title rule (revised 2026-06-02): use the complete book title in the folder name. Do not abbreviate it to a 3-word or any other short form. The example above (...Biological-Growth-Goriely, with its full 7-word title) is the correct form; do not shorten it. This rule applies to all new folders. Do not retroactively rename existing folders, because renaming would break existing site URLs.

~/Documents/Reading/
└── [Year]-[Book-Title]-[Author]/
    ├── book_summary.md          # book summary (mandatory; see Phase 3)
    ├── instruction.md           # copy of this SOP
    ├── Chapter-01.md            # per-chapter notes
    ├── Chapter-02.md
    └── ...

0.4 Mandatory vs Optional Outputs

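| Output | Mandatory? | When to skip |
|--------|------------|--------------|
| Chapter-NN.md (per-chapter notes) | Mandatory | Never; every chapter gets a note |
| book_summary.md (book summary) | Mandatory | Only in the rare case where the user explicitly says "no summary needed" |
| reference_implementations/ (code reproduction) | Not done | Not created by default in this workflow; only if the user explicitly asks in Phase 0 to reproduce a specific algorithm |
| Formula summary table | Optional, as needed | Not required if the chapter has ≤ 3 formulas or only short inline formulas; use it only when formulas are numerous and frequently cross-referenced |
| ch_src/ch_NN.txt (chapter text split) | Mandatory, temporary | Needed for reproducibility checks and for quoting while writing notes; Phase 8 cleans it up after the sync, so no long-term retention is needed |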

0.5/0.6 — Check Existing Work & Agent Strategy (mandatory before Phase 1)

Two decisions must be made explicitly before any reading begins. Both ask the user once, then move on.

Step 1 — Existing work check. Check git history and detect name collisions:

# In the source repo
git log --oneline --all | grep -i "<book-slug>"
# In the site repo
cd ~/Documents/reading-notes-site && git log --oneline | grep -i "<book-slug>"
git ls-files | grep -i "<book-slug>"

If a prior version exists, decide with the user (A: discard + re-read, recommended; B: revise the old; C: keep old, write new folder for side-by-side). If multiple folders match (e.g., a -bak and a current), clarify before writing anything.

Step 2 — Agent strategy gate. Before any main-agent-vs-subagent decision, explicitly confirm with the user (CLI → use check_list.md checkbox, NOT clarify buttons):

  (a) Main agent does everything itself; no subtasks  ← default for this user
  (b) Which chapters may be delegated to a subagent?  ← e.g., "Ch 5-10 can be delegated"
  (c) Subagents do everything in parallel

Default (in the absence of a reply): (a) main agent does everything. Record the decision in progress.md at the book root:

## Execution Mode
- Mode: (a) main agent serial | (b) mixed | (c) full subagent
- Decided by: [user message or default]
- Notes: [...]

These are the only two "ask the user first" decisions in the whole SOP. Everything else (chapter structure, format, formulas) is decided by §2.2 / §3.0 / §5.

0.7 Auto-detect PDF Metadata and Create Folder

When the user gives a PDF path, the main agent automatically extracts title, author(s), and year, then proposes a folder name per §0.3. Do not require the user to type the metadata manually.

Triggers: the user says "read this PDF" or "read" followed by a path; the user says "I have a new book at" followed by a path; or any PDF input with no existing folder (§0.5).

Extraction (pdfinfo first, then PyMuPDF text; applying the parsing rules below is the third step):

# Step 1: pdfinfo for metadata fallback
pdfinfo "$pdf" | grep -E "^(Title|Author|Subject|Keywords|CreationDate)"

# Step 2: pymupdf for first 2 pages of text (most reliable for academic PDFs)
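python3 - "$pdf" << 'EOF'
import sys
import pymupdf

# The quoted heredoc disables shell expansion, so the PDF path arrives as argv[1].
doc = pymupdf.open(sys.argv[1])
n = min(2, len(doc))
for i in range(n):
    print(f"--- page {i+1} ---")
    print(doc[i].get_text()[:2000])   # first 2000 characters of the page
EOF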

Parsing rules (applied by the main agent to the page-1/2 text):

  • Title: the line(s) with the largest font size in the first 1-3 lines, excluding ISBNs, journal names, conference names, page numbers. Multi-line title → concatenate with single space.
  • First author: the first name in the author list (before any "and" / "," / "&"). Editor-only book → use the editor's surname.
  • Year: publication year from the title or copyright page. Multiple years (e.g., "© 2024 Springer") → copyright year.
  • Publisher / DOI: optional, skip.
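
The year rule above can be sketched as a small helper. This is only an illustration under stated assumptions: the name detect_year and both regular expressions are hypothetical, and real copyright pages may need looser patterns.

```python
import re

# Hypothetical patterns: a year right after a copyright mark, or any 4-digit year.
_COPYRIGHT = re.compile(r"(?:©|\(c\)|copyright)\s*((?:19|20)\d{2})", re.IGNORECASE)
_ANY_YEAR = re.compile(r"\b((?:19|20)\d{2})\b")

def detect_year(page_text):
    """Prefer the copyright year; otherwise fall back to the first year found."""
    m = _COPYRIGHT.search(page_text)
    if m:
        return int(m.group(1))
    m = _ANY_YEAR.search(page_text)
    return int(m.group(1)) if m else None

print(detect_year("First published 2019\n© 2024 Springer"))  # 2024
```

When the helper returns None, the fallback path (cover rendering, then asking the user) still applies.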

Confirm with user (one round) before creating the folder:

Detected:
  Title: Fractional Dispersive Models and Applications
  Author: Kevrekidis, Cuevas-Maraver (eds.)
  Year: 2024
Proposed folder: 2024-Fractional-Dispersive-Models-and-Applications-Kevrekidis
Confirm? (y / edit title / edit author / cancel)

Then mkdir -p ~/Documents/Reading/<slug>. PDF location policy: the PDF is not moved into the new folder — keep it at the original location (e.g., ~/Documents/Books/) so the user has a single canonical source. The folder under ~/Documents/Reading/ contains only notes + intermediates (which Phase 8 cleans up).

Fallback when title detection fails (e.g., scanned book, no text layer, or cover-only title page):

  1. Try vision_analyze on a rendered cover (pymupdf → render page 0 at 150 dpi → save to /tmp/cover.png).
  2. Still unclear → ask the user directly (one short prompt, not check_list): Title: ... / Author: ... / Year: .... This is the only case where the main agent asks for metadata.

Edge cases for folder naming:

  • Multi-author (editors) → use the first editor's surname.
  • Single-word title (e.g., Causality) → use the full single word, no padding.
  • Subtitle with colon (e.g., Deep Learning: A Practitioner's Approach) → drop the colon, concatenate.
  • Edition number (e.g., Statistics, 4th ed.) → drop the edition tag.
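
The naming convention from §0.3 together with the edge cases above could be expressed roughly as follows. This is a sketch, not the SOP's own tooling: folder_slug is a hypothetical name, and the edition-tag pattern is an assumption about how such tags usually look.

```python
import re

def folder_slug(year, title, first_author_surname):
    """Build [Year]-[Book-Title]-[Author] from already-confirmed metadata."""
    # Assumed pattern for a trailing edition tag such as ", 4th ed." or " 2nd edition".
    title = re.sub(r",?\s*\d+(st|nd|rd|th)\s+ed(ition|\.)?\s*$", "", title,
                   flags=re.IGNORECASE)
    # Drop colons so the subtitle is concatenated with the main title.
    title = title.replace(":", " ")
    words = title.split()
    return f"{year:04d}-{'-'.join(words)}-{first_author_surname}"

print(folder_slug(2024, "Fractional Dispersive Models and Applications", "Kevrekidis"))
# 2024-Fractional-Dispersive-Models-and-Applications-Kevrekidis
```

The sketch leaves other punctuation (apostrophes, commas inside the title) untouched, since the rules above do not say how to treat it.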

---

Phase 1 — Structural Survey

1.1 Extract Text from PDF

Choose extraction method by priority:

| Method | Command | When to Use |
|--------|---------|-------------|
| pdftotext (layout) | pdftotext -layout book.pdf - | Most PDFs; preserves columns |
| PyMuPDF | import pymupdf; doc = pymupdf.open("book.pdf") | Encrypted or non-standard PDF |
| marker-pdf | marker_single book.pdf --output_dir out/ | Scanned documents; OCR needed |

Verify extraction quality:

pdftotext -layout book.pdf - | head -50   # check page 1
pdftotext -layout book.pdf - | wc -l       # line count
pdftotext -layout book.pdf - | wc -w       # word count

1.2 Identify Chapter Boundaries

pdftotext book.pdf - | grep -n "^Chapter \|^[0-9]\+\.[0-9] "

For multi-author edited volumes, look for section markers instead of chapter markers.
Record the line numbers of each chapter/section heading.
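
To record the heading line numbers in a form that later steps can reuse, the same pattern can be applied in Python. This is a minimal sketch that mirrors the grep expression above; find_headings is a hypothetical helper name.

```python
import re

# Same two alternatives as the grep pattern: "Chapter " or "N.N " at line start.
HEADING = re.compile(r"^(Chapter |[0-9]+\.[0-9] )")

def find_headings(lines):
    """Return (1-based line number, heading text) for every matching line."""
    return [(n, line.rstrip("\n"))
            for n, line in enumerate(lines, start=1)
            if HEADING.match(line)]

sample = ["Preface\n", "Chapter 1 Introduction\n", "1.1 Scope\n", "See Chapter 2.\n"]
print(find_headings(sample))  # [(2, 'Chapter 1 Introduction'), (3, '1.1 Scope')]
```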

1.3 Extract Each Chapter

Use the book-chapter-splitter skill (load with skill_view(name='book-chapter-splitter')). It bundles two scripts: split_chapters.py (heuristic, try first) and split_chapters_byteoffset.py (bulletproof fallback). Both write to <book-folder>/ch_src/ch_NN.txt.

Default flow:

python3 ~/.hermes/skills/productivity/book-chapter-splitter/scripts/split_chapters.py <pdf> <book-folder>/
# verify (see §1.5)
# if mis-located, switch to byteoffset:
python3 ~/.hermes/skills/productivity/book-chapter-splitter/scripts/split_chapters_byteoffset.py <pdf> <book-folder>/

If neither works, the book is non-standard — fall back to manual byte-offset repair (see skill's Common Pitfalls §3-4) or OCR the book first via the ocr-and-documents skill.

1.4 Fallback: When Auto-Split Fails

The heuristic splitter mis-locates chapters when TOC entries match running headers on every page, or when chapter keywords like "Introduction" / "Conclusion" appear in multiple chapters. When ≥ 30% of chapters are mis-located, switch to the byteoffset splitter (same skill): it uses the page numbers from PyMuPDF's Document.get_toc() and does not regex-match chapter headings, so it cannot mis-fire on running headers. Do not keep tuning the heuristic regex past 2 attempts; switching is faster.
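
To illustrate why a TOC-based split is robust, here is a sketch of how page ranges can be derived from TOC entries alone. It assumes the entry shape PyMuPDF documents for get_toc(), a list of [level, title, page] with 1-based pages; chapter_page_ranges is a hypothetical helper, and the skill's actual script may work differently.

```python
def chapter_page_ranges(toc, n_pages):
    """Map top-level TOC entries to (title, first_page, last_page), 1-based inclusive."""
    tops = [(title, page) for level, title, page, *_ in toc if level == 1]
    ranges = []
    for i, (title, start) in enumerate(tops):
        # A chapter ends on the page before the next chapter starts.
        end = tops[i + 1][1] - 1 if i + 1 < len(tops) else n_pages
        ranges.append((title, start, max(start, end)))
    return ranges

toc = [[1, "Ch 1", 1], [2, "1.1", 3], [1, "Ch 2", 15], [1, "Ch 3", 40]]
print(chapter_page_ranges(toc, n_pages=60))
# [('Ch 1', 1, 14), ('Ch 2', 15, 39), ('Ch 3', 40, 60)]
```

No heading text is ever matched against page content, so running headers cannot shift a boundary.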

1.5 Verify Each Split File (MANDATORY)

After splitting (auto or manual), always check head + tail of every chapter file:

for f in ch_src/ch_*.txt; do
  echo "=== $f ==="
  head -3 "$f"        # First 3 lines: should be the chapter title + opening prose
  echo "---"
  tail -3 "$f"        # Last 3 lines: should be the chapter's last prose, then references
done

Why head -3 / tail -3 (not -1): a single line can be a header artifact (page number, running header) and look "correct" by coincidence. Three lines catch:

  • the first line is a TOC entry like 5. Author Name instead of the actual chapter title (running-header pollution)
  • the chapter opens with a leftover page number before the real title
  • the last line is Index or References from a different chapter (off-by-one split)
  • the chapter ends mid-sentence (truncation)
  • the chapter is suspiciously short (< 5 KB for a normal book → likely truncated); confirm this with the file size, since head/tail alone will not reveal it

If any check fails, do not proceed to Phase 2. Go back to §1.4 (auto-split fallback) or manual byte-offset repair.
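
Some of these checks can be pre-screened mechanically before the visual head/tail inspection. The sketch below is a heuristic only: split_warnings is a hypothetical helper, and the set of sentence-ending characters is an assumption.

```python
def split_warnings(text, min_bytes=5 * 1024):
    """Return a list of warning labels for one chapter's extracted text."""
    warnings = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(text.encode("utf-8")) < min_bytes:
        warnings.append("short")                 # likely truncated chapter
    if lines and lines[0].isdigit():
        warnings.append("leading page number")   # leftover page number before the title
    if lines and lines[-1][-1] not in ".?!)]\"'":
        warnings.append("possible truncation")   # last line does not end a sentence
    return warnings

print(split_warnings("17\nChapter 3\nthe rod then buckles and"))
# ['short', 'leading page number', 'possible truncation']
```

An empty list does not prove the split is right; it only means none of these three symptoms fired.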

1.6 Skim Before Reading

For each chapter, read in order:

  • Abstract / introduction
  • Section headings and subheadings
  • First and last paragraph of each major section
  • Figures, tables, and their captions
  • Summary / conclusions at end of chapter

This gives you the book's skeleton before committing to deep reading.


Phase 2 — Per-Chapter Deep Reading

2.1 Reading Strategy by Chapter Type

| Type | Strategy |
|------|----------|
| Theory/methods chapters | Read with pencil: reproduce key derivations, mark assumptions |
| Application chapters | Read for physical insight: what phenomenon, what model, what prediction |
| Review/survey chapters | Read critically: compare cited works, note consensus vs. debate |
| Computational chapters | Read with code: pseudo-code → actual implementation |

2.1.1 Large Chapter Splitting (>100 KB)

If a single chapter's text file (in ch_src/ch_NN.txt) exceeds 100 KB, do NOT try to read it all at once. Split it into 2-3 sub-reads at section boundaries (look for ^### or ^## heading patterns, or ^Section N style markers).

Two practical approaches:

Approach A — read at offset/limit (the main agent way):

# Read in two halves at offset/limit
read_file("/.../ch_src/ch_03.txt", offset=1, limit=600)
read_file("/.../ch_src/ch_03.txt", offset=601, limit=600)
# Stitch notes from both halves manually

Approach B — split into sub-chapter files (for subagent delegation):

grep -nE "^## [0-9]" ch_src/ch_NN.txt | head -20   # detect section boundaries
# Split at these lines, writing ch_NN_part1.txt, ch_NN_part2.txt, ...

Decision rule: main agent → Approach A (less filesystem pollution); subagent batch → Approach B (subagent context is finite, and 100 KB chokes it).

Per-call context budget (hard limits):

| Agent | Per-call ceiling | Why |
|-------|------------------|-----|
| Main agent (serial) | ≤ 100 KB PDF text per read_file | Fits comfortably; leaves headroom for the note itself |
| Subagent (delegated) | ≤ 60 KB per call | Subagent has fixed tool overhead; trim more aggressively |
| Hard upper bound (any agent) | 200 KB per call | Above this, token cost + truncation risk both spike; never exceed without explicit user approval |
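
Approach B and the byte budgets above can be combined in a small packing routine. This is a sketch under assumptions: split_chapter is a hypothetical name, and the boundary pattern simply reuses the heading styles mentioned in this section.

```python
import re

# Boundary markers named in this section: "## " / "### " headings or "Section N".
SECTION = re.compile(r"^(#{2,3} |Section [0-9])")

def split_chapter(text, budget_bytes=100_000):
    """Pack whole sections into chunks whose UTF-8 size stays within budget_bytes."""
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if SECTION.match(line) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    chunks, buf = [], ""
    for sec in sections:
        if buf and len((buf + sec).encode("utf-8")) > budget_bytes:
            chunks.append(buf)
            buf = ""
        buf += sec   # a single section larger than the budget stays one oversized chunk
    if buf:
        chunks.append(buf)
    return chunks

demo = ("## 1 A\n" + "x" * 50 + "\n") * 3
print([len(c.encode("utf-8")) for c in split_chapter(demo, budget_bytes=120)])  # [116, 58]
```

For a subagent, pass budget_bytes=60_000 to match its ceiling in the table above.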

2.1.2 Chapter Note Title Format (Mandatory)

Every Chapter-NN.md file must begin with a single H1 title in the format below. The title is the agent contract for book_summary.md's chapter index — keep the two in sync.

| Book language | Title format | Example |
|---------------|--------------|---------|
| English-original (Murray, Humphrey, Keener, …) | 第NN章:中文标题(English Original Title Verbatim) | # 第 1 章:连续单种群模型(Continuous Population Models for Single Species) |
| Chinese-original (李航《统计学习方法》 etc.) | 第NN章:中文标题, with no parenthetical | # 第1章:统计学习及监督学习概论 |
| Translated book (e.g. a Chinese translation of Murray) | Treat as English-original; the parenthetical uses the source English title, not the translation's own English subtitle | (same as English-original row) |

Hard rules:

  • 第NN章 matches the style the book's existing chapter files use: if existing files say "第 1 章" (with spaces), use "第 1 章"; if "第1章" (no spaces), use "第1章". Do not mix. Do not zero-pad; the filename Chapter-NN.md already carries the canonical number.
  • (英文标题) is verbatim from the book's own chapter heading (TOC or chapter opener). Do not paraphrase, lowercase, or strip articles. Never invent one. If the book is Chinese-original, omit the parenthetical entirely: no (), no repeated Chinese.
  • Separator: full-width colon : (U+FF1A) and full-width parens () (U+FF08/U+FF09). Do not mix them with half-width ( ) : in the same heading.
  • The H1 is the only heading affected. Filename stays Chapter-NN.md (§5.6 unaffected). ## subsections start at H2 and are unaffected.

Rationale (skip on re-read): the H1 is what mkdocs-material renders first on the public site and what grep picks up in cross-book searches. A consistent 第XX章:中文(English) shape makes chapter indices sortable and lets a reader answer "what is this chapter in the source book?" at a glance without clicking through.
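
The hard rules above are mechanical enough to check with a regular expression. The following is a sketch, not an official validator: valid_h1 is a hypothetical name, and it assumes an ordinary ASCII space when the "第 1 章" spaced style is used.

```python
import re

COLON, LPAREN, RPAREN = "\uff1a", "\uff08", "\uff09"   # full-width colon and parens

H1 = re.compile(
    rf"^# 第 ?[1-9][0-9]* ?章{COLON}"            # no zero-padding, full-width colon
    rf"[^{LPAREN}{RPAREN}():]+"                   # Chinese title, no parens of either width
    rf"({LPAREN}[^{LPAREN}{RPAREN}]+{RPAREN})?$"  # optional full-width parenthetical
)

def valid_h1(line):
    """True if the line follows the chapter-title format of this section."""
    return H1.match(line) is not None

print(valid_h1(f"# 第1章{COLON}统计学习及监督学习概论"))   # True
print(valid_h1("# 第01章:标题(Title)"))                    # False
```

The pattern cannot tell whether the English title is verbatim from the book; that part of the rule still needs a human or agent check against the TOC.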

2.2 Note-Taking Structure (Minimum 2000 Words per Chapter)

Every chapter note must contain these seven sections. The word-count floor (2000 Chinese characters) is the §5.0 mechanical check; the per-section budgets below are guidance for hitting it.

| § | Section (Chinese → English) | Budget (Chinese characters) | What goes here |
|---|-----------------------------|-----------------------------|----------------|
| 1 | ## 作者 (Author) | n/a | The chapter's author(s), affiliation, role in the book |
| 2 | ## 内容概述 (Chapter Overview) | 300–500 | Problem addressed, main result, fit in the book, prerequisites |
| 3 | ## 核心方程与概念 (Main Formulas and Derivations) | 500–1000 (LONGEST) | Key equations (LaTeX + variable defs + physical meaning), key derivations, flag empirical (E) vs. theoretical (T) |
| 4 | ## 关键结论 (Key Conclusions) | 300–500 | Core findings, experiment/observation comparisons, what the chapter establishes vs. suggests |
| 5 | ## 挑战和开放性问题 (Challenges and Open Questions) | 300–500 | Gaps in theory, missing experiments, unresolved debates, math/comp unsolved problems |
| 6 | ## 个人反思与批判性分析 (Personal Reflections) | 300–800 | Compare author's modeling philosophy, what the math simplification gains/loses, what to ask the author, which derivations to reproduce |
| 7 | ## 重要参考文献 (References) | ≥ 5 refs | [X1] … [XN] in order of first appearance in body; full citation incl. DOI |

§3 must include inline LaTeX ($...$ or $$...$$) for every key equation. The reference list in §7 must use the [XN] numbering in order of first appearance in the body, not alphabetized. Cross-chapter references are renumbered locally in each chapter (chapter notes are standalone).
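
The first-appearance numbering rule can be verified with a few lines of code. This is a minimal sketch; both function names are hypothetical.

```python
import re

def first_appearance_order(body):
    """Reference numbers in the order they are first cited in the body."""
    seen = []
    for n in re.findall(r"\[X(\d+)\]", body):
        if int(n) not in seen:
            seen.append(int(n))
    return seen

def numbered_by_first_appearance(body):
    """True if the first-cited reference is X1, the next new one X2, and so on."""
    order = first_appearance_order(body)
    return order == list(range(1, len(order) + 1))

print(numbered_by_first_appearance("Growth law [X1], rods [X2], again [X1]."))  # True
print(numbered_by_first_appearance("Rods [X2] before growth [X1]."))            # False
```

Pass only the note body (everything before the reference list) so that the list entries themselves do not count as citations.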

Optional §4.5 insertion for computational chapters: for chapters that introduce a numerical method (FEM, spectral, boundary integral, etc.), add ## 关键算法或建模方法 between §4 and §5, covering: the method, the computational pipeline, key parameter choices + physical justification, and computational cost / convergence notes. This brings the chapter to 8 sections — the §5.0 check script uses an "all 7 mandatory + 1 optional" pattern: it only fails if any of the 7 mandatory sections is missing. The 8th section is encouraged but not enforced.

2.3 Formula Placement (Optional Summary Table)

Default policy: place all key equations inline within §3 (Main Formulas and Derivations) using Markdown LaTeX ($...$ or $$...$$). This is sufficient for most chapters.

Use an explicit formula summary table at the end of the chapter only when the chapter has many equations (> 5) and they are repeatedly referenced across sections. In that case, follow the template below:

| # | Name | Equation | Physical Meaning |
|---|------|----------|-----------------|
| (3.1) | Rate equation | $\dot{m} = \alpha m$ | Exponential growth |
| ... | ... | ... | ... |

Skip the summary table when:

  • The chapter has ≤ 3 key equations
  • All equations are short inline expressions (e.g., $\Omega = 1$)
  • Equations appear only once and don't need cross-referencing

Rules when the table is used:

  • Number equations as chapter.section.sequential (e.g., 3.2.1)
  • Define all variables on first appearance (in §3, not in the table)
  • Mark empirical equations with (E) and theoretical with (T)
  • The table is a navigation aid, not a substitute for the §3 discussion


Phase 3 — Book-Level Synthesis

3.0 Book Summary is MANDATORY

book_summary.md is a mandatory output (see Phase 0.4). The only exception is if the user explicitly says at the start "no summary needed" — in that case, confirm and document the decision in README.md.

The summary is what makes the per-chapter notes discoverable by future-you and other readers. Without it, the 10–20 chapter files become an unorganized pile.

3.1 Book Summary (3000–5000 words)

After all chapters are read, write a book summary covering:

  • Book's main thesis and scope
  • Structure and organization rationale
  • Core theoretical framework (the central equation or idea that unifies the book)
  • Key contributions (what this book adds beyond existing literature)
  • Strengths and weaknesses (honest critical assessment)
  • Target readership (who should read this book and who should not)
  • Comparison to competing books (if any)
  • Overall rating and recommendation

3.2 Cross-Chapter Connections

Create a map of how chapters relate to each other:

  • Which chapters build on previous ones vs. stand independently?
  • What is the thread/argument that connects them?
  • Are there contradictions or disagreements between chapters?

3.3 Terminology Glossary

Extract and organize key terms:

  • English term → Chinese translation → Definition
  • Group by theme or chapter
  • Note any non-standard definitions or author-specific usage


Phase 4 — Code and Implementation (Optional)

If the book includes computational methods:

4.1 Reproduce Key Algorithms

  • Implement central algorithms from scratch in Python/Matlab
  • Test against simple analytical solutions (when available)
  • Compare performance against reference implementations

4.2 Build a Reference Library

# reference_implementations/
#   ├── ch5_elastic_rod.py
#   ├── ch11_neohookean.py
#   ├── ch15_cavitation.py
#   └── ...

4.3 Numerical Validation Checklist

  • [ ] Convergence under mesh refinement
  • [ ] Invariants conserved (energy, momentum, mass)
  • [ ] Known analytical limits recovered
  • [ ] Physical dimension analysis (units check)

Phase 5 — Quality Checklist

Before finalizing, run a mechanical self-check on every chapter file. Pass/fail must be deterministic (programmatic grep/wc), not subjective.

5.0 Mechanical Self-Check (single source of truth)

Run these once per chapter and once for the whole book. Each row is also the pass/fail gate — no separate §5.1 table needed:

| # | Check | Pass criterion | Fail → action |
|---|-------|----------------|---------------|
| 1 | Word count (per chapter) | len(re.findall(r'[\u4e00-\u9fff]', text)) ≥ 2000 | Expand §3 (formulas) or §6 (reflection) |
| 2 | All 7 sections present | grep for ## 作者, ## 内容概述, ## 核心方程与概念, ## 关键结论, ## 挑战和开放性问题, ## 个人反思与批判性分析, ## 重要参考文献; all must match | Add the missing ## heading and content |
| 3 | References (per chapter) | grep -cE '^\[X[0-9]+\]' ≥ 5 | Re-scan chapter + bibliography section; add missing |
| 4 | Equations (per chapter) | grep -cE '\$[^$]+\$' ≥ 1 in §3 | Convert plain-text math (x^2) to LaTeX ($x^2$) |
| 5 | Book summary | book_summary.md exists at the book root | Write it (this is the top-level navigation) |

A chapter "passes" only if all five return green. If any check fails, do not move on — fix the chapter and re-run.

Drop-in check script (one shot, all five checks):

BOOK=~/Documents/Reading/[Book]

# Check 1: word count
for f in $BOOK/Chapter-*.md; do
  cn=$(python3 -c "import re,sys; t=open(sys.argv[1], encoding='utf-8').read(); print(len(re.findall(r'[\u4e00-\u9fff]', t)))" "$f")
  echo "$f: $cn Chinese characters"
done

# Check 2: all 7 sections
for f in $BOOK/Chapter-*.md; do
  missing=""
  for s in "作者" "内容概述" "核心方程与概念" "关键结论" "挑战和开放性问题" "个人反思与批判性分析" "重要参考文献"; do
    grep -q "## $s\|## .* $s" "$f" || missing="$missing [$s]"
  done
  [ -n "$missing" ] && echo "$f MISSING:$missing" || echo "$f: 7/7 sections OK"
done

# Check 3: references
for f in $BOOK/Chapter-*.md; do
  refs=$(grep -cE '^\[X[0-9]+\]' "$f")
  echo "$f: $refs references"
done

# Check 4: equations (counts lines containing inline $...$ or display $$ LaTeX)
for f in $BOOK/Chapter-*.md; do
  eqs=$(grep -cE '\$[^$]+\$|\$\$' "$f")
  echo "$f: $eqs lines with LaTeX"
done

# Check 5: book summary
[ -f $BOOK/book_summary.md ] && echo "book_summary.md: OK" || echo "MISSING book_summary.md"
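
The shell script above prints counts and leaves the comparison against the thresholds to the reader. A deterministic pass/fail version of the four per-chapter checks could look like the sketch below; check_chapter is a hypothetical name, and the thresholds are taken from the §5.0 table.

```python
import re

MANDATORY = ["作者", "内容概述", "核心方程与概念", "关键结论",
             "挑战和开放性问题", "个人反思与批判性分析", "重要参考文献"]

def check_chapter(text):
    """Return {check name: passed?} for one chapter note's full text."""
    headings = [ln for ln in text.splitlines() if ln.startswith("## ")]
    return {
        "word_count": len(re.findall(r"[\u4e00-\u9fff]", text)) >= 2000,
        "sections": all(any(s in h for h in headings) for s in MANDATORY),
        "references": len(re.findall(r"^\[X[0-9]+\]", text, flags=re.MULTILINE)) >= 5,
        "equations": re.search(r"\$[^$]+\$", text) is not None,
    }

result = check_chapter("## 作者\n$x^2$\n")
print(all(result.values()))  # False: too short, sections and references missing
```

Check 5 (book_summary.md exists) is a per-book file test and is left to the shell line above.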

5.2 Subjective Quality Gates (Apply Manually After Mechanical Pass)

These are the non-mechanical judgement calls — use them only after §5.0 returns green:

  • [ ] Self-containment: a reader who has not read the book can follow the chapter note (no orphan references to "as mentioned in Section 4.2 of the book")
  • [ ] Equation variables: every symbol in a LaTeX equation is defined in the surrounding text on first use
  • [ ] Reference discoverability: every [XN] in the body has a matching entry in §7 (and vice versa)
  • [ ] Critical analysis substance: §6 (reflection) makes at least one argument that the book itself does not make — not just "well written" or "clear derivation"
  • [ ] Consistency: chapter notes use the same notation for the same quantity across chapters (e.g., $\alpha$ is always the fractional order, not redefined in Ch 4)
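
Of these gates, reference discoverability is the one that can be partly automated. The sketch below assumes the reference list starts at the ## 重要参考文献 heading and that each entry begins a line with [XN]; reference_mismatches is a hypothetical helper.

```python
import re

def reference_mismatches(note):
    """Return (cited but not listed, listed but never cited) as lists of numbers."""
    body, _, refs = note.partition("## 重要参考文献")
    cited = set(re.findall(r"\[X(\d+)\]", body))
    listed = set(re.findall(r"^\[X(\d+)\]", refs, flags=re.MULTILINE))
    return sorted(cited - listed, key=int), sorted(listed - cited, key=int)

note = "## 内容概述\nAs shown in [X1] and [X3].\n## 重要参考文献 (References)\n[X1] A.\n[X2] B.\n"
print(reference_mismatches(note))  # (['3'], ['2'])
```

Two empty lists mean every citation has an entry and every entry is cited; the other gates remain judgement calls.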

5.3 Format Rules

  • [ ] Chinese punctuation throughout (,。:;?!“”‘’)
  • [ ] Markdown LaTeX format ($...$ or $$...$$) for any mathematical formulas; including math is not mandatory, but when present it must use LaTeX notation
  • [ ] Consistent heading hierarchy (# → ## → ###, no skipped levels)
  • [ ] No character encoding issues (verify with file -i *.md returning utf-8)

5.4 Python Equation Snippet Policy (A6)

When a formula is mechanically useful (i.e., you'd run it as code to verify a result), include a fenced Python block alongside the LaTeX:

- **关键方程 (3.1)**:Fractional Laplacian (spectral definition)
  $$(-\Delta)^{\alpha/2} u(x) = \mathcal{F}^{-1}\left[|k|^{\alpha} \hat{u}(k)\right](x)$$

  ```python
  import numpy as np
  def frac_laplacian_fft(u, dx, alpha):
      k = np.fft.fftfreq(len(u), d=dx) * 2*np.pi
      u_hat = np.fft.fft(u)
      return np.real(np.fft.ifft(np.abs(k)**alpha * u_hat))
  ```

Rules:

  • Use Python snippets only when the equation has a runnable verification path
  • Default to LaTeX-only for derivations, identities, and asymptotic forms
  • Always include the symbol table in the surrounding prose; do not let the code block carry the math alone
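
As an illustration of what a "runnable verification path" might mean for the snippet above: on a periodic grid, a single Fourier mode $\sin(mx)$ is an eigenfunction of the spectral multiplier $|k|^{\alpha}$ with eigenvalue $m^{\alpha}$, so the output can be compared against a closed form. The grid size, $\alpha$, and $m$ below are arbitrary choices for this sketch.

```python
import numpy as np

def frac_laplacian_fft(u, dx, alpha):
    # Angular wavenumbers of the periodic grid, then the |k|^alpha multiplier.
    k = np.fft.fftfreq(len(u), d=dx) * 2 * np.pi
    return np.real(np.fft.ifft(np.abs(k) ** alpha * np.fft.fft(u)))

N, alpha, m = 64, 1.5, 2             # illustrative values only
x = 2 * np.pi * np.arange(N) / N     # periodic grid on [0, 2*pi)
dx = 2 * np.pi / N
u = np.sin(m * x)

# Expected result for a single mode: m**alpha * sin(m x).
err = np.max(np.abs(frac_laplacian_fft(u, dx, alpha) - m**alpha * u))
print(err < 1e-10)  # True
```

A check of this kind tests only the spectral multiplier on periodic data; it says nothing about non-periodic domains or other definitions of the fractional Laplacian.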

5.5 Reference Count Standardization (B8)

  • Per chapter: ≥ 5 references, numbered [X1], [X2], ... in order of first appearance in the body
  • Per book: 30–80 references across all chapters + the book summary's own reference section
  • Format: [XN] Author(s). Title. Journal Year;Vol(Issue):Pages. DOI (journal article); [XN] Author(s). Title. Publisher, Year. ISBN (book)
  • Cross-chapter sharing: if a reference appears in multiple chapters, renumber it in each chapter's local list (do not maintain a global numbering) — chapter notes are standalone

5.6 Technical / Output Structure

  • [ ] Files named consistently: Chapter-NN.md with a zero-padded number (01, 02, ...) for ordering
  • [ ] All chapter files saved under ~/Documents/Reading/[Year]-[Book-Title]-[Author]/
  • [ ] Book summary saved as book_summary.md at the book root
  • [ ] One README.md at the book root listing all chapters and their one-line summaries

Phase 6 — Execution Mode (Main Agent vs Subagents)

By default, the main agent reads every chapter and writes every note itself — no delegate_task calls. This is the preferred mode for this workflow. Subagents are an optional escape hatch for narrow cases (see §6.4).

6.1 Default: Main-Agent Serial Execution

The main agent loops over chapters, reading the chapter text and writing the corresponding Chapter-XX.md directly. This works because:

  • Reading + note-taking is a reasoning-heavy, sequential task — the main agent retains context from earlier chapters (notation, cross-references, terminology consistency).
  • Notes are self-contained by design (§5.2 self-containment check), so a later chapter does not need an earlier chapter's full text to be written.
  • Delegation adds handoff overhead (context packaging, output verification, retry on subagent errors) that often exceeds the time saved.

Expected throughput: ~8–12 chapters per main-agent session, depending on chapter length. For books with more than 15 chapters, see §6.5 (long-book strategy).

6.2 When the User Explicitly Says "main agent only"

If the user says things like "全部在主任务中进行,不要触发子任务" ("do everything in the main task; do not trigger subtasks") or "由你(主 agent)亲自" ("do it yourself, main agent"), honor it absolutely:

  • Do not call delegate_task, even for "easy" sub-tasks like slicing the PDF or running OCR on figures.
  • Do not call cronjob for incremental progress — the user wants a single, focused, in-context session.
  • Inline everything: terminal() for the slicing/OCR, write_file() for each chapter, read_file() for cross-checking.

This is the default for this user unless the user explicitly opts in to delegation.

6.3 In-Loop Workflow (Main Agent)

Per chapter, the main agent does:

1. read_file(ch_src/ch_NN.txt)           # full text; split per §2.1.1 if > 100 KB
2. (optional) read_file(book_toc.json)   # for cross-chapter reference checks
3. write_file(Chapter-XX.md, <note>)     # one-shot write of the full note
4. terminal() to run §5.0 self-check     # verify word count, sections, refs
5. if any check fails → patch() and re-run §5.0

No batching: write each chapter, verify, then move on. Do not write 5 chapters in parallel and verify at the end — that defeats the self-check.

6.4 Optional: When Subagent Delegation IS Appropriate

Delegate only when the user explicitly opts in, AND at least one of the following is true:

  • The book has > 30 chapters and the user has agreed to delegation
  • A single chapter text is > 200 KB and the user is OK with a subagent
  • A clearly independent sub-task exists (e.g., "build the reference library index from the existing 12 chapter notes") that the main agent can verify

If you do delegate, use the subagent task template in Appendix B. Do not weaken it to "≥ 2000 characters": the template explicitly tells the subagent to fill each of the 7 sections with concrete mathematical/conceptual content, and the word count follows naturally. The full template (~60 lines) lives in Appendix B to keep this section readable.

Context to pass: book metadata (title, author, year, publisher), output file path, chapter-specific requirements, language (Chinese), notation (LaTeX).

Post-delegation verification (mandatory, do not skip):

wc -m ~/Documents/Reading/[Book]/Chapter-XX.md   # check character count
grep -cE '^## .*(作者|内容概述|核心方程与概念|关键结论|挑战和开放性问题|个人反思与批判性分析|重要参考文献)' ~/Documents/Reading/[Book]/Chapter-XX.md   # expect 7: one line per mandatory section
# Plus the full §5.0 self-check, run by the main agent

Treat subagent output as a draft, not a finished product. The main agent must read back the chapter note, fix any quality issues, and re-run §5.0 before accepting it.

6.5 Long-Book Strategy (> 15 chapters)

When the book exceeds what the main agent can do in one session (~12 chapters):

  1. Read first 8–10 chapters in this session, stopping before context becomes uncomfortable
  2. Save progress in a progress.md file in the book folder: "Done: Ch 1–8. Next: Ch 9 (title). Pending: Ch 10–N."
  3. Do NOT auto-schedule the next session via cronjob; let the user initiate the next session with "继续" ("continue")
  4. In the next session, the user says "读取 [previous-session-id],并继续" ("read [previous-session-id] and continue") → the new main agent reads progress.md and resumes

This is the human-in-the-loop checkpoint that prevents the agent from drifting off-policy during long, unattended runs.

Lifecycle note (revised 2026-06-02): progress.md is a short-lived workflow log. Once the book is complete and Phase 8 has synced everything to the public site, progress.md is auto-removed by Phase 8 §8.4. If you need long-term retention (e.g., for an audit trail or future re-reads), move progress.md to a personal archive (e.g., ~/Documents/Reading/_archive/) before the sync step. The progress.md content itself is not part of the public site.


Phase 7 — Review and Update

After completing the book:

  1. Read your own notes 1 month later — what is unclear?
  2. Update notes with insights from later chapters or other papers
  3. Add a "Further Reading" section to each chapter with related papers you have since discovered
  4. Cross-link notes: e.g., "cf. Chapter 12, Section 3.2 for the 3D version of this result"
  5. Revise instruction.md with any lessons learned from this specific book

Phase 8 — Sync to Public Site (auto-triggered, revised 2026-06-02)

After every Chapter-XX.md is finalized, the main agent automatically syncs the work to the public reading-notes-site repository and triggers a Cloudflare Pages deployment. This phase is fully automatic — the user does not need to say "sync" or "deploy". It runs after the chapter write + §5.0 self-check both pass.

8.1 When This Triggers

  • Per chapter: after a Chapter-NN.md is written and passes §5.0 mechanical self-check, Phase 8 fires once for that chapter
  • Final sync: after book_summary.md is written (and passes §3.1 length check), Phase 8 fires one last time with the summary included
  • The user does not need to issue any command — Phase 8 is part of the write loop

8.2 Sync Step Template (copy-pasteable)

Sync etiquette (applies to every step below):

  • No silent retries. If a command fails, report the failure in plain text and pause. Do not retry without user awareness. Max 2 retries on the same failure type; after that, stop and ask.
  • No credential refresh. If a git push or API call fails with an auth error, report and stop. The user manages tokens/SSH keys.
  • No auto-edit of user content. If a markdown file (chapter note, book summary) is broken, report which file + the warning. Do not rewrite it. The agent fixes workflow, not content.

For each chapter write, the main agent executes:

# 1. Sync source files to site (cp -p preserves mtime; md5sum diff first)
SRC=~/Documents/Reading/<book-slug>
DST=~/Documents/reading-notes-site/docs/ReadingNotes/<book-slug>   # note: site uses simpler slug if it differs
# (Use site-slug mapping from §8.4 if Reading/ and site/ folders don't match)

for f in Chapter-NN.md; do
  src_md5=$(md5sum "$SRC/$f" | cut -d' ' -f1)
  dst_md5=$(md5sum "$DST/$f" 2>/dev/null | cut -d' ' -f1)
  if [ "$src_md5" != "$dst_md5" ]; then
    cp -p "$SRC/$f" "$DST/$f"
    echo "synced $f"
  fi
done

# (generate_pages.py is NOT run here — it runs in the Cloudflare Pages
#  build step, see reading-notes-site/.github/workflows/deploy.yml
#  "Generate pages" step. Running it twice would race the agent's local
#  build artifacts against the cloud build and produce stale index pages.)

# 2. (Optional but recommended for first sync only) verify mkdocs build
#    Skip for incremental per-chapter syncs — GitHub Actions catches broken markdown on push
# mkdocs build

# 3. Commit — ONLY track the source of truth (docs/ReadingNotes/) and the
#    auto-generated home page table (docs/index.md). NEVER add docs/books/ or
#    site/ — both are 100% regenerated by Cloudflare Pages on every push and
#    are listed in reading-notes-site/.gitignore. Adding them by accident
#    bloats every commit with 5+ MB of generated markdown. See §8.9.
cd ~/Documents/reading-notes-site
git add docs/ReadingNotes/ docs/index.md
git status   # sanity check: confirm no docs/books/ files are staged
git commit -m "FRDE: sync Chapter-NN.md"  # or sync book_summary.md for final sync

# 4. Push (triggers GitHub Actions → generate_pages.py → mkdocs build → CF Pages deploy)
git push

Per-chapter syncs are small commits (a few KB). Final sync includes book_summary.md (~25KB).

8.3 Cleanup & Sync (combined)

Principle: intermediate files (ch_src/, progress.md, /tmp scratch) are moved to .trash_<name>/ directories before the sync commit. The source folder then contains only the final product (Chapter-NN.md, book_summary.md, README.md, user scripts), and git add . is safe — no whitelist needed.

Why mv to .trash_* (not rm): the Hermes sandbox blocks rm outright (see ~/.hermes/memory for the 2026-06-06 cell-migration incident). Moving files to a .trash_* directory is the only way to satisfy "清除中间文件" ("clean up intermediate files") directives without triggering the sandbox. The .trash_* directories are git-ignored by the **/.trash_*/ pattern in .gitignore.

Steps per chapter sync:

SRC=~/Documents/Reading/<book-slug>
SITE=~/Documents/reading-notes-site/docs/ReadingNotes/<book-slug>

# 1. Move intermediates to trash (one round, no prompts)
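ts=$(date +%s)
[ -d "$SRC/ch_src" ] && mv "$SRC/ch_src" "$SRC/.trash_ch_src_$ts"
# progress.md goes into its own .trash_* directory: the **/.trash_*/ ignore pattern matches directories only
[ -f "$SRC/progress.md" ] && mkdir -p "$SRC/.trash_progress_$ts" && mv "$SRC/progress.md" "$SRC/.trash_progress_$ts/"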
rm -f /tmp/<book-slug>*.txt /tmp/cover.png 2>/dev/null   # /tmp is ephemeral, rm is safe here

# 2. Sync the now-clean source to the site
for f in Chapter-NN.md README.md book_summary.md; do
  [ -f "$SRC/$f" ] || continue
  src_md5=$(md5sum "$SRC/$f" | cut -d' ' -f1)
  dst_md5=$(md5sum "$SITE/$f" 2>/dev/null | cut -d' ' -f1)
  [ "$src_md5" != "$dst_md5" ] && cp -p "$SRC/$f" "$SITE/$f" && echo "synced $f"
done

# 3. Single commit covers both source-mirror + trash moves
cd ~/Documents/Reading
git add -A
git status   # confirm: only Chapter-NN.md, book_summary.md, README.md, .trash_* moves
git commit -m "sync: Chapter-NN.md (and cleanup intermediates)"

# 4. Commit and push on the site side (separate commit in reading-notes-site, see §8.2 steps 3-4)

What is preserved (never touched by cleanup): Chapter-NN.md, book_summary.md, README.md, user scripts (*.py not matching the trash patterns), .bak/, *-bak/. These are the only files that should remain in the book folder after a sync.
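
The "only files that should remain" rule can be checked mechanically. Below is a minimal sketch under stated assumptions: the helper name `unexpected_entries` is hypothetical, the Chapter-NN pattern assumes NN expands to two or more digits, and .trash_* entries are skipped because they stay on disk but are git-ignored.

```python
import re
from fnmatch import fnmatchcase

# Assumption: "Chapter-NN.md" means "Chapter-" followed by two or more digits.
CHAPTER_RE = re.compile(r"Chapter-\d{2,}\.md")
# Remaining patterns are copied from the "What is preserved" list.
ALLOWED_GLOBS = ("book_summary.md", "README.md", "*.py", ".bak", "*-bak")


def unexpected_entries(names):
    """Return the top-level entry names that should not remain after cleanup.

    names: entry names of the book folder, e.g. from os.listdir(book_dir).
    """
    leftovers = []
    for name in sorted(names):
        if name.startswith(".trash_"):
            continue  # kept locally, ignored by git
        if CHAPTER_RE.fullmatch(name):
            continue
        if any(fnmatchcase(name, pattern) for pattern in ALLOWED_GLOBS):
            continue
        leftovers.append(name)
    return leftovers
```

A non-empty result (for example `["ch_src", "progress.md"]`) suggests the move-to-trash step did not run before the commit.
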

Recovery: .trash_* directories are kept locally and git-ignored, so they can be re-promoted (mv .trash_ch_src_xxx ch_src) if a sync was interrupted and the chapter files need to be re-split. Periodic manual cleanup of .trash_* directories is the user's call (the agent never deletes them).

Sync log: if progress.md was moved to trash, the sync timestamps are captured in the git commit message itself (no separate ## Sync Log section needed).

Deployment verification (do not do this): after git push returns success, the main agent's job is done. GitHub Actions runs the deploy step; CF Pages reports failures via email/webhook. The agent must not poll the deployment, run mkdocs build locally to "verify" the pushed result, or check the live site URL. These add no value and risk false alarms. The optional first-sync mkdocs build in §8.2 step 2 runs before the push and is not affected by this rule.

8.9 docs/books/ and site/ are Cloudflare-Generated (revised 2026-06-06)

Rule: the main agent must never commit anything under reading-notes-site/docs/books/ or reading-notes-site/site/. Both are 100% regenerated by Cloudflare Pages on every push (workflow: python3 generate_pages.py → mkdocs build → cloudflare/pages-action@v1). The cloud build is the single source of truth: whatever is in the repo gets overwritten on every push.

What this means for the agent:

  • Do not run python3 generate_pages.py locally during sync — it races the cloud build and creates stale local artifacts that leak into the next commit if forgotten.
  • Do not hand-edit docs/books/anything.md — the cloud overwrites it within minutes.
  • Do not git add docs/ or git add docs/books/ — use git add docs/ReadingNotes/ docs/index.md only (see §8.2 step 3).

What's already enforced: reading-notes-site/.gitignore lists both site/ and docs/books/, so a plain git add skips them and git status does not list them. If docs/books/ files ever show up as staged (for example after git add -f), treat it as a red flag and unstage them immediately with git reset (see the recovery recipes below).

Recovery recipes:

# If git status shows docs/books/ files staged:
cd ~/Documents/reading-notes-site
git reset HEAD docs/books/        # unstage them
git status                        # confirm only docs/ReadingNotes/ and docs/index.md are staged

# If docs/books/ already landed in a commit (do NOT rewrite history — bloats diffs):
cd ~/Documents/reading-notes-site
git rm -r --cached docs/books/    # remove from index, keep local copy
grep -qxF 'docs/books/' .gitignore || echo 'docs/books/' >> .gitignore  # append only if missing
git add .gitignore
git commit -m "chore: untrack docs/books/ (let CF Pages regenerate)"

# Local preview still works (files stay untracked):
cd ~/Documents/reading-notes-site
python3 generate_pages.py         # rebuilds docs/books/ locally
mkdocs serve                      # live-reload preview at http://127.0.0.1:8000
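
The "confirm nothing generated is staged" check can also be automated before a commit. This is a minimal sketch, assuming a helper script run from the site repo: the function names `forbidden_staged` and `staged_paths` are hypothetical, while `git diff --cached --name-only` is the standard Git command for listing staged paths relative to the repository root.

```python
import subprocess

# Generated directories named in the rule above.
FORBIDDEN_PREFIXES = ("docs/books/", "site/")


def forbidden_staged(paths):
    """Return the staged paths that fall under a generated directory."""
    return [p for p in paths if p.startswith(FORBIDDEN_PREFIXES)]


def staged_paths(repo_dir):
    """List staged paths by running `git diff --cached --name-only` in repo_dir."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

A possible use is `forbidden_staged(staged_paths(repo_dir))` right before `git commit`: if the list is non-empty, unstage with the first recovery recipe instead of committing.
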

Symmetry note: the Reading repo (~/Documents/Reading/) does not have this issue — its Chapter-NN.md and book_summary.md are hand-written source files and are the source of truth. The two repos play distinct roles:

| Repo | What's hand-written | What's generated | What to commit |
|------|---------------------|------------------|----------------|
| reading-notes/ (source) | Chapter-NN.md, book_summary.md, README.md, progress.md | nothing | everything in the book folder except intermediates (§8.3) |
| reading-notes-site/ (publish) | nothing | docs/books/* (by generate_pages.py), site/* (by mkdocs build), docs/index.md rows (by generate_pages.py) | docs/ReadingNotes/*, hand-curated docs/index.md outside the <!-- AUTO-BOOKS-* --> markers, and the workflow files (mkdocs.yml, generate_pages.py, .github/workflows/) |

Appendix A — Formula Collection Template

## 公式汇总

| # | 名称 | 形式 | 物理意义 | 类型 |
|---|------|------|----------|------|
| (X.1) | | | | (T)/(E) |
| (X.2) | | | | (T)/(E) |

注:(T)=理论推导,(E)=经验公式

Appendix B — Subagent Task Description Template (for §6.4)

Pass this template verbatim to the subagent. Do not weaken it to "≥ 2000 字符"; the subagent should be told to fill each of the 7 sections with concrete mathematical / conceptual content from the chapter, and the word count will follow naturally.

Read the chapter text from /path/to/chapter_XX.txt
(this is Chapter X of "[Book Title]" by [Author], [Year])

Your job: produce a self-contained Chinese reading note (saved to the path below)
that an external reader (who has NOT read the book) can follow end-to-end.

Target structure (all 7 sections MANDATORY, each filled with concrete content
from the chapter — do NOT pad with generic phrases):

  §1 作者
     - The actual author(s) of this chapter
     - Their affiliation if mentioned
     - The chapter's role in the book (e.g., "this is a methods chapter that
       introduces the variational framework used throughout the rest of the book")

  §2 内容概述  (300-500 Chinese characters)
     - What problem this chapter addresses
     - What the main result or message is
     - Where it fits in the book's overall argument
     - Prerequisites a reader needs to follow this chapter

  §3 核心方程与概念  (500-1000 Chinese characters; LONGEST section)
     - For EVERY key equation: write the LaTeX, define every variable, and
       explain the physical/mathematical meaning in 2-3 sentences
     - For EVERY major concept: define it and give at least one concrete
       example from the chapter
     - Show the key steps of derivations (not algebraic drudgery)
     - Mark empirical vs. theoretical equations

  §4 关键结论
     - 3-7 bullet points, each stating ONE specific result with its conditions
     - Each conclusion should be falsifiable (state the regime, parameter range,
       or assumption under which it holds)

  §5 挑战和开放性问题
     - At least 3 items; not just "more research is needed" but specific
       gaps the chapter itself flags, plus gaps you notice

  §6 个人反思与批判性分析
     - At least one observation that goes BEYOND what the chapter says
     - E.g., connection to other chapters, a limitation the author didn't
       acknowledge, a method that would improve the result
     - Avoid generic praise ("well-written", "clear derivation")

  §7 重要参考文献  (≥ 5 references, in order of first appearance [X1]…[XN])
     - Full citation with DOI when available
     - One line per reference; alphabetical order is NOT required; order = appearance order

Mechanical checks (run yourself before saving):
  - Word count ≥ 2000 Chinese characters (use the same Python regex as §5.0)
  - All 7 sections present
  - At least 5 [XN] references
  - At least 1 inline LaTeX per subsection in §3

Save to: ~/Documents/Reading/[Book]/Chapter-XX.md

Do NOT:
  - Use placeholder text like "..." or "TODO"
  - Skip §5 or §6 because the chapter "doesn't have" challenges/reflections
    (every chapter has both; find them or infer them from the text)
  - Exceed 100 KB output (the file should be ≈ 15-30 KB)
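
The mechanical checks listed in the template above can be sketched as a single function. This is an illustration under stated assumptions, not the SOP's own checker: the name `check_note` is hypothetical, the character regex assumes "Chinese characters" means the CJK Unified Ideographs block (the regex from §5.0 is not reproduced here and may differ), and references are assumed to look like [X1] or [12]. The per-subsection LaTeX check is left out.

```python
import re

# Assumption: count CJK Unified Ideographs (U+4E00..U+9FFF) as Chinese characters.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")
# Assumption: references are bracketed labels such as [X1] or [12].
REF_RE = re.compile(r"\[[A-Za-z]?\d+\]")
# Section titles copied from the template (§1 to §7).
SECTION_TITLES = ("作者", "内容概述", "核心方程与概念", "关键结论",
                  "挑战和开放性问题", "个人反思与批判性分析", "重要参考文献")


def check_note(text, min_chars=2000, min_refs=5):
    """Return a list of human-readable failures; an empty list means every check passed."""
    failures = []
    n_chars = len(CJK_RE.findall(text))
    if n_chars < min_chars:
        failures.append(f"only {n_chars} Chinese characters (need {min_chars})")
    # Plain substring test: it confirms a title occurs, not that it is a heading.
    for title in SECTION_TITLES:
        if title not in text:
            failures.append(f"missing section: {title}")
    n_refs = len(set(REF_RE.findall(text)))
    if n_refs < min_refs:
        failures.append(f"only {n_refs} distinct references (need {min_refs})")
    return failures
```

The substring test for section titles is deliberately loose; a stricter version would match the titles only at the start of heading lines.
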

Appendix C — (removed)

Chapter splitter scripts were moved to the book-chapter-splitter skill (see §1.3). This appendix is kept as a sentinel — do not reintroduce inline splitter code; update the skill instead.

Appendix D — Book Metadata Template

- **书名**
- **作者**
- **出版社**
- **出版年份**
- **ISBN**
- **DOI**
- **核心主题**
- **目标读者**
- **前置知识**
- **相关书籍**

This SOP is a living document. Revise after each book based on what worked and what did not.