Scientific Book Reading — Standard Operating Procedure

Purpose: General guide for reading scientific/technical books and producing structured notes.
Scope: Applicable to any technical monograph, textbook, or edited volume in computational mechanics, applied mathematics, biology, physics, or engineering.
Output: Per-chapter notes + book summary + this reusable instruction.

Phase 0 — Before You Start

0.1 Acquire the Book

Confirm the exact title, author(s), edition, ISBN, and publication year.
Check whether a DOI or open-access PDF is available — prefer the publisher's PDF for layout integrity.
Note any companion resources (solutions manual, code repository, dataset).

0.2 Assess the Book

Answer these questions before reading:
- Who is the intended audience? (grad students / researchers / practitioners)
- What prerequisites does it assume?
- Is it a monograph (single author's research perspective) or a multi-author edited volume?
- What is the book's main thesis or unifying theme?
- How is it structured? (by topic, by dimensionality, by method, by application?)
- What is the expected outcome of your reading? (survey for a lit review, deep dive for implementation, teaching preparation, etc.)

0.3 Set Up the Workspace

Folder naming convention (strict):

[Year]-[Book-Title]-[Author]

Year: 4-digit publication year
Book-Title: title with spaces replaced by - (hyphens)
Author: surname of the first author only (no co-authors)
Example: 2017-The-Mathematics-and-Mechanics-of-Biological-Growth-Goriely

Full-title rule (revised 2026-06-02): use the complete book title in the folder name. Do not abbreviate to 3-word or any other short form. The example above (...Biological-Growth-Goriely, 9 words) is the correct form — do not shorten it. This rule applies to all new folders; do not retroactively rename existing folders (preserves site URLs).

~/Documents/Reading/
└── [Year]-[Book-Title]-[Author]/
    ├── book_summary.md          # book summary (mandatory; see Phase 3)
    ├── instruction.md           # copy of this SOP
    ├── Chapter-01.md            # per-chapter notes
    ├── Chapter-02.md
    └── ...

0.4 Mandatory vs Optional Outputs

Output	Mandatory?	When to skip
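`Chapter-NN.md` (per-chapter notes)	Mandatory	Never: every chapter gets a note
`book_summary.md` (book summary)	Mandatory	Only in rare exceptions: the user explicitly says "no summary needed"
`reference_implementations/` (code reproduction)	Not produced	Built only if the user explicitly asks in Phase 0 to reproduce an algorithm; this workflow does not create it by default
Formula summary table	Optional, as needed	Not required if the chapter has ≤ 3 formulas or only short inline ones; use it only when formulas are numerous and frequently cross-referenced
`ch_src/ch_NN.txt` (chapter text splits)	Mandatory (temporary)	Used for reproducibility checks and for quoting while writing notes; Phase 8 cleans them up after sync, so they need not be kept long term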

0.5 Check for Existing Work (before starting)

In case the book has been read before (or another agent attempted it), check git history first:

# In the source repo (~/Documents/Reading)
git log --oneline --all | grep -i "<book-slug>"
# In the site repo (~/Documents/reading-notes-site)
cd ~/Documents/reading-notes-site && git log --oneline | grep -i "<book-slug>"

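If a prior version exists, decide explicitly with the user:
- A: Discard the old version; the main agent rereads the book (recommended, since it guarantees the quality gates pass)
- B: Use the old version as the base; only complete or revise it
- C: Keep the old version as a historical archive and write the new one to a new folder (suited to comparing two versions)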

Also detect name collisions:

git ls-files | grep -i "<book-slug>"

0.6 Agent Strategy Decision Gate (mandatory before Phase 1)

Before any reading starts, explicitly confirm with the user whether the main agent does the reading itself or delegates it via delegate_task. Do not assume. Use this script:

Question to the user (CLI → use check_list.md checkbox, NOT clarify buttons):

  (a) Main agent does everything itself; no subtasks  ← default for this user
  (b) Which chapters may be delegated to a subagent?  ← e.g., "Ch 5-10 can be delegated"
  (c) Everything done by subagents in parallel

Default assumption for this user: (a), the main agent does everything. The user typically says so explicitly ("全部在主任务中进行，不要触发子任务", i.e., "do everything in the main task; do not trigger subtasks"). If the user says nothing about delegation, still ask once before Phase 1 begins; absent a reply, default to (a).

Record the decision in progress.md at the book root:

## Execution Mode
- Mode: (a) main agent serial | (b) mixed | (c) full subagent
- Decided by: [user message or default]
- Notes: [...]

This is the only execution-strategy question to put to the user. Everything else (chapter structure, format, formulas) is decided by §2.2 / §3.0 / §5.

If §0.5 finds multiple matching folders (e.g., a -bak and a current one), clarify with the user before writing anything.

0.7 Auto-detect PDF Metadata and Create Folder

When the user gives a PDF path, the main agent automatically extracts title, author(s), and year, then proposes a folder name per §0.3. Do not require the user to type the metadata manually.

0.7.1 When This Triggers

The user says "read this PDF" (or similar) and provides a path
The user says they have a new book and gives its location
Any situation where the input is a PDF and no existing folder is found (§0.5)

0.7.2 Extract Metadata (3-step)

# Step 1: pdfinfo for metadata fallback
pdfinfo "$pdf" | grep -E "^(Title|Author|Subject|Keywords|CreationDate)"

# Step 2: pymupdf for first 2 pages of text (most reliable for academic PDFs)
# The quoted 'EOF' blocks shell expansion, so the path is passed in as argv[1]
python3 - "$pdf" << 'EOF'
import sys
import pymupdf
doc = pymupdf.open(sys.argv[1])
n = min(2, len(doc))
for i in range(n):
    print(f"--- page {i+1} ---")
    print(doc[i].get_text()[:2000])
EOF

# Step 3: apply the parsing rules of §0.7.3 to the output above

0.7.3 Main Agent Parsing Rules

The main agent reads the page-1 (and page-2) text and applies:

Title: the line(s) with the largest font size in the first 1-3 lines, excluding ISBNs, journal names, conference names, page numbers. If multi-line (e.g., wrapped title), concatenate with single space.
First author: the first name in the author list (before any "and" or "," or "&"). If only an editor is listed, use the editor's surname.
Year: the publication year from the title page or copyright page. If several years are present (e.g., a reprint year alongside "© 2024 Springer"), use the copyright year.
Publisher / DOI: optional, do not need to extract.
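Plain get_text() output carries no font sizes, so the largest-font rule needs PyMuPDF's structured output, page.get_text("dict"), whose text spans include a "size" field. A minimal sketch of the title and year rules, assuming that structure; the function names are hypothetical, and excluding journal or conference names is still left to the agent:

```python
import re
from typing import Optional

def largest_font_title(page_dict: dict, max_lines: int = 3) -> str:
    """Guess the book title from page.get_text("dict") output.

    Looks at the first `max_lines` text lines (skipping ISBNs and bare page
    numbers) and joins the ones set in the largest font size.
    """
    lines = []
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks carry no "lines"
            spans = [s for s in line.get("spans", []) if s.get("text", "").strip()]
            if not spans:
                continue
            text = " ".join(s["text"].strip() for s in spans)
            if text.isdigit() or "ISBN" in text.upper():
                continue
            lines.append((max(s["size"] for s in spans), text))
    head = lines[:max_lines]
    if not head:
        return ""
    top = max(size for size, _ in head)
    # A wrapped title spans several lines of (almost) the same size: concatenate them
    return " ".join(text for size, text in head if abs(size - top) < 0.5)

def copyright_year(text: str) -> Optional[int]:
    """Return the first year that follows '©', '(c)' or 'Copyright', if any."""
    m = re.search(r"(?:©|\(c\)|copyright)\s*(?:©\s*)?((?:19|20)\d{2})", text, re.IGNORECASE)
    return int(m.group(1)) if m else None
```

Typical use: largest_font_title(doc[0].get_text("dict")) on the title page, and copyright_year(...) on whichever page holds the copyright notice.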

0.7.4 Confirm with User (one round)

Before creating the folder, print a confirmation line and let the user correct any misdetection:

Detected:
  Title: Fractional Dispersive Models and Applications
  Author: Kevrekidis, Cuevas-Maraver (eds.)
  Year: 2024
Proposed folder: 2024-Fractional-Dispersive-Models-and-Applications-Kevrekidis
Confirm? (y / edit title / edit author / cancel)

The user can correct in one line. The main agent then creates the folder:

mkdir -p ~/Documents/Reading/<slug>

PDF location policy: the PDF is not moved into the new folder. Keep it at the original location (e.g., ~/Documents/Books/) so the user has a single canonical source. The folder under ~/Documents/Reading/ contains only notes + intermediate files (which Phase 8 will clean up).

0.7.5 Fallback: When Title Detection Fails

If pdfinfo Title is empty AND the first-2-pages text has no recognizable title (e.g., the PDF is a scanned book with no text layer, or the title page is a cover image only):

Try visual detection: use vision_analyze on a rendered cover image:
```
python3 -c "import pymupdf; doc=pymupdf.open('$pdf'); pix=doc[0].get_pixmap(dpi=150); pix.save('/tmp/cover.png')"
```
Then vision_analyze('/tmp/cover.png', "What is the title and author of this book?").
If still unclear, ask the user directly (one short prompt, not a check_list):
```
Cannot auto-detect title. Please provide:
Title: ...
Author: ...
Year: ...
```

This is the only case where the main agent asks the user for metadata — the rest is automated.

0.7.6 Edge Cases

Multi-author book (editors): use first editor's surname (e.g., Kevrekidis for Fractional Dispersive Models and Applications)
Single-word title (e.g., Causality): use the full single word — no padding with stopwords
Subtitle with colon (e.g., Deep Learning: A Practitioner's Approach): drop the colon, concatenate: ...Approach-...
Edition number (e.g., Statistics, 4th ed.): drop the edition tag
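The naming rules of §0.3 and the edge cases above can be applied mechanically. A minimal sketch: the function name is hypothetical, punctuation other than colons (commas inside the title, apostrophes) is kept because the SOP does not say otherwise, and joining multi-word surnames with hyphens is an assumption.

```python
import re

# Trailing edition tag such as ", 4th ed." or " 2nd edition"
EDITION_TAG = re.compile(r",?\s*\b\d+(?:st|nd|rd|th)\s+ed(?:ition)?\.?\s*$", re.IGNORECASE)

def folder_slug(year: int, title: str, first_author_surname: str) -> str:
    """Build the [Year]-[Book-Title]-[Author] folder name of §0.3 / §0.7.6."""
    title = EDITION_TAG.sub("", title.strip())  # drop the edition tag
    title = title.replace(":", " ")             # drop a subtitle colon, keep the words
    words = title.split()                       # full title, never abbreviated
    surname = "-".join(first_author_surname.split())
    return f"{year:04d}-{'-'.join(words)}-{surname}"
```

With the examples above, folder_slug(2017, "The Mathematics and Mechanics of Biological Growth", "Goriely") gives 2017-The-Mathematics-and-Mechanics-of-Biological-Growth-Goriely, and folder_slug(2024, "Fractional Dispersive Models and Applications", "Kevrekidis") gives the folder proposed in §0.7.4.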

Phase 1 — Structural Survey

1.1 Extract Text from PDF

Choose extraction method by priority:

Method	Command	When to Use
pdftotext (layout)	`pdftotext -layout book.pdf -`	Most PDFs; preserves columns
PyMuPDF	`import pymupdf; doc = pymupdf.open("book.pdf")`	Encrypted or non-standard PDF
marker-pdf	`marker_single book.pdf --output_dir out/`	Scanned documents; OCR needed

Verify extraction quality:

pdftotext -layout book.pdf - | head -50   # check page 1
pdftotext -layout book.pdf - | wc -l       # line count
pdftotext -layout book.pdf - | wc -w       # word count
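The line and word counts above are whole-book totals; a per-page view shows whether the PDF lacks a text layer (the marker-pdf row of the table). A minimal sketch: the function name and the 50-character threshold are assumptions, not part of the SOP.

```python
def text_layer_report(page_texts: list, min_chars: int = 50) -> dict:
    """Summarize how much extractable text each page has.

    Pages with fewer than `min_chars` non-whitespace characters are treated as
    image-only (scanned or figure pages); a high fraction suggests OCR is needed.
    """
    counts = [len("".join(t.split())) for t in page_texts]
    low = [i + 1 for i, c in enumerate(counts) if c < min_chars]  # 1-based page numbers
    return {
        "pages": len(counts),
        "words": sum(len(t.split()) for t in page_texts),
        "low_text_pages": low,
        "low_text_fraction": len(low) / len(counts) if counts else 0.0,
    }
```

Feed it one string per page, e.g. [page.get_text() for page in pymupdf.open("book.pdf")].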

1.2 Identify Chapter Boundaries

pdftotext -layout book.pdf - | grep -nE "^\s*(Chapter |[0-9]+\.[0-9] )"   # same -layout mode as §1.3, so line numbers match

For multi-author edited volumes, look for section markers instead of chapter markers.
Record the line numbers of each chapter/section heading.

1.3 Extract Each Chapter

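# extract_chapters.py
import subprocess
from pathlib import Path

# Same -layout mode as the §1.2 grep, so the recorded line numbers match
result = subprocess.run(
    ['pdftotext', '-layout', 'book.pdf', '-'],
    capture_output=True, text=True, check=True
)
lines = result.stdout.split('\n')

# chapter -> (first_line, next_chapter_first_line), 1-based as printed by grep -n (§1.2)
chapter_boundaries = {
    # 1: (start_line, end_line),
    # 2: (start_line, end_line),
}

out_dir = Path('ch_src')
out_dir.mkdir(exist_ok=True)
for ch, (start, end) in chapter_boundaries.items():
    text = '\n'.join(lines[start - 1:end - 1])  # 1-based grep numbers -> 0-based slice
    path = out_dir / f'ch_{ch:02d}.txt'          # same naming as §1.4 / §1.5
    path.write_text(text, encoding='utf-8')
    print(f"Chapter {ch}: {len(text):,} chars -> {path}")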

1.4 Fallback: When Auto-Split Fails

The extract_book.py split command (or any auto-splitter using keywords + running-header heuristics) will often mis-locate chapters because:

TOC entries (e.g., "Fractional Wave Models and Their Experimental Applications") match against running headers on every page of the actual chapter content
Chapter keywords like "Introduction" or "Conclusion" appear in multiple chapters
pdftotext -layout can put a section number on a separate line from its title

When the auto-split is wrong (e.g., 8 of 10 chapters mis-located in a recent run), do NOT keep tuning the heuristic. Switch to page-boundary offsets taken from the PDF's own TOC:

from pathlib import Path

# Full text from: pdftotext -layout book.pdf /tmp/book_full.txt
text = Path("/tmp/book_full.txt").read_text(encoding="utf-8")
pages = text.split("\x0c")  # pdftotext ends every page with a form feed

# 1-based PDF page where each chapter starts, from pymupdf's doc.get_toc()
chapter_pages = {1: 14, 2: 44, 3: 66}  # ... add the remaining chapters

# Character offset at which each page begins (+1 skips the form feed itself)
page_start = [0]
for p in pages:
    page_start.append(page_start[-1] + len(p) + 1)

def page_to_offset(page_no: int) -> int:
    """Offset of the first character of 1-based page `page_no`."""
    return page_start[page_no - 1]

Path("ch_src").mkdir(exist_ok=True)
chapters = []
nums = sorted(chapter_pages)
for i, num in enumerate(nums):
    start = page_to_offset(chapter_pages[num])
    end = page_to_offset(chapter_pages[nums[i + 1]]) if i + 1 < len(nums) else len(text)
    body = text[start:end]
    out = Path(f"ch_src/ch_{num:02d}.txt")
    out.write_text(body, encoding="utf-8")
    chapters.append((num, out, len(body)))
    print(f"Chapter {num}: {len(body):,} chars -> {out}")

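This is robust against running-header pollution: each chapter starts exactly where its first page begins. It relies on the page numbers being physical PDF pages (as doc.get_toc() returns them), not the printed page labels from the book's TOC, which are usually offset by the front matter.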
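The chapter_pages dictionary can be built from the PDF outline instead of typed by hand. A minimal sketch, assuming chapter entries sit at a single TOC level and their titles begin with the chapter number ("3 Title", "3. Title" or "Chapter 3 Title"); the function name and title pattern are hypothetical, and edited volumes may need level=2:

```python
import re

# "3 Title", "3. Title", "Chapter 3 Title"; section entries like "3.1 Title" do not match
CHAPTER_TITLE = re.compile(r"^\s*(?:Chapter\s+)?(\d+)\b[.:]?\s")

def chapter_pages_from_toc(toc: list, level: int = 1) -> dict:
    """Map chapter number -> 1-based start page from doc.get_toc() entries.

    get_toc() yields [level, title, page, ...]; entries at `level` whose title
    starts with a chapter number are kept (first occurrence wins).
    """
    pages = {}
    for lvl, title, page, *_ in toc:
        m = CHAPTER_TITLE.match(title)
        if lvl == level and m and page > 0:  # page is -1 when an entry has no target
            pages.setdefault(int(m.group(1)), page)
    return pages
```

Usage: chapter_pages = chapter_pages_from_toc(pymupdf.open("book.pdf").get_toc()); print it and compare against the printed TOC before splitting.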

1.5 Verify Each Split File (MANDATORY)

After splitting (auto or manual), always check head + tail of every chapter file:

for f in ch_src/ch_*.txt; do
    echo "=== $f ==="
    head -3 "$f"        # First 3 lines: should be the chapter title + opening prose
    echo "---"
    tail -3 "$f"        # Last 3 lines: should be the chapter's last prose, then references
done

Why head -3 / tail -3 (not -1): a single line can be a header artifact (e.g., a page number or running header) that looks "correct" by coincidence. Three lines give enough context to spot the common failure modes:
- The first line is a TOC entry such as 5. Author Name instead of the actual chapter title → running-header pollution; re-split.
- The chapter opens with a leftover page number (e.g., 123) before the real title.
- The last line is Index or References belonging to a different chapter → off-by-one split; re-check the boundaries.
- The chapter ends mid-sentence → truncated by a bad split.

A chapter that is suspiciously short (e.g., < 5 KB for a normal book) is likely truncated and missing content.

If any check fails, do not proceed to Phase 2. Go back to §1.4 (auto-split fallback) or manual byte-offset repair.
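A mechanical pre-filter can flag the easy cases before you read the head and tail of each file. A minimal sketch, assuming UTF-8 split files; the function name and the "bare page number" and "ends with an Index/References heading" heuristics are illustrative, and it does not replace the head -3 / tail -3 review:

```python
import re

MIN_BYTES = 5 * 1024  # §1.5 heuristic: a normal chapter under ~5 KB is suspect
BACK_MATTER_HEADINGS = {"index", "references"}

def check_split_text(raw: str, min_bytes: int = MIN_BYTES) -> list:
    """Return warnings for one chapter split file; an empty list means no flag raised."""
    warnings = []
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if len(raw.encode("utf-8")) < min_bytes:
        warnings.append("suspiciously short: possible truncation")
    if not lines:
        return warnings + ["file is empty"]
    if re.fullmatch(r"\d+", lines[0]):
        warnings.append(f"opens with a bare page number: {lines[0]!r}")
    if lines[-1].lower() in BACK_MATTER_HEADINGS:
        warnings.append(f"ends with the heading {lines[-1]!r}: possible off-by-one split")
    return warnings
```

For example: for p in sorted(Path("ch_src").glob("ch_*.txt")): print(p.name, check_split_text(p.read_text(encoding="utf-8")) or "no flags").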

1.6 Skim Before Reading

Read, for each chapter in order:
- Abstract / introduction
- Section headings and subheadings
- First and last paragraph of each major section
- Figures, tables, and their captions
- Summary / conclusions at the end of the chapter

This gives you the book's skeleton before committing to deep reading.

Phase 2 — Per-Chapter Deep Reading

2.1 Reading Strategy by Chapter Type

Type	Strategy
Theory/methods chapters	Read with pencil: reproduce key derivations, mark assumptions
Application chapters	Read for physical insight: what phenomenon, what model, what prediction
Review/survey chapters	Read critically: compare cited works, note consensus vs. debate
Computational chapters	Read with code: pseudo-code → actual implementation

2.1.1 Large Chapter Splitting (>100 KB)

If a single chapter's text file (in ch_src/ch_NN.txt) exceeds 100 KB, do NOT try to read it all at once. Split it into 2-3 sub-reads at section boundaries (look for ^### or ^## heading patterns, or ^Section N style markers).

Two practical approaches:

Approach A — read at offset/limit (the main agent way):

# Read in two halves at offset/limit
# First half:
read_file("/.../ch_src/ch_03.txt", offset=1, limit=600)
# Then:
read_file("/.../ch_src/ch_03.txt", offset=601, limit=600)
# Stitch notes from both halves manually

Approach B — split into sub-chapter files (for subagent delegation):

# Detect major section boundaries
grep -nE "^## [0-9]" ch_src/ch_NN.txt | head -20
# Split at these lines, writing ch_NN_part1.txt, ch_NN_part2.txt, ...

For work the main agent does itself, Approach A is preferred (less filesystem pollution). For subagent batches, Approach B is required: subagent context is finite, and a 100 KB input overwhelms it.

Context budget rule of thumb:

Agent	Per-call ceiling	Why
Main agent (serial)	≤ 100 KB PDF text per `read_file`	Fits comfortably in main-agent context; leaves headroom for the note itself
Subagent (delegated)	≤ 60 KB per call	Subagent has fixed tool overhead; trim more aggressively to leave room for output writing
Hard upper bound (any agent)	200 KB per call	Above this, token cost + truncation risk both spike; never exceed without explicit user approval

Decision rule when a chapter exceeds 100 KB:
- Main agent mode (default): split into 2–3 sub-reads at section boundaries (§2.1.1 Approach A)
- Subagent mode: split into ≤ 60 KB chunks (§2.1.1 Approach B)
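For Approach B, the split can be automated by packing whole sections into chunks under the budget. A minimal sketch: the heading pattern (markdown ## / ### headings, or numbered lines such as 3.2 Title) and the function name are assumptions, so check the pattern against the chapter's actual headings first.

```python
import re

# Section headings: markdown "## " / "### " or numbered "3.2 " lines
HEADING = re.compile(r"^(?:#{2,3} |\d+\.\d+\s)")

def split_at_sections(text: str, max_bytes: int = 60 * 1024) -> list:
    """Pack whole sections, in order, into chunks of at most max_bytes (UTF-8).

    A single section larger than max_bytes becomes its own oversized chunk
    and still needs a manual split.
    """
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if HEADING.match(line) and current:  # a heading closes the previous section
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    chunks, buf = [], ""
    for sec in sections:
        if buf and len((buf + sec).encode("utf-8")) > max_bytes:
            chunks.append(buf)
            buf = ""
        buf += sec
    if buf:
        chunks.append(buf)
    return chunks
```

Write the returned chunks, in order, as ch_NN_part1.txt, ch_NN_part2.txt, and so on; concatenating them reproduces the original file.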

2.2 Note-Taking Structure (Minimum 2000 Chinese Characters per Chapter)

Every chapter note must contain these seven sections (lengths are in Chinese characters, as counted by the §5.0 check):

Section 1 — Chapter Overview (300–500 words)

What problem does this chapter address?
What is the main result or message?
Where does it fit in the book's overall argument?
What are the prerequisites for this chapter?

Section 2 — Key Problems and Research Motivation (300–500 words)

What are the 3–5 central scientific questions?
What deficiencies in prior work does this chapter address?
Why are new mathematical or mechanical tools needed?
What is the broader significance (biological, medical, engineering)?

Section 3 — Main Formulas and Derivations (500–1000 words)

State the key equations with named variables
Explain the physical meaning of each term
State assumptions and conditions of validity
Show the main steps of derivations (key steps only, not algebraic drudge work)
Note typical parameter values or ranges where given
Flag equations that are empirical vs. theoretically derived

Section 4 — Key Algorithms or Modeling Methods (300–800 words)

What numerical method is used (FEM, spectral, boundary integral, etc.)?
Outline the computational pipeline
Note key parameter choices and their physical justification
Comment on computational cost and convergence behavior

Section 5 — Main Conclusions (300–500 words)

What are the core physical findings?
How do predictions compare to experiments or observations?
What does this chapter definitively establish vs. suggest?

Section 6 — Challenges and Open Questions (300–500 words)

What are the gaps in current theory?
What aspects lack experimental validation?
What contradictions or unresolved debates exist?
Which problems remain mathematically or computationally unsolved?

Section 7 — Personal Reflections and Critical Analysis (300–800 words)

How does the author's modeling philosophy compare to alternatives?
What is gained and lost in the mathematical simplifications?
How might this change your own research approach?
Which derivations are worth reproducing independently?
What would you ask the author if you could?

2.3 Formula Placement (Optional Summary Table)

Default policy: place all key equations inline within §3 (Main Formulas and Derivations) using Markdown LaTeX ( $...$ or $$...$$). This is sufficient for most chapters.

Use an explicit formula summary table at the end of the chapter only when the chapter has many equations (> 5) and they are repeatedly referenced across sections. In that case, follow the template below:

| # | Name | Equation | Physical Meaning |
|---|------|----------|-----------------|
| (3.1) | Rate equation | $\dot{m} = \alpha m$ | Exponential growth |
| ... | ... | ... | ... |

Skip the summary table when:
- The chapter has ≤ 3 key equations
- All equations are short inline expressions (e.g., $\Omega = 1$)
- Equations appear only once and don't need cross-referencing

Rules when the table is used:
- Number equations as chapter.section.sequential (e.g., 3.2.1)
- Define all variables on first appearance (in §3, not in the table)
- Mark empirical equations with (E) and theoretical ones with (T)
- The table is a navigation aid, not a substitute for the §3 discussion

Phase 3 — Book-Level Synthesis

3.0 Book Summary is MANDATORY

book_summary.md is a mandatory output (see Phase 0.4). The only exception is if the user explicitly says at the start "no summary needed" — in that case, confirm and document the decision in README.md.

The summary is what makes the per-chapter notes discoverable by future-you and other readers. Without it, the 10–20 chapter files become an unorganized pile.

3.1 Book Summary (3000–5000 words)

After all chapters are read, write a book summary covering:

Book's main thesis and scope
Structure and organization rationale
Core theoretical framework (the central equation or idea that unifies the book)
Key contributions (what this book adds beyond existing literature)
Strengths and weaknesses (honest critical assessment)
Target readership (who should read this book and who should not)
Comparison to competing books (if any)
Overall rating and recommendation

3.2 Cross-Chapter Connections

Create a map of how chapters relate to each other:
- Which chapters build on previous ones vs. stand independently?
- What is the thread/argument that connects them?
- Are there contradictions or disagreements between chapters?

3.3 Terminology Glossary

Extract and organize key terms:
- English term → Chinese translation → Definition
- Group by theme or chapter
- Note any non-standard definitions or author-specific usage

Phase 4 — Code and Implementation (Optional)

If the book includes computational methods:

4.1 Reproduce Key Algorithms

Implement central algorithms from scratch in Python/Matlab
Test against simple analytical solutions (when available)
Compare performance against reference implementations

4.2 Build a Reference Library

# reference_implementations/
#   ├── ch5_elastic_rod.py
#   ├── ch11_neohookean.py
#   ├── ch15_cavitation.py
#   └── ...

4.3 Numerical Validation Checklist

[ ] Convergence under mesh refinement
[ ] Invariants conserved (energy, momentum, mass)
[ ] Known analytical limits recovered
[ ] Physical dimension analysis (units check)
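For the mesh-refinement item, a common check is the observed order of accuracy: if the error behaves like $C h^p$ and each refinement divides $h$ by a fixed ratio $r$, then $p \approx \log(e_i/e_{i+1})/\log r$. A minimal sketch (the function name is hypothetical):

```python
import math

def observed_order(errors: list, ratio: float = 2.0) -> list:
    """Observed convergence order between successive refinements.

    errors[i] is the error norm on mesh i, with h_{i+1} = h_i / ratio.
    For an order-p method, e_i / e_{i+1} ≈ ratio**p, so p ≈ log(e_i/e_{i+1}) / log(ratio).
    """
    return [math.log(e0 / e1) / math.log(ratio) for e0, e1 in zip(errors, errors[1:])]
```

Values that settle near the scheme's nominal order (about 2 for central differences) pass; values that drift usually mean the errors are not yet in the asymptotic regime or are dominated by round-off.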

Phase 5 — Quality Checklist

Before finalizing, run a mechanical self-check on every chapter file. Pass/fail must be deterministic (programmatic grep/wc), not subjective.

5.0 Mechanical Self-Check Commands

Run these once per chapter and once for the whole book:

# Substitute the real folder slug from §0.3
BOOK=~/Documents/Reading/"<book-slug>"

# 1. Character count: ≥ 2000 Chinese characters per chapter
for f in "$BOOK"/Chapter-*.md; do
  cn=$(python3 -c "import re,sys; t=open(sys.argv[1], encoding='utf-8').read(); print(len(re.findall(r'[\u4e00-\u9fff]', t)))" "$f")
  [ "$cn" -ge 2000 ] && echo "$f: $cn Chinese characters OK" || echo "$f: $cn Chinese characters FAIL (< 2000)"
done

# 2. All 7 sections present (author, overview, formulas, conclusions, challenges, reflection, references)
for f in "$BOOK"/Chapter-*.md; do
  missing=""
  for s in "作者" "内容概述" "核心方程与概念" "关键结论" "挑战和开放性问题" "个人反思与批判性分析" "重要参考文献"; do
    grep -q "## $s\|## .* $s" "$f" || missing="$missing [$s]"
  done
  [ -n "$missing" ] && echo "$f MISSING:$missing" || echo "$f: 7/7 sections OK"
done

# 3. References: ≥ 5 entries of the form [XN]
for f in "$BOOK"/Chapter-*.md; do
  refs=$(grep -cE '^\[X[0-9]+\]' "$f")
  [ "$refs" -ge 5 ] && echo "$f: $refs references OK" || echo "$f: $refs references FAIL (< 5)"
done

# 4. Equations: count lines containing inline $...$ or display $$ LaTeX
for f in "$BOOK"/Chapter-*.md; do
  eqs=$(grep -cE '\$[^$]+\$|\$\$' "$f")
  echo "$f: $eqs lines with LaTeX"
done

# 5. book_summary.md exists
[ -f "$BOOK/book_summary.md" ] && echo "book_summary.md: OK" || echo "MISSING book_summary.md"

A chapter "passes" only if all five checks return green. If any check fails, do not move on — fix the chapter and re-run.

5.1 Pass/Fail Criteria (Per Chapter)

Check	Pass	Fail → Action
Word count	≥ 2000 中文字符	Expand §3 (formulas) or §6 (reflection)
Sections	All 7 present	Add the missing `##` heading and content
References	≥ 5 with `[X1]…[XN]` format	Re-scan chapter and bibliography section; add missing
Equations	≥ 1 inline LaTeX per section in §3	Convert plain-text math (`x^2`) to LaTeX ( $x^2$ )
Book summary	`book_summary.md` exists at the book root	Write it (this is the top-level navigation)

5.2 Subjective Quality Gates (Apply Manually After Mechanical Pass)

These are the non-mechanical judgement calls — use them only after §5.0 returns green:

[ ] Self-containment: a reader who has not read the book can follow the chapter note (no orphan references to "as mentioned in Section 4.2 of the book")
[ ] Equation variables: every symbol in a LaTeX equation is defined in the surrounding text on first use
[ ] Reference discoverability: every [XN] in the body has a matching entry in §7 (and vice versa)
[ ] Critical analysis substance: §6 (reflection) makes at least one argument that the book itself does not make — not just "well written" or "clear derivation"
[ ] Consistency: chapter notes use the same notation for the same quantity across chapters (e.g., $\alpha$ is always the fractional order, not redefined in Ch 4)
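The reference-discoverability gate can also be checked mechanically. A minimal sketch, assuming citations use the [XN] form of §5.5, the reference section is the last ## section and its heading contains 重要参考文献 (the heading §5.0 checks for), and each entry starts its line with [XN]; the function name is hypothetical:

```python
import re

CITE = re.compile(r"\[X(\d+)\]")
REF_HEADING = re.compile(r"^##.*重要参考文献.*$", re.MULTILINE)

def reference_link_report(note: str) -> dict:
    """Compare [XN] citations in the body with [XN] entries in the reference section."""
    m = REF_HEADING.search(note)
    if m is None:
        return {"error": "reference section heading not found"}
    body, refs = note[:m.start()], note[m.end():]
    cited = {int(n) for n in CITE.findall(body)}
    listed = {int(n) for n in re.findall(r"^\[X(\d+)\]", refs, re.MULTILINE)}
    return {
        "cited_not_listed": sorted(cited - listed),  # body cites a missing entry
        "listed_not_cited": sorted(listed - cited),  # entry never cited in the body
    }
```

A note passes when both lists are empty.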

5.3 Format Rules

[ ] Chinese punctuation throughout (，。：；？！“”‘’)
[ ] Markdown LaTeX format ( $...$ or $$...$$) for any mathematical formulas — not mandatory to include math, but when present it must use LaTeX notation
[ ] Consistent heading hierarchy (# → ## → ###, no skipped levels)
[ ] No character encoding issues (verify with file -i *.md returning utf-8)

5.4 Python Equation Snippet Policy (A6)

When a formula is mechanically useful (i.e., you'd run it as code to verify a result), include a fenced Python block alongside the LaTeX:

- **关键方程 (3.1)**：fractional Laplacian
  $$(-\Delta)^{\alpha/2} u(x) = \mathcal{F}^{-1}\left[|k|^{\alpha} \hat{u}(k)\right](x)$$

  ```python
  import numpy as np
  def frac_laplacian_fft(u, dx, alpha):
      k = np.fft.fftfreq(len(u), d=dx) * 2*np.pi
      u_hat = np.fft.fft(u)
      return np.real(np.fft.ifft(np.abs(k)**alpha * u_hat))
  ```

Rules:
- Use Python snippets only when the equation has a runnable verification path
- Default to LaTeX-only for derivations, identities, and asymptotic forms
- Always include the symbol table in the surrounding prose; do not let the code block carry the math alone
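A runnable verification path for the snippet above can be as small as an eigenfunction test: on a periodic grid over $[0, 2\pi)$, $\sin(mx)$ satisfies $(-\Delta)^{\alpha/2}\sin(mx) = |m|^{\alpha}\sin(mx)$, and $\alpha = 2$ recovers the classical $-u''$. A minimal sketch; the grid size and test mode are arbitrary choices, and the function is repeated so the block runs on its own:

```python
import numpy as np

def frac_laplacian_fft(u, dx, alpha):
    """(-Δ)^{α/2} u on a uniform periodic grid via FFT (as in the snippet above)."""
    k = np.fft.fftfreq(len(u), d=dx) * 2 * np.pi  # angular wavenumbers
    return np.real(np.fft.ifft(np.abs(k) ** alpha * np.fft.fft(u)))

# sin(m x) is an eigenfunction with eigenvalue |m|^alpha on [0, 2π)
n, m = 128, 3
x = np.arange(n) * (2 * np.pi / n)
u = np.sin(m * x)
for alpha in (0.5, 1.0, 1.5, 2.0):
    err = np.max(np.abs(frac_laplacian_fft(u, x[1] - x[0], alpha) - m**alpha * u))
    print(f"alpha={alpha}: max error {err:.2e}")
```

Errors at round-off level confirm the wavenumber scaling; a discrepancy of order one usually points to a missing $2\pi$ factor in the frequencies.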

5.5 Reference Count Standardization (B8)

Per chapter: ≥ 5 references, numbered [X1], [X2], ... in order of first appearance in the body
Per book: 30–80 references across all chapters + the book summary's own reference section
Format: [XN] Author(s). Title. Journal Year;Vol(Issue):Pages. DOI (journal article); [XN] Author(s). Title. Publisher, Year. ISBN (book)
Cross-chapter sharing: if a reference appears in multiple chapters, renumber it in each chapter's local list (do not maintain a global numbering) — chapter notes are standalone
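The first-appearance rule can be verified on the note body (the text before the 重要参考文献 heading). A minimal sketch; the function names are hypothetical:

```python
import re

def first_appearance_order(body: str) -> list:
    """Citation numbers in order of first appearance in the note body."""
    seen = []
    for n in re.findall(r"\[X(\d+)\]", body):
        n = int(n)
        if n not in seen:
            seen.append(n)
    return seen

def numbered_by_appearance(body: str) -> bool:
    """True if first appearances run [X1], [X2], ... with no jumps or gaps."""
    order = first_appearance_order(body)
    return order == list(range(1, len(order) + 1))
```

If the check fails, renumber the local list so that the first citation in the body becomes [X1], the next new one [X2], and so on.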

5.6 Technical / Output Structure

[ ] Files named consistently: Chapter-XX-[short-title].md (use 01-, 02-, ... for ordering)
[ ] All chapter files saved under ~/Documents/Reading/[Year]-[Book-Title]-[Author]/ (the §0.3 folder)
[ ] Book summary saved as book_summary.md at the book root
[ ] One README.md at the book root listing all chapters and their one-line summaries

Phase 6 — Execution Mode (Main Agent vs Subagents)

By default, the main agent reads every chapter and writes every note itself — no delegate_task calls. This is the preferred mode for this workflow. Subagents are an optional escape hatch for narrow cases (see §6.4).

6.1 Default: Main-Agent Serial Execution

The main agent loops over chapters, reading the chapter text and writing the corresponding Chapter-XX.md directly. This works because:

Reading + note-taking is a reasoning-heavy, sequential task — the main agent retains context from earlier chapters (notation, cross-references, terminology consistency).
Notes are self-contained by design (§5.2 self-containment check), so a later chapter does not need an earlier chapter's full text to be written.
Delegation adds handoff overhead (context packaging, output verification, retry on subagent errors) that often exceeds the time saved.

Expected throughput: ~8–12 chapters per main-agent session, depending on chapter length. For books with more than 15 chapters, see §6.5 (long-book strategy).

6.2 When the User Explicitly Says "main agent only"

If the user says things like "全部在主任务中进行，不要触发子任务" ("do everything in the main task; do not trigger subtasks") or "由你（主 agent）亲自" ("you, the main agent, personally"), honor it absolutely:

Do not call delegate_task, even for "easy" sub-tasks like slicing the PDF or running OCR on figures.
Do not call cronjob for incremental progress — the user wants a single, focused, in-context session.
Inline everything: terminal() for the slicing/OCR, write_file() for each chapter, read_file() for cross-checking.

This is the default for this user unless the user explicitly opts in to delegation.

6.3 In-Loop Workflow (Main Agent)

Per chapter, the main agent does:

1. read_file(chapter_XX.txt)             # full text; split per §2.1.1 if > 100 KB
2. (optional) read_file(book_toc.json)   # for cross-chapter reference checks
3. write_file(Chapter-XX.md, <note>)     # one-shot write of the full note
4. terminal() to run §5.0 self-check     # verify word count, sections, refs
5. if any check fails → patch() and re-run §5.0

No batching: write each chapter, verify, then move on. Do not write 5 chapters in parallel and verify at the end — that defeats the self-check.

6.4 Optional: When Subagent Delegation IS Appropriate

Delegate only when the user explicitly opts in, AND at least one of the following is true:

The book has > 30 chapters and the user has agreed to delegation
A single chapter text is > 200 KB and the user is OK with a subagent
A clearly independent sub-task exists (e.g., "build the reference library index from the existing 12 chapter notes") that the main agent can verify

If you do delegate:

Task description template (pass verbatim to the subagent). Do not weaken this template to "≥ 2000 字符" (≥ 2000 characters); the subagent should be told to fill each of the 7 sections with concrete mathematical / conceptual content from the chapter, and the word count will follow naturally.

Read the chapter text from /path/to/chapter_XX.txt
(this is Chapter X of "[Book Title]" by [Author], [Year])

Your job: produce a self-contained Chinese reading note (saved to the path below)
that an external reader (who has NOT read the book) can follow end-to-end.

Target structure (all 7 sections MANDATORY, each filled with concrete content
from the chapter — do NOT pad with generic phrases):

  §1 作者
     - The actual author(s) of this chapter
     - Their affiliation if mentioned
     - The chapter's role in the book (e.g., "this is a methods chapter that
       introduces the variational framework used throughout the rest of the book")

  §2 内容概述  (300-500 Chinese characters)
     - What problem this chapter addresses
     - What the main result or message is
     - Where it fits in the book's overall argument
     - Prerequisites a reader needs to follow this chapter

  §3 核心方程与概念  (500-1000 Chinese characters; LONGEST section)
     - For EVERY key equation: write the LaTeX, define every variable, and
       explain the physical/mathematical meaning in 2-3 sentences
     - For EVERY major concept: define it and give at least one concrete
       example from the chapter
     - Show the key steps of derivations (not algebraic drudgery)
     - Mark empirical vs. theoretical equations

  §4 关键结论
     - 3-7 bullet points, each stating ONE specific result with its conditions
     - Each conclusion should be falsifiable (state the regime, parameter range,
       or assumption under which it holds)

  §5 挑战和开放性问题
     - At least 3 items; not just "more research is needed" but specific
       gaps the chapter itself flags, plus gaps you notice

  §6 个人反思与批判性分析
     - At least one observation that goes BEYOND what the chapter says
     - E.g., connection to other chapters, a limitation the author didn't
       acknowledge, a method that would improve the result
     - Avoid generic praise ("well-written", "clear derivation")

  §7 重要参考文献  (≥ 5 references, in order of first appearance [X1]…[XN])
     - Full citation with DOI when available
     - One line per reference; alphabetizing is NOT required; order = appearance order

Mechanical checks (run yourself before saving):
  - Word count ≥ 2000 Chinese characters (use the same Python regex as §5.0)
  - All 7 sections present
  - At least 5 [XN] references
  - At least 1 inline LaTeX per subsection in §3

Save to: ~/Documents/Reading/[Book]/Chapter-XX.md

Do NOT:
  - Use placeholder text like "..." or "TODO"
  - Skip §5 or §6 because the chapter "doesn't have" challenges/reflections
    (every chapter has both; find them or infer them from the text)
  - Exceed 100 KB output (the file should be ≈ 15-30 KB)

Context to pass: book metadata (title, author, year, publisher), output file path, chapter-specific requirements, language (Chinese), notation (LaTeX).
Post-delegation verification (mandatory, do not skip):

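wc -m ~/Documents/Reading/[Book]/Chapter-XX.md   # total character count (Chinese-only count: §5.0 check 1)
grep -cE '^## .*(作者|内容概述|核心方程与概念|关键结论|挑战和开放性问题|个人反思与批判性分析|重要参考文献)' ~/Documents/Reading/[Book]/Chapter-XX.md   # expect 7 section headings
# Plus the full §5.0 self-check, run by the main agent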

Treat subagent output as a draft, not a finished product. The main agent must read back the chapter note, fix any quality issues, and re-run §5.0 before accepting it.

6.5 Long-Book Strategy (> 15 chapters)

When the book exceeds what the main agent can do in one session (~12 chapters):

Read first 8–10 chapters in this session, stopping before context becomes uncomfortable
Save progress in a progress.md file in the book folder: "Done: Ch 1–8. Next: Ch 9 (title). Pending: Ch 10–N."
Do NOT auto-schedule the next session via cronjob; let the user start the next session with "继续" ("continue")
In the next session, the user says "读取 [previous-session-id]，并继续" ("read [previous-session-id] and continue"); the new main agent then reads progress.md and resumes

This is the human-in-the-loop checkpoint that prevents the agent from drifting off-policy during long, unattended runs.

Lifecycle note (revised 2026-06-02): progress.md is a short-lived workflow log. Once the book is complete and Phase 8 has synced everything to the public site, progress.md is auto-removed by Phase 8 §8.4. If you need long-term retention (e.g., for an audit trail or future re-reads), move progress.md to a personal archive (e.g., ~/Documents/Reading/_archive/) before the sync step. The progress.md content itself is not part of the public site.

Phase 7 — Review and Update

After completing the book:

Read your own notes 1 month later — what is unclear?
Update notes with insights from later chapters or other papers
Add a "Further Reading" section to each chapter with related papers you have since discovered
Cross-link notes: e.g., "cf. Chapter 12, Section 3.2 for the 3D version of this result"
Revise instruction.md with any lessons learned from this specific book

Phase 8 — Sync to Public Site (auto-triggered, revised 2026-06-02)

After each Chapter-NN.md is finalized, the main agent automatically syncs the work to the public reading-notes-site repository, and the push triggers a Cloudflare Pages deployment. This phase is fully automatic: the user does not need to say "sync" or "deploy". It runs once the chapter has been written and has passed the §5.0 self-check.

8.1 When This Triggers

Per chapter: after a Chapter-NN.md is written and passes §5.0 mechanical self-check, Phase 8 fires once for that chapter
Final sync: after book_summary.md is written (and passes §3.1 length check), Phase 8 fires one last time with the summary included
The user does not need to issue any command — Phase 8 is part of the write loop

8.2 Sync Step Template (copy-pasteable)

For each chapter write, the main agent executes:

# 1. Sync source files to site (cp -p preserves mtime; md5sum diff first)
SRC=~/Documents/Reading/<book-slug>
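DST=~/Documents/reading-notes-site/docs/ReadingNotes/<book-slug>   # site folder; may use a simpler slug
# (If the Reading/ and site/ folder names differ, set DST to the site-side slug)
mkdir -p "$DST"   # the site folder does not exist yet on a book's first sync

for f in Chapter-NN.md; do   # final sync: Chapter-NN.md book_summary.md README.md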
  src_md5=$(md5sum "$SRC/$f" | cut -d' ' -f1)
  dst_md5=$(md5sum "$DST/$f" 2>/dev/null | cut -d' ' -f1)
  if [ "$src_md5" != "$dst_md5" ]; then
    cp -p "$SRC/$f" "$DST/$f"
    echo "synced $f"
  fi
done

# (generate_pages.py is NOT run here. It runs in the GitHub Actions workflow
#  reading-notes-site/.github/workflows/deploy.yml, "Generate pages" step,
#  before the Cloudflare Pages deploy. Running it locally as well would race
#  local build artifacts against the CI build and produce stale index pages.)

# 2. (Optional but recommended for first sync only) verify mkdocs build
#    Skip for incremental per-chapter syncs — GitHub Actions catches broken markdown on push
# mkdocs build

# 3. Commit
cd ~/Documents/reading-notes-site
git add docs/
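# Skip the commit when nothing is staged: "nothing to commit" counts as success (§8.7)
git diff --cached --quiet || git commit -m "<book-abbrev>: sync Chapter-NN.md"  # e.g. "FRDE: ..."; final sync: "sync book_summary.md"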

# 4. Push (triggers GitHub Actions → generate_pages.py → mkdocs build → CF Pages deploy)
git push

Per-chapter syncs are small commits (one chapter file, typically 15–30 KB). The final sync adds book_summary.md (~25 KB).
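The loop in §8.2 handles one file at a time; the final sync also has to cover book_summary.md (and README.md). A minimal sketch of the same compare-then-copy step as a reusable function, assuming GNU `md5sum` is available (on macOS, `md5 -q FILE` prints only the hash). `sync_files` is a hypothetical name.

```shell
# sync_files SRC DST FILE...: copy each FILE from SRC to DST only when its
# content differs. Prints "synced NAME" per copy; fails if a source is missing.
sync_files() {
  local src="$1" dst="$2" f src_md5 dst_md5
  shift 2
  mkdir -p "$dst"
  for f in "$@"; do
    if [ ! -f "$src/$f" ]; then
      echo "missing $src/$f" >&2
      return 1
    fi
    src_md5=$(md5sum "$src/$f" | cut -d' ' -f1)
    dst_md5=$(md5sum "$dst/$f" 2>/dev/null | cut -d' ' -f1)   # empty if absent
    if [ "$src_md5" != "$dst_md5" ]; then
      cp -p "$src/$f" "$dst/$f"   # -p keeps mtime and mode
      echo "synced $f"
    fi
  done
}
```

Per chapter this would be `sync_files "$SRC" "$DST" Chapter-10.md`; for the final sync, `sync_files "$SRC" "$DST" Chapter-10.md README.md book_summary.md`. Unchanged files print nothing, so an empty output means there is nothing to commit.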

8.3 Sync Whitelist (what to copy vs not)

Sync to site (auto):

File pattern	Why
`Chapter-NN.md`	Core reading notes
`README.md`	Book metadata (editor, year, ISBN)
`book_summary.md`	Phase 3 mandatory product

Do NOT sync:

File pattern	Why
`progress.md`	Personal workflow log (also auto-cleaned, see §8.4)
`ch_src/ch_*.txt`	PDF split intermediates (also auto-cleaned)
`ch_src/ch_*_part*.txt`	Large-chapter sub-splits
`*.bak` / `*-bak/` folders	User history
User scripts (`split_*.py`, `merge_*.py`, `slice_*.py`, `ocr_*.py`)	OCR pipeline assets

The generate_pages.py script (lines 22-37, 92-103) only walks Chapter-NN.md files and copies them; it does not copy ch_src/ or progress.md to docs/books/ even if those files are present in the source. Whitelist enforcement is therefore partly automatic via generate_pages.py, but Phase 8 cleanup (§8.4) still removes these files from the source folder.
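The two tables above amount to a filename classifier. A minimal sketch using bash `case` globs, assuming chapter files always use a two-digit number (Chapter-01.md through Chapter-99.md) and that the OCR script prefixes are exactly the four listed. `classify_file` is a hypothetical helper; paths are relative to the book folder, and anything labelled `ask` would go to the user prompt described in §8.4.

```shell
# classify_file PATH: print sync, clean, keep, or ask for a path relative to
# the book folder. In case patterns, * also matches "/" (unlike pathname globs).
classify_file() {
  case "$1" in
    Chapter-[0-9][0-9].md|README.md|book_summary.md) echo sync ;;   # §8.3 whitelist
    progress.md|ch_src/*)                              echo clean ;;  # §8.4 auto-clean
    *.bak|*.bak/*|*-bak/*)                             echo keep ;;   # user history
    split_*.py|merge_*.py|slice_*.py|ocr_*.py)         echo keep ;;   # OCR pipeline scripts
    *)                                                 echo ask ;;    # prompt the user
  esac
}
```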

8.4 Cleanup After Sync (auto)

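After the final sync (§8.1) has pushed successfully, the main agent removes intermediate files from the source folder. Per-chapter syncs skip this step, because later chapters still need ch_src/ and progress.md. The cleanup goes into a separate git commit (§8.5) so it stays auditable and revertable.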

Auto-cleaned (no user prompt):

# Default cleanup targets
SRC=~/Documents/Reading/<book-slug>

# 1. PDF text splits (the main bulk of intermediate files)
rm -rf "$SRC/ch_src/"

# 2. Per-book progress.md (workflow log, not for public)
rm -f "$SRC/progress.md"

# 3. /tmp intermediates from the read session
rm -f /tmp/<book-slug>*.txt /tmp/cover.png 2>/dev/null

# 4. Reading-root workflow-meta files (name starts with progress, check_list, decision_log, notes_meta, todo, or workflow_log)
#    Only when they're at ~/Documents/Reading/ root, not inside book folders
for f in ~/Documents/Reading/{progress,check_list,decision_log,notes_meta,todo,workflow_log}*.md; do
  [ -f "$f" ] && rm -f "$f"
done

Preserved (never auto-cleaned):

Chapter-NN.md, README.md, book_summary.md — synced to site (already mirrored)
.bak/, *-bak/ — user history
User scripts (*.py not matching generate_pages.py / merge_ocr.py style)
Any file inside a book folder that doesn't match the auto-clean patterns

User-prompted (non-whitelist files):

For any other file inside the book folder (e.g., a stray notes.txt, drafts.md, experiments/), the main agent prints a numbered list to the terminal and asks for a one-line text response:

The following non-whitelist files were found in the book folder:
  [1] 2024-Fractional-Dispersive-Models-and-Applications-Kevrekidis/notes.txt   3.2KB
  [2] 2024-Fractional-Dispersive-Models-and-Applications-Kevrekidis/drafts.md   1.1KB
  [3] 2024-Fractional-Dispersive-Models-and-Applications-Kevrekidis/ocr_log.txt 12KB
Clean these? (a) all / (n) none / (1,3) select / (q) cancel
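One way to produce the numbered list above: a minimal sketch that scans only the top level of the book folder and skips the whitelisted, auto-cleaned, and preserved patterns from §8.3 and §8.4. The patterns are restated inline (as an assumption) so the function stands alone; `list_unknown` is a hypothetical name, and `du -sh` supplies the size column.

```shell
# list_unknown BOOK_DIR: print a numbered list of top-level entries that are
# not synced, not auto-cleaned, and not preserved (patterns assumed from §8.3/§8.4).
list_unknown() {
  local dir="${1%/}" path name size n=0
  for path in "$dir"/* "$dir"/.[!.]*; do
    [ -e "$path" ] || continue                                      # glob matched nothing
    name=${path##*/}
    case "$name" in
      Chapter-[0-9][0-9].md|README.md|book_summary.md) continue ;;  # synced
      progress.md|ch_src) continue ;;                               # auto-cleaned
      *.bak|*-bak) continue ;;                                      # user history
      split_*.py|merge_*.py|slice_*.py|ocr_*.py) continue ;;        # OCR scripts
    esac
    n=$((n + 1))
    size=$(du -sh "$path" | cut -f1)
    printf '  [%d] %s/%s\t%s\n' "$n" "${dir##*/}" "$name" "$size"
  done
}
```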

The user replies with one of:

a: clean all listed files
n: keep all listed files (no action)
1,3 or 1 3: clean only files 1 and 3
q: cancel cleanup and leave all files in place

This is a one-round text exchange. The main agent does not loop on this — one reply is enough.
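Parsing the one-line reply is small enough to do in the shell. A minimal sketch, assuming the list has N entries and the reply takes exactly one of the four forms above. It prints the selected indices one per line and returns status 2 for anything it cannot parse, so the caller can treat that as "q" (the conservative choice). `parse_cleanup_reply` is a hypothetical name.

```shell
# parse_cleanup_reply REPLY N: print the indices (1..N) of files to delete.
#   a -> all, n or q -> none, "1,3" / "1 3" -> those indices (sorted, unique).
# Returns 2 on anything else so the caller can fall back to "q" (cancel).
parse_cleanup_reply() {
  local reply="$1" n="$2" i
  local -a idx
  case "$reply" in
    a) seq 1 "$n" ;;
    n|q) ;;                                          # nothing selected
    *)
      IFS=', ' read -r -a idx <<< "$reply"           # commas and/or spaces
      [ "${#idx[@]}" -gt 0 ] || return 2
      for i in "${idx[@]}"; do
        case "$i" in ''|*[!0-9]*) return 2 ;; esac   # digits only
        if [ "$i" -lt 1 ] || [ "$i" -gt "$n" ]; then return 2; fi
      done
      printf '%s\n' "${idx[@]}" | sort -un
      ;;
  esac
}
```

Using `read -a` instead of unquoted word splitting keeps a reply such as `*` from being expanded as a filename glob.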

8.5 Cleanup Commit (auditable)

After auto-cleanup and any user-selected cleanup, the main agent commits the cleanup in a separate commit:

cd ~/Documents/Reading
git add -A
git commit -m "chore: clean ch_src/, progress.md, /tmp intermediates after sync"

This keeps cleanup revertable via git revert <commit-hash>.

8.6 Deployment Confirmation

After git push succeeds, the deployment is:

GitHub Actions runs .github/workflows/deploy.yml in the site repo
Steps: install mkdocs-material → generate_pages.py → mkdocs build → CF Pages deploy
CF Pages serves the updated site at the configured URL (typically https://reading.<domain>/)

The main agent does not need to verify the deployment — GitHub Actions will report failures via email/webhook if any. The agent's job ends at git push returning success.

8.7 Fault Recovery (revised 2026-06-02)

Per the user's "遇到问题尝试修复" ("when a problem occurs, try to fix it") policy, the main agent attempts to fix common failures before stopping. Recovery strategy by failure type:

Failure	Likely cause	Auto-fix attempt	Max retries
`cp` permission denied	File ownership / read-only bit	`ls -la` to inspect; `chmod u+rw` if user-owned	1
`cp` no space left	Disk full	`df -h` to confirm; `rm` the largest /tmp intermediates from this session (safe, regenerable)	1
`generate_pages.py` Python error	Edge case in filename/slug	Read traceback, fix the specific bug, re-run	1
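`git add` fails: index.lock exists or index corrupt	Concurrent or interrupted `git` operation	If no other `git` process is running, remove the stale `.git/index.lock`; for a corrupt index, `rm .git/index` then `git reset`; then re-`git add`	1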
`git commit` nothing to commit	All files already synced (no diff)	Skip commit silently (this is success, not failure)	—
`git push` non-fast-forward	Remote has newer commits	`git pull --rebase` then `git push`	2
`git push` network error	Transient	`sleep 5` then `git push`	2
`git push` auth error	Token expired / SSH key issue	Report to user; do not try to refresh credentials automatically	—
Broken markdown in commit	LaTeX / Unicode edge case	Do not auto-edit user content; report which file is broken and the warning	—

Universal reporting rule: every retry attempt must be reported in plain text to the user ("Retry 1 of 2: ... result: ..."). The main agent does not silently retry.

Hard stop conditions (do not retry further):

3 consecutive failures of the same type
Auth errors (credentials are the user's domain)
Any fix that would require editing the user's note content (the agent fixes workflow, not content)

When stopping, print a clear failure summary including the failed command, the error message, and a suggested manual fix (e.g., cd reading-notes-site && git pull --rebase && git push).
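The reporting rule and the per-type retry caps can be enforced by one wrapper. A minimal sketch, assuming a fixed delay between attempts (5 s by default, matching the network-error row; the `RETRY_DELAY` override is a hypothetical variable) and that the type-specific auto-fix is passed in as the name of a function or command. `retry_reported` is hypothetical; auth errors should not be wrapped at all.

```shell
# retry_reported MAX FIX -- CMD...: run CMD; on failure run FIX (a function or
# command name, ":" for none), wait, and retry at most MAX times.
# Every retry is reported; the final exit status of CMD is returned.
retry_reported() {
  local max="$1" fix="$2" attempt=0 status=0
  shift 2
  if [ "${1:-}" = "--" ]; then shift; fi

  "$@" || status=$?
  while [ "$status" -ne 0 ] && [ "$attempt" -lt "$max" ]; do
    attempt=$((attempt + 1))
    "$fix" || echo "fix step '$fix' failed; retrying anyway" >&2
    sleep "${RETRY_DELAY:-5}"
    status=0
    "$@" || status=$?
    echo "Retry $attempt of $max: $* -> exit $status"
  done

  if [ "$status" -ne 0 ]; then
    # Hard stop: name the failed command so the user can fix it manually.
    echo "STOP: '$*' still failing (exit $status) after $attempt retries" >&2
  fi
  return "$status"
}
```

For the non-fast-forward row this could be `pull_rebase() { git pull --rebase; }; retry_reported 2 pull_rebase -- git push`; for a transient network error, `retry_reported 2 : -- git push` (`:` is the no-op builtin). With MAX set to 2, a command fails at most three times in a row before the wrapper stops, matching the hard-stop rule.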

8.8 Sync Log (in progress.md, before cleanup)

If progress.md still exists when Phase 8 runs (it normally would for the final sync), the main agent appends a brief sync log section before §8.4 removes it:

## Sync Log
- 2026-06-02 08:00: Chapter-10.md synced, commit cd40e30
- 2026-06-02 08:15: book_summary.md synced, commit ab12cd3
- 2026-06-02 08:16: cleanup: removed ch_src/, progress.md, 3 /tmp files

This log is transient (it is removed along with progress.md) but is useful for debugging a failed sync. For a permanent sync audit, keep a log outside the paths that §8.4 cleans, e.g. ~/Documents/Reading/_archive/sync_history.md.
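Appending an entry can be scripted from the commit that was just made. A minimal sketch, assuming the "## Sync Log" section is the last part of progress.md (so appending at the end adds to it) and using the timestamp format of the example above. `log_sync` is a hypothetical helper that does nothing once progress.md has been cleaned.

```shell
# log_sync BOOK_DIR REPO MESSAGE: append "- <time>: <message>, commit <hash>"
# under "## Sync Log" in BOOK_DIR/progress.md. No-op if progress.md is gone.
log_sync() {
  local file="$1/progress.md" repo="$2" msg="$3" hash
  [ -f "$file" ] || return 0
  hash=$(git -C "$repo" rev-parse --short HEAD) || return 1
  grep -q '^## Sync Log$' "$file" || printf '\n## Sync Log\n' >> "$file"
  printf -- '- %s: %s, commit %s\n' "$(date '+%Y-%m-%d %H:%M')" "$msg" "$hash" >> "$file"
}
```

Called right after the sync commit, e.g. `log_sync ~/Documents/Reading/<book-slug> ~/Documents/reading-notes-site "Chapter-10.md synced"`.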

Appendix — Formula Collection Template

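## 公式汇总 (Formula Summary)

| # | 名称 (Name) | 形式 (Form) | 物理意义 (Physical meaning) | 类型 (Type) |
|---|------|------|----------|------|
| (X.1) | | | | (T)/(E) |
| (X.2) | | | | (T)/(E) |

注 (Note)：(T)=理论推导 (theoretical derivation)，(E)=经验公式 (empirical formula)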

Appendix — Book Metadata Template

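- **书名** (Title)：
- **作者** (Author)：
- **出版社** (Publisher)：
- **出版年份** (Publication year)：
- **ISBN**：
- **DOI**：
- **核心主题** (Core theme)：
- **目标读者** (Target audience)：
- **前置知识** (Prerequisites)：
- **相关书籍** (Related books)：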

This SOP is a living document. Revise after each book based on what worked and what did not.