Causal Inference and Discovery in Python: Full-Book Summary

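Author

Aleksander Molak (independent machine learning researcher and co-founder of Lespire.io) wrote the book; Nicole Königstein (founder of impactvise and advisor at Quantmate) served as technical reviewer. Packt published it in April 2023. It became one of the most popular hands-on textbooks for causal inference and causal discovery in Python in 2023–2024.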

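Overview

The book is a hands-on guide to causal inference and causal discovery, built around Python libraries (DoWhy, EconML, gCastle, Causica, causal-learn, CausalBert, PyTorch) and organized in three parts:
- Part 1 (Ch 1–5): causal foundations: causality vs. association, Simpson's paradox, Pearl's Ladder of Causation, DAG representation, d-separation.
- Part 2 (Ch 6–11): causal inference: do-calculus, (C)ATE estimation, double ML, synthetic control, meta-learners (S/T/X/R/DR-Learner), advanced estimators (DR / DML / TMLE / Causal Forest), and deep learning for causality (TARNet / SNet / CausalBert).
- Part 3 (Ch 12–15): causal discovery and wrap-up: three sources of causal graphs, four families of discovery methods (constraint / score / functional / gradient) with gCastle, DECI for deep-learning-based discovery, FCI for hidden confounding, ENCO / ABCI for interventional data, business cases, a five-step project method, and four future directions.

Core engineering takeaways: (1) the DoWhy four-step process (Model → Identify → Estimate → Refute); (2) the five-step project method (question → experts → graph → identifiability → falsification); (3) hybrid methods (experts + algorithms + experiments); (4) a unified way of using six core libraries (DoWhy / EconML / gCastle / Causica / causal-learn / CausalBert).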

Main Thread of the Book

Molak organizes the book as a logical chain in which each chapter builds on the previous one:

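Part 1: What causality is (Ch 1–5)

Ch 1 (Causality – Hey, We Have ML): opens with Simpson's paradox; conventional ML easily confuses association with causation, which motivates the need for causal methods.
Ch 2 (Judea Pearl and the Ladder of Causation): introduces Pearl's Ladder of Causation with its three rungs (Association / Intervention / Counterfactual) and the counterfactual formula $Y_x(u) = Y_{M_x}(u)$.
Ch 3 (Regression, Observations, and Interventions): uses linear regression for controlled comparisons and shows the key difference between "controlling for a variable" and the do-operator.
Ch 4 (Graphical Models): DAG representation, adjacency matrices, and the graphical basics of chains, forks, and colliders.
Ch 5 (Forks, Chains, and Immoralities): d-separation, Markov factorization, faithfulness, and minimality.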

Part 2: How causal effects are computed (Ch 6–11)

Ch 6 (Cracking Open Causal Inference): the three rules of do-calculus plus the back-door and front-door criteria.
Ch 7 (The Four-Step Process of Causal Inference): the DoWhy four-step process (Model → Identify → Estimate → Refute), with hands-on use of the DoWhy and EconML APIs.
Ch 8 (Causal Models – Assumptions and Challenges): four key assumptions (unconfoundedness / positivity / consistency / SUTVA) and the consequences of DAG misspecification.
Ch 9 (Causal Inference and ML – Matching to Meta-learners): five meta-learners (S/T/X/R/DR-Learner), applied to the LaLonde NSW data.
Ch 10 (Causal Inference and ML – Advanced Estimators): advanced estimators such as DR, DML, TMLE, Causal Forest, and GRF.
Ch 11 (Causal Inference and ML – Deep Learning, NLP): TARNet → SNet → CFR → CausalBert, plus Bayesian synthetic control (hands-on with CausalPy).

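Part 3: Where causal graphs come from (Ch 12–15)

Ch 12 (Can I Have a Causal Graph, Please?): three sources of causal knowledge (science / experience / algorithms) and the philosophy of hybrid methods.
Ch 13 (Causal Discovery and ML – from Assumptions to Applications): the four families of discovery methods, gCastle, and injecting expert knowledge.
Ch 14 (Causal Discovery and ML – Advanced Deep Learning and Beyond): DECI, FCI, ENCO, ABCI, and the challenges of real-world data.
Ch 15 (Epilogue): the five-step project method, four business cases, four future directions, and learning resources.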

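Core Equations and Concepts

Core mathematics of causal inference

Pearl's Ladder of Causation (three rungs):
L1 (Association): $P(Y \mid X)$, i.e. observing and predicting.
L2 (Intervention): $P(Y \mid do(X))$, i.e. intervening and measuring causal effects.
L3 (Counterfactual): $P(Y_x \mid X=x', Y=y')$, i.e. counterfactuals and unit-level causation.
Counterfactual formula (Pearl, Ch 2), where $M_x$ is the model in which the structural equation for $X$ is replaced by the constant $x$: $$Y_x(u) \;=\; Y_{M_x}(u)$$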
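
The counterfactual formula is usually evaluated in three steps (abduction, action, prediction). The sketch below walks through them on a toy linear SCM; the model $T := U_T$, $Y := 2T + U_Y$ and all function names are illustrative assumptions, not taken from the book.

```python
# Toy SCM (assumed for illustration): T := U_T, Y := 2*T + U_Y.
def f_y(t, u_y):
    """Structural equation for Y."""
    return 2.0 * t + u_y

def counterfactual_y(t_obs, y_obs, t_new):
    # 1. Abduction: recover this unit's noise term from the observation.
    u_y = y_obs - 2.0 * t_obs
    # 2. Action: replace the mechanism for T by the constant t_new (model M_x).
    # 3. Prediction: push the same noise through the modified model.
    return f_y(t_new, u_y)

# A unit observed with T=1, Y=2.5 has U_Y = 0.5; had T been 0, Y would be 0.5.
y_cf = counterfactual_y(t_obs=1.0, y_obs=2.5, t_new=0.0)
print(y_cf)
```

The key point is that the unit's noise term is held fixed across the factual and counterfactual worlds, which is what separates an L3 query from an L2 query.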
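Observation vs. intervention (the core difference), for a confounder $Z$ with $Z \to X$ and $Z \to Y$:
Observation: $P(Y \mid X=x) = \sum_z P(Y \mid X=x, Z=z) P(Z=z \mid X=x)$
Intervention: $P(Y \mid do(X=x)) = \sum_z P(Y \mid X=x, Z=z) P(Z=z)$. The intervention cuts the incoming edge $Z \to X$, so $Z$ keeps its marginal distribution.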
The three rules of do-calculus (Pearl 1995, Ch 6):
Rule 1 (insertion / deletion of observations): $P(y \mid do(x), z, w) = P(y \mid do(x), w)$ if $(Z \perp\!\!\!\perp Y \mid X, W)_{G_{\overline{X}}}$.
Rule 2 (action / observation exchange): $P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)$ if $(Z \perp\!\!\!\perp Y \mid X, W)_{G_{\overline{X}\underline{Z}}}$.
Rule 3 (insertion / deletion of actions): $P(y \mid do(x), do(z), w) = P(y \mid do(x), w)$ if $(Z \perp\!\!\!\perp Y \mid X, W)_{G_{\overline{X}, \overline{Z(W)}}}$, where $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.
Back-door criterion: if $Z$ blocks every back-door path from $X$ to $Y$ and contains no descendant of $X$, then $$P(Y \mid do(X)) = \sum_z P(Y \mid X, Z=z) P(Z=z)$$
Front-door criterion: uses a mediator $M$. If (a) $M$ intercepts all directed paths from $X$ to $Y$; (b) there is no unblocked back-door path from $X$ to $M$; and (c) all back-door paths from $M$ to $Y$ are blocked by $X$, then $$P(Y \mid do(X)) = \sum_m P(M=m \mid X) \sum_{x'} P(Y \mid M=m, X=x') P(X=x')$$
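
To make the gap between conditioning and back-door adjustment concrete, here is a minimal sketch that evaluates both formulas exactly on a small discrete model $Z \to X$, $Z \to Y$, $X \to Y$. The probability tables and function names are invented for illustration.

```python
# Hypothetical binary model with confounder Z: Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.5, 1: 0.5}                        # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}               # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,    # P(Y=1 | X=x, Z=z)
                 (0, 1): 0.5, (1, 1): 0.7}

def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]

def observational(x):
    """P(Y=1 | X=x): strata are weighted by P(Z=z | X=x)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] / p_x
               for z in p_z)

def interventional(x):
    """P(Y=1 | do(X=x)) by back-door adjustment: strata weighted by P(Z=z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)

naive_diff = observational(1) - observational(0)      # 0.62 - 0.18 = 0.44
causal_diff = interventional(1) - interventional(0)   # 0.50 - 0.30 = 0.20
print(round(naive_diff, 2), round(causal_diff, 2))
```

In this toy model the naive contrast more than doubles the true effect, because $Z$ raises both the chance of treatment and the outcome.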

Meta-learners (Ch 9)

S-Learner (Single): fits one model $\mu(x, t) = \mathbb{E}[Y \mid X=x, T=t]$; $\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$.
T-Learner (Two): fits $\mu_0(x) = \mathbb{E}[Y \mid X=x, T=0]$ and $\mu_1(x) = \mathbb{E}[Y \mid X=x, T=1]$ separately; $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$.
X-Learner (Künzel et al. 2019): fits $\hat{\mu}_0, \hat{\mu}_1$ as in the T-Learner, imputes unit-level effects in each group, and regresses them on $X$: $$\hat{\tau}_1(x) = \mathbb{E}[Y - \hat{\mu}_0(X) \mid X=x, T=1]$$ $$\hat{\tau}_0(x) = \mathbb{E}[\hat{\mu}_1(X) - Y \mid X=x, T=0]$$ $$\hat{\tau}(x) = g(x) \hat{\tau}_0(x) + (1-g(x)) \hat{\tau}_1(x)$$ where $g(x) \in [0, 1]$ is a weight function; a common choice is the propensity score $g(x) = P(T=1 \mid X=x)$.
R-Learner (Nie & Wager 2021): Robinson decomposition $Y_i - \hat{m}(X_i) = \tau(X_i) (T_i - \hat{e}(X_i)) + \epsilon_i$, with $\tau$ obtained by minimizing the R-loss.
DR-Learner (AIPW-based, Ch 9): doubly robust; combines outcome regression with inverse propensity weighting. It builds a pseudo-outcome per unit and regresses it on $X$: $$\hat{\phi}_i = \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{T_i (Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} - \frac{(1-T_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)}, \qquad \hat{\tau}(x) = \hat{\mathbb{E}}[\hat{\phi} \mid X=x]$$
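
As a rough illustration of the S- and T-Learner definitions above, the following NumPy sketch fits both with plain least squares on simulated data. It does not use the EconML API; the data-generating process, the linear base learner, and the helper names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1.0, 1.0, size=n)
t = rng.integers(0, 2, size=n)            # randomized binary treatment (assumed)
tau_true = 1.0 + 2.0 * x                  # effect varies with x
y = 0.5 * x + tau_true * t + rng.normal(0.0, 0.1, size=n)

def fit_ols(features, target):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_ols(coef, features):
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef

# S-Learner: one model mu(x, t); treatment is just another feature.
coef_s = fit_ols(np.column_stack([x, t]), y)
cate_s = (predict_ols(coef_s, np.column_stack([x, np.ones(n)]))
          - predict_ols(coef_s, np.column_stack([x, np.zeros(n)])))

# T-Learner: separate models for treated and control units.
coef_1 = fit_ols(x[t == 1], y[t == 1])
coef_0 = fit_ols(x[t == 0], y[t == 0])
cate_t = predict_ols(coef_1, x) - predict_ols(coef_0, x)

rmse_s = np.sqrt(np.mean((cate_s - tau_true) ** 2))
rmse_t = np.sqrt(np.mean((cate_t - tau_true) ** 2))
print(f"S-Learner RMSE: {rmse_s:.3f}, T-Learner RMSE: {rmse_t:.3f}")
```

With a linear base learner and no interaction term, the S-Learner can only return a constant effect, so it misses the heterogeneity that the T-Learner recovers here. A more flexible base learner would narrow that gap.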

Advanced estimators (Ch 10)

DR (Doubly Robust): $\hat{\tau}_{\text{DR}} = \frac{1}{n} \sum_i \left[ \frac{T_i (Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} - \frac{(1-T_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)} + \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]$
DML (Double ML, Chernozhukov et al. 2016): estimates the nuisance functions with cross-fitting and uses an orthogonalized (residual-on-residual) estimating equation, which limits regularization and overfitting bias.
TMLE (Targeted Maximum Likelihood, van der Laan & Rubin 2006): first fits the outcome and treatment models, then applies a targeting step that corrects the bias of the initial outcome estimate.
Causal Forest (Wager & Athey 2018) / GRF (Athey, Tibshirani & Wager 2019): heterogeneous treatment effect estimation based on honest random forests.
DRIV (Doubly Robust IV): uses an instrumental variable to handle unobserved confounding.
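
The DML idea can be sketched without any causal library: residualize outcome and treatment on the covariates with cross-fitted nuisance models, then regress residual on residual. The partially linear data-generating process and the quadratic nuisance model below are assumptions chosen so that the example stays small; this is not EconML's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta_true = 5000, 1.5
x = rng.normal(size=n)
t = x + rng.normal(size=n)                                  # X drives treatment
y = theta_true * t + x**2 + 2.0 * x + rng.normal(size=n)    # and the outcome

def features(v):
    return np.column_stack([np.ones_like(v), v, v**2])

def fit_predict(x_train, target_train, x_test):
    """Nuisance model: quadratic least squares, fit on train, predict on test."""
    coef, *_ = np.linalg.lstsq(features(x_train), target_train, rcond=None)
    return features(x_test) @ coef

# Cross-fitting: residualize each fold with models trained on the other fold.
folds = np.array_split(rng.permutation(n), 2)
y_res, t_res = np.empty(n), np.empty(n)
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    y_res[test_idx] = y[test_idx] - fit_predict(x[train_idx], y[train_idx], x[test_idx])
    t_res[test_idx] = t[test_idx] - fit_predict(x[train_idx], t[train_idx], x[test_idx])

theta_dml = (t_res @ y_res) / (t_res @ t_res)            # residual-on-residual slope
theta_naive = np.cov(t, y)[0, 1] / np.var(t, ddof=1)     # slope of Y on T alone
print(f"naive: {theta_naive:.2f}, DML: {theta_dml:.2f}, truth: {theta_true}")
```

The naive slope absorbs the confounding path through $X$, while the residualized estimate should land close to the true coefficient.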

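Deep learning for causality (Ch 11)

TARNet (Shalit et al. 2017): a shared representation with separate outcome heads: $$\Phi: X \to \mathbb{R}^d, \quad h_t(\Phi(X)) = \mathbb{E}[Y \mid X, T=t], \quad t \in \{0, 1\}$$
CFR (Counterfactual Regression): adds an IPM (Integral Probability Metric) regularizer so that the distributions of $\Phi(X) \mid T=0$ and $\Phi(X) \mid T=1$ are approximately equal.
SNet / FlexTENet / OffsetNet (Curth & van der Schaar 2021): multi-task architectures that split representations into shared and treatment-specific parts.
CausalBert (Veitch et al. 2020): adapts BERT text embeddings so that text can be adjusted for as a confounder when estimating treatment effects.
Bayesian synthetic control (CausalPy / PyMC Labs): estimates effects by fitting a Bayesian time-series model and predicting the counterfactual trajectory.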
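
The TARNet structure (one shared representation, one head per treatment arm) can be shown as a NumPy forward pass. This is only an architectural sketch with random, untrained weights and a single ReLU layer; the layer sizes and names are assumptions, and a real model would be trained in PyTorch or a library such as CATENets.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_rep = 5, 8

# Randomly initialized (untrained) weights; a real model would learn these.
W_phi = rng.normal(size=(d_in, d_rep))
w_h0 = rng.normal(size=d_rep)
w_h1 = rng.normal(size=d_rep)

def phi(x):
    """Shared representation Phi(x) with a ReLU nonlinearity."""
    return np.maximum(x @ W_phi, 0.0)

def predict_outcomes(x):
    rep = phi(x)
    return rep @ w_h0, rep @ w_h1          # h_0(Phi(x)), h_1(Phi(x))

def factual_loss(x, t, y):
    """Each unit contributes only to the head of its observed treatment."""
    y0_hat, y1_hat = predict_outcomes(x)
    y_hat = np.where(t == 1, y1_hat, y0_hat)
    return np.mean((y - y_hat) ** 2)

x = rng.normal(size=(4, d_in))
y0_hat, y1_hat = predict_outcomes(x)
cate_hat = y1_hat - y0_hat                 # per-unit effect estimate
print(cate_hat.shape)
```

CFR would add a penalty on the distance between the treated and control distributions of `phi(x)` to this factual loss.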

The four families of causal discovery (Ch 13)

PC algorithm in five steps: (1) start from a fully connected graph; (2) remove edges between unconditionally independent pairs; (3) iteratively remove edges between conditionally independent pairs; (4) orient colliders; (5) propagate orientations.
GES algorithm (Chickering 2003): two-phase greedy search (forward, then backward) scored with BIC or BDeu.
ANM (Hoyer et al. 2008): $Y = f(X) + \epsilon$ with $\epsilon \perp\!\!\!\perp X$; independence is tested with HSIC.
LiNGAM (Shimizu et al. 2006): $Y = aX + \epsilon$ with non-Gaussian $\epsilon$; ICA recovers the direction.
NOTEARS (Zheng et al. 2018): a smooth acyclicity constraint $$\mathcal{R}(A) = \mathrm{tr}(e^{A \odot A}) - d = 0 \iff A \text{ is a DAG}$$ optimized with an augmented Lagrangian method.
GOLEM (Ng et al. 2020): a likelihood objective with a soft DAG constraint.
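
The NOTEARS acyclicity function $\mathcal{R}(A)$ is easy to evaluate directly. The sketch below uses a truncated power series for the matrix exponential so that it needs only NumPy; the example matrices and function names are assumptions, and gCastle's own implementation is not reproduced here.

```python
import numpy as np

def matrix_exp(m, terms=30):
    """Truncated power series for exp(m); adequate for small matrices."""
    result = np.eye(m.shape[0])
    term = np.eye(m.shape[0])
    for k in range(1, terms):
        term = term @ m / k
        result = result + term
    return result

def acyclicity(a):
    """R(A) = tr(exp(A * A)) - d, with * the elementwise product."""
    d = a.shape[0]
    return np.trace(matrix_exp(a * a)) - d

dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -2.0],
                [0.0, 0.0, 0.0]])        # weighted edges 0 -> 1 -> 2
cyclic = dag.copy()
cyclic[2, 0] = 0.7                        # adds 2 -> 0, closing a cycle

print(acyclicity(dag), acyclicity(cyclic))
```

For the DAG the value is zero because every power of a strictly triangular matrix has a zero diagonal; the cycle makes the trace grow, which is what the augmented Lagrangian penalizes.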

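Advanced causal discovery (Ch 14)

DECI (Geffner et al. 2022): end-to-end Bayesian causal discovery, with ICGNN to fit nonlinear relationships and Gumbel-Softmax for sampling discrete graphs.
FCI (Spirtes et al. 2000): four edge types: directed (->), bidirected (<->, indicating a latent confounder), o->, and o-o.
ENCO (Lippe et al. 2022): parameterizes edge existence and edge orientation separately; convergence is guaranteed when interventions on all variables are available.
ABCI (Toth et al. 2022): active experimentation combined with sequential causal queries.
CCANM / CORTH: niche methods for cascade (mediated) nonlinear additive noise models and for causal feature selection, respectively.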
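
ENCO's separation of "does an edge exist" from "which way does it point" can be sketched as a product of two sigmoids, following the parameterization described in Lippe et al. (2022) as I understand it: an existence logit $\gamma_{ij}$ and an antisymmetric orientation logit $\theta_{ij} = -\theta_{ji}$. The function and variable names below are hypothetical, and the official ENCO code is not reproduced.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def edge_probabilities(gamma, theta_upper):
    """P(i -> j) = sigmoid(gamma_ij) * sigmoid(theta_ij), theta_ij = -theta_ji."""
    theta = np.triu(theta_upper, k=1)
    theta = theta - theta.T               # enforce antisymmetry
    probs = sigmoid(gamma) * sigmoid(theta)
    np.fill_diagonal(probs, 0.0)          # no self-loops
    return probs

# Three variables: edges are likely to exist everywhere (gamma = 5),
# 0 -> 1 is strongly preferred over 1 -> 0, and 2 -> 1 over 1 -> 2.
gamma = np.full((3, 3), 5.0)
theta_upper = np.array([[0.0, 3.0, 0.0],
                        [0.0, 0.0, -3.0],
                        [0.0, 0.0, 0.0]])
probs = edge_probabilities(gamma, theta_upper)
print(np.round(probs, 3))
```

Because the two orientation probabilities of a pair sum to one, the two directed edges between any pair can never both be likely, which removes two-cycles without an explicit acyclicity constraint.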

Business and project method (Ch 15)

Five-step project method (Molak's engineering checklist):
Starting with a question: $\mathcal{Q} = \{Q_1, \ldots, Q_K\}$, $K \le 2$.
Obtaining expert knowledge: $\mathcal{E} = \{(e_i, c_i)\}$, $c_i \in [0,1]$.
Generating hypothetical graph(s): $G = (V, E)$.
Checking identifiability: $\Pr(\text{identifiable}) = 1$.
Falsifying hypotheses: Popperian falsification.
Failure probability (implicit): $p_f(\mathcal{Q}) = 1 - \exp(-\alpha |\mathcal{Q}|)$.
CATE information gain: $I_{\text{gain}} = \mathrm{Var}(\tau(X))$.

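Key Conclusions

Causal methods complement ML rather than replace it: ML suits prediction, while causal methods suit intervention and counterfactual questions. Molak stresses repeatedly that an ML model should not be read as a causal model, and a causal model should not be treated as just another ML model.
The DoWhy four-step process (model → identify → estimate → refute), combined with the EconML API, is treated as the de facto standard workflow for causal inference. In production: structure every causal project around this process.
Each of the five meta-learners has its niche: the S-Learner is simple but prone to bias; the T-Learner is a balanced default; the X-Learner suits imbalanced treatment groups; the R-Learner has the most rigorous theory; the DR-Learner is the most robust. Production advice: with imbalanced samples or high-dimensional features, use the X-Learner or DR-Learner.
DR / DML / TMLE / Causal Forest are the four main advanced estimators: DR is simple and robust; DML is theoretically rigorous; TMLE is efficient; Causal Forest targets heterogeneity. Production advice: use DML as the baseline and Causal Forest for heterogeneity analysis.
TARNet / SNet / CausalBert are the workhorses of deep-learning-based causal inference: TARNet as the entry point, SNet for multi-task settings, CausalBert for text confounders. Production advice: default to TARNet; use SNet for high-dimensional or heterogeneous tasks; use CausalBert when the confounders live in text.
No causal discovery family offers a universal algorithm: PC is simple, LiNGAM fits non-Gaussian settings, NOTEARS uses continuous optimization, DECI is end-to-end. In production: run several algorithms and keep the consensus edges.
Hybrid methods are the engineering gold standard: experts + algorithms + experiments. gCastle's PrioriKnowledge and Causica's ExpertGraphContainer are cited as the best current tooling for this.
Handling hidden confounding: FCI is the classic constraint-based algorithm that remains asymptotically correct when causal sufficiency is violated; DECI does not support hidden confounding; ENCO can be extended. In production: run FCI first to judge whether hidden confounding needs to be handled.
The five-step project method is the most practical engineering outcome of the book: Molak stresses that a clear question definition plus iteration over the five steps is what makes causal projects succeed.
Causal data fusion is an industrial-scale opportunity: combining multiple data sources for causal inference, which fits biomedicine especially well.
LLMs are not "truly causal": Willig et al. (2023), "Causal Parrots", warn that LLMs learn an associational meta-SCM. In production: validate LLM reasoning against interventional data.
Core libraries:
DoWhy (Microsoft, Sharma & Kiciman 2020): the four-step process and EconML integration.
EconML (Microsoft, Battocchi et al. 2019): DML / DML-IV / DRIV / CausalForestDML.
CausalPy (PyMC Labs) / CausalImpact: Bayesian synthetic control.
CATENets (Curth & van der Schaar 2021): TARNet / SNet / FlexTENet / OffsetNet.
gCastle (Huawei Noah's Ark, Zhang et al. 2021): PC / GES / LiNGAM / NOTEARS / GOLEM.
causal-learn (CMU CLeaR Group): FCI / RFCI / tiered knowledge.
Causica (Microsoft, 2023): DECI and ExpertGraphContainer.
transformers (HuggingFace): CausalBert (Veitch et al. 2020).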
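
The "multi-algorithm + consensus edges" recommendation from the conclusions above can be sketched as a simple vote over adjacency matrices. The three matrices stand in for the outputs of different discovery algorithms and are invented for illustration; the helper name and the voting threshold are assumptions.

```python
import numpy as np

def consensus_edges(adjacency_matrices, min_votes):
    """Keep a directed edge only if at least `min_votes` algorithms report it."""
    votes = np.sum([np.asarray(a) != 0 for a in adjacency_matrices], axis=0)
    return (votes >= min_votes).astype(int)

# Hypothetical outputs of three algorithms on variables (A, B, C);
# entry [i, j] = 1 means an edge i -> j.
pc_like = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
lingam_like = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
notears_like = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]

consensus = consensus_edges([pc_like, lingam_like, notears_like], min_votes=2)
print(consensus)
```

Here only A -> B (three votes) and B -> C (two votes) survive. Edges that receive a single vote are natural candidates for expert review rather than silent deletion.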

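Cross-Chapter Themes and Engineering Insights

The fundamental difference between causation and association: Molak keeps returning to Pearl's ladder, L1 (association) vs. L2 (intervention) vs. L3 (counterfactual). Engineering rule: always ask first which rung the question sits on; a question posed on the wrong rung cannot be answered.
DAGs are the common language of causal reasoning: Ch 4–5 introduce DAGs and d-separation, Ch 6 do-calculus, Ch 7 DoWhy, Ch 13–14 causal discovery. In production: use the DAG as the shared diagram for team communication.
Checking assumptions is the quality assurance of a causal project: the four assumptions in Ch 8, the DML assumptions in Ch 10, faithfulness in Ch 13, causal sufficiency in Ch 14. In production: turn assumption checks into a checklist.
No algorithm is a silver bullet, so combining methods is the gold standard: multiple meta-learners in Ch 9, multiple estimators in Ch 10, multiple discovery algorithms in Ch 13. Production advice: run two or three algorithms for any causal estimate and report mean ± std.
Expert knowledge plus data-driven methods is where the engineering breakthrough lies: gCastle's PrioriKnowledge in Ch 13, DECI's ExpertGraphContainer in Ch 14. Production advice: compare the expert graph with the data-driven graph; the differences are the key review points.
The gap between real and synthetic data: Reisach 2021 in Ch 13 and the real-world benchmark cases in Ch 14. In production: always validate on a holdout of real data and do not trust synthetic F1 scores.
The five-step project method is the core at the level of principle: Molak repeatedly stresses question definition and iteration. In production: turn the five steps into a team SOP.
LLMs combined with causality are a central direction for the next five years: Kıcıman 2023, Willig 2023, and "Causally aware imitation learning".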
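
The advice to compare the expert graph with the data-driven graph can be turned into a small diff over edge lists. The variable names and both graphs below are made up for illustration; the function is a sketch, not part of gCastle or Causica.

```python
def compare_graphs(expert_edges, learned_edges):
    """Split edges into agreed, reversed, expert-only, and learned-only sets."""
    expert, learned = set(expert_edges), set(learned_edges)
    reversed_edges = {(a, b) for (a, b) in learned
                      if (b, a) in expert and (a, b) not in expert}
    return {
        "agreed": expert & learned,
        "reversed": reversed_edges,
        "expert_only": {e for e in expert - learned if (e[1], e[0]) not in learned},
        "learned_only": learned - expert - reversed_edges,
    }

# Hypothetical graphs over (season, price, ads, demand).
expert = [("price", "demand"), ("season", "demand"), ("season", "price")]
learned = [("price", "demand"), ("demand", "season"), ("ads", "demand")]

diff = compare_graphs(expert, learned)
for kind, edges in diff.items():
    print(kind, sorted(edges))
```

Reversed edges are reported separately from missing ones because they usually point to a different problem: an orientation the data cannot settle rather than a relationship the algorithm failed to detect.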

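Challenges and Open Problems (Cross-Chapter)

Falsifiability of the causal sufficiency assumption: most causal discovery algorithms assume there is no hidden confounding, yet no statistical test can reliably detect a violation. Production fallback: sensitivity analysis in the style of Cinelli & Hazlett (2020).
Observational equivalence (CPDAG vs. DAG): PC and GES only return a Markov equivalence class (MEC); obtaining a unique DAG requires interventional data. In production: combine active experimentation with discovery algorithms.
Hyperparameter sensitivity: the Arctic sea ice experiments of Huang et al. (2021) show that NOTEARS and DAG-GNN are sensitive to hyperparameters. Production advice: run multiple seeds and report mean ± std.
Trade-off between sample size and dimensionality: with more than 50 variables (D > 50), PC and GES become computationally explosive; CCANM needs 5000–6000 observations. In production: do feature selection first and use the bootstrap.
Mixed variable types: apart from ENCO, most algorithms do not support mixed variable types. In production: one-hot encoding distorts the semantics of unordered categorical variables in the graph, so a dedicated mixed-type algorithm is needed.
Too few real-data benchmarks: Tu 2019, Huang 2021, and Shen 2020 are among the few public cases. In production: encourage the team to contribute case studies.
Risk of "fake causality" from LLMs: Willig 2023 warns that an LLM's do-inference can deviate from the true distribution. In production: ground it in real interventional data.
The "fully automated scientist" has not arrived: in Ch 15 Molak stresses that "we still haven't reached the stage of a fully automated scientist"; agents + experiments + causal reasoning is the direction for the next 5–10 years.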

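The observational-equivalence point above (PC and GES return only a Markov equivalence class) can be checked mechanically with the Verma & Pearl (1990) characterization: two DAGs are Markov equivalent when they share the same skeleton and the same v-structures. A minimal sketch, with hypothetical helper names:

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Colliders a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(edges_1, edges_2):
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))

chain = [("X", "Z"), ("Z", "Y")]       # X -> Z -> Y
fork = [("Z", "X"), ("Z", "Y")]        # X <- Z -> Y
collider = [("X", "Z"), ("Y", "Z")]    # X -> Z <- Y

print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain and the fork imply the same conditional independences, so observational data alone cannot separate them; only the collider is distinguishable.
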
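Connections to My Own Research

The "three key skills" Molak emphasizes in Ch 15 (awareness of the Ladder of Causation, structural thinking, and CATE modeling) are highly relevant to engineering work in biomechanics and clinical decision-making.
The five-step project method combined with Six Sigma is an industrial-grade communication language: it lets me explain causality to both clinicians and engineers.
Causal data fusion suits the combination of animal experiments and clinical data, which is a core scenario in vascular biomechanics.
Synthetic control (hands-on with CausalPy, Ch 11) suits single-case and small-cohort medical studies, which relates directly to my research.
TARNet / SNet in medicine: usable as tools for estimating individualized treatment effects.
DECI's end-to-end framework combined with medical interventional data is a direction for clinical decision support systems over the next five years.
The lesson of Willig 2023, "Causal Parrots": LLMs cannot replace experts, so clinical decisions must remain a human-machine collaboration.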

Key References (Consolidated)

Part 1: Causal foundations

Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). Cambridge University Press.
Pearl, J., & Mackenzie, D. (2019). The Book of Why. Penguin Books.
Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of Causal Inference. MIT Press.
Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search. MIT Press.

Part 2: Causal inference

Neyman, J. (1923). On the Application of Probability Theory to Agricultural Experiments. Statistical Science, 5(4), 465-480.
Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. JEP, 66(5), 688-701.
Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. JASA, 89(427), 846-866.
Chernozhukov, V., et al. (2016). Double/Debiased Machine Learning for Treatment and Causal Parameters. arXiv.
Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests. JASA, 113(523), 1228-1242.
Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests. Annals of Statistics, 47(2), 1148-1178.
van der Laan, M. J., & Rubin, D. (2006). Targeted Maximum Likelihood Learning. U.C. Berkeley.
Künzel, S. R., et al. (2019). Meta-learners for Estimating Heterogeneous Treatment Effects. PNAS.
Shalit, U., Johansson, F. D., & Sontag, D. (2017). Estimating Individual Treatment Effect: Generalization Bounds and Algorithms. ICML.
Curth, A., & van der Schaar, M. (2021). Nonparametric Estimation of Heterogeneous Treatment Effects. AISTATS / NeurIPS.
Veitch, V., Sridhar, D., & Blei, D. M. (2020). Adapting Text Embeddings for Causal Inference. UAI.
Abadie, A., & Gardeazabal, J. (2003). The Economic Costs of Conflict. AER.
Abadie, A. (2021). Using Synthetic Controls. AEA.
Kıcıman, E., Ness, R., Sharma, A., & Tan, C. (2023). Causal Reasoning and Large Language Models. arXiv.
Sharma, A., & Kiciman, E. (2020). DoWhy: An End-to-End Library for Causal Inference. arXiv.
Battocchi, K., et al. (2019). EconML: A Python Package for Causal Machine Learning. arXiv.

Part 3: Causal discovery

Rebane, G., & Pearl, J. (1987). The Recovery of Causal Poly-trees from Statistical Data. IJAR.
Verma, T., & Pearl, J. (1990). Equivalence and Synthesis of Causal Models. UAI 1990.
Heckerman, D., Geiger, D., & Chickering, D. M. (1995). Learning Bayesian Networks. Machine Learning, 20, 197-243.
Chickering, D. M. (2003). Optimal Structure Identification with Greedy Search. JMLR, 3, 507-554.
Shimizu, S., et al. (2006). A Linear Non-Gaussian Acyclic Model for Causal Discovery. JMLR, 7, 2003-2030.
Shimizu, S., et al. (2011). DirectLiNGAM. JMLR, 12, 1225-1248.
Hoyer, P. O., et al. (2008). Nonlinear Causal Discovery with Additive Noise Models. NeurIPS 21.
Hyvärinen, A., Karhunen, J., & Oja, E. (2001). Independent Component Analysis. Wiley.
Zheng, X., Aragam, B., Ravikumar, P., & Xing, E. P. (2018). DAGs with NO TEARS. NeurIPS 2018.
Ng, I., Ghassami, A., & Zhang, K. (2020). On the Role of Sparsity and DAG Constraints for Learning Linear DAGs. arXiv.
Geffner, T., et al. (2022). Deep End-to-end Causal Inference. arXiv.
Spirtes, P., et al. (2013). Causal Inference in the Presence of Latent Variables and Selection Bias. arXiv.
Lippe, P., Cohen, T., & Gavves, E. (2022). Efficient Neural Causal Discovery without Acyclicity Constraints. arXiv.
Toth, C., et al. (2022). Active Bayesian Causal Inference. arXiv.
Zhang, K., et al. (2021). gCastle: A Python Toolbox for Causal Discovery. arXiv.
Park, J., Song, C., & Park, J. (2022). Input Convex Graph Neural Networks. OpenReview.
Reisach, A. G., Seiler, C., & Weichwald, S. (2021). Beware of the Simulated DAG! Varsortability in Additive Noise Models. arXiv.
Kaiser, M., & Sipos, M. (2021). Unsuitability of NOTEARS for Causal Graph Discovery. arXiv.
Cai, R., et al. (2021). Causal Discovery with Cascade Nonlinear Additive Noise Models. ACM TIST.
Soleymani, A., et al. (2022). Causal Feature Selection via Orthogonal Search. TMLR.
Tu, R., et al. (2019). Neuropathic Pain Diagnosis Simulator. NeurIPS 32.
Huang, Y., et al. (2021). Benchmarking of Data-Driven Causality Discovery Approaches. Frontiers in Big Data, 4.
Shen, X., et al. (2020). Causal Discovery Algorithms: Application to Alzheimer's Pathophysiology. Scientific Reports, 10(1), 2975.
Andrews, B., Spirtes, P., & Cooper, G. F. (2020). On the Completeness of Causal Discovery with Tiered Background Knowledge. AISTATS.
Kaddour, J., et al. (2022). Causal Machine Learning: A Survey and Open Problems. arXiv.
Deng, Z., et al. (2022). Deep Causal Learning. arXiv.
Vowels, M. J., et al. (2022). D'ya Like DAGs? ACM Computing Surveys, 55(4), 1-36.
Berrevoets, J., et al. (2023). Causal Deep Learning. arXiv.
Bareinboim, E., & Pearl, J. (2016). Causal Inference and the Data-Fusion Problem. PNAS, 113(27), 7345-7352.
Hünermund, P., & Bareinboim, E. (2023). Causal Inference and Data Fusion in Econometrics. arXiv.
Schölkopf, B., et al. (2021). Toward Causal Representation Learning. Proceedings of the IEEE, 109(5), 612-634.
Willig, M., et al. (2023). Causal Parrots: Large Language Models May Talk Causality But Are Not Causal. ACM preprint.
Curth, A., et al. (2021). Really Doing Great at Estimating CATE? NeurIPS Datasets & Benchmarks.
Jeunen, O., et al. (2022). Disentangling Causal Effects from Sets of Interventions. NeurIPS 35.
Becker, J. M. (2016). The Book of Why (critique of statistical control).
Neal, B., Huang, C.-W., & Raghupathi, S. (2020). RealCause: Realistic Causal Inference Benchmarking. arXiv.
Cinelli, C., & Hazlett, C. (2020). Making Sense of Sensitivity. JRSS B, 82(1), 39-67.
Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
Fisher, R. A. (1921). On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. Metron, 1, 1-32.
Popper, K. (1959). The Logic of Scientific Discovery. Basic Books.
Popper, K. (1971). Conjectural Knowledge. Revue Internationale de Philosophie, 25(95/96), 167-197.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
Tetlock, P. E. (2005). Expert Political Judgment. Princeton University Press.
Tetlock, P. E., & Gardner, D. (2015). Superforecasting. Crown.
Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
LaLonde, R. J. (1986). Evaluating the Econometric Evaluations of Training Programs. AER.
Hill, J. L. (2011). Bayesian Nonparametric Modeling for Causal Inference. JCGS, 20(1), 217-240.
Hernán, M. A., & Robins, J. M. (2006). Instruments for Causal Inference. Epidemiology, 17(4), 360-372.
Imbens, G. W. (2004). Nonparametric Estimation of Average Treatment Effects. RES, 86(1), 4-29.
Hurwitz, J., & Thompson, J. K. (2023). Causal Artificial Intelligence. (Q4 2023).
Bareinboim, E. (2020). Causal Reinforcement Learning tutorial. (https://bit.ly/CausalRL).
Lippe, P., et al. (2022). CITRIS: Causal Identifiability from Temporal Intervened Sequences. ICML 2022.
Chau, S. L., et al. (2021). BayesIMP: Uncertainty Quantification for Causal Data Fusion. NeurIPS 34.
Gallea, Q. The Causal Mindset (in preparation).
Stahl, A. E., & Feigenson, L. (2015). Observing the Unexpected Enhances Infants' Learning. Science, 348(6230), 91-94.
Gopnik, A. (2009). The Philosophical Baby. Farrar, Straus and Giroux.
Glymour, C., Zhang, K., & Spirtes, P. (2019). Review of Causal Discovery Methods Based on Graphical Models. Frontiers in Genetics, 10, 524.
Spirtes, P. (2010). Introduction to Causal Inference. JMLR, 11, 1643-1662.
Bareinboim, E., Correa, J. D., Ibeling, D., & Icard, T. (2020). On Pearl's Hierarchy and the Foundations of Causal Inference. ACM TEAC.
Cheng, L., et al. (2022). Data Fusion via Hard and Soft Constraints.
Hill, J. (2011). BART for Causal Inference.
Wager, S. (2020). Causal Inference: A Statistical Learning Approach (textbook).
Brady Neal's video on faithfulness (https://bit.ly/BradyFaithfulness).
Causal Discovery Toolbox (CDT) Python package (https://bit.ly/CDTMetricsDocs).
Neo4j, Amazon Neptune (graph databases).
grapl-causal library (https://bit.ly/GRAPLCausal).
Playtika uplift-analysis library (https://bit.ly/PlaytikaUplift).
Spotify technical blog (https://bit.ly/SpotifySynthControl, https://bit.ly/SpotifyHiddenBlog).
Six Sigma / DMAIC framework.
TensorCell (https://bit.ly/TensorCell): traffic simulation case.
Pearl's Hierarchy and the Foundations of Causal Inference (Bareinboim et al. 2020).
Ch 11 Bayesian Structural Time Series (BSTS) — Scott & Varian 2014.
8 schools example — Rubin 1981.
CLeaR 2023's Call for Causal Datasets (https://bit.ly/CLeaRDatasets).
Diagnosis Simulator (https://bit.ly/DiagnosisSimulator).
Causal Python newsletter (https://bit.ly/CausalPython).
Six books blog (https://bit.ly/SixBooksBlog).
Augmented Lagrangian method (Nemirovski 1999).
Frobenius norm (https://bit.ly/MatrixNorm).
AugmentedLagrangian optimizer (https://bit.ly/AugmentedLagrangian).
PyTorch Lightning (https://bit.ly/IntroToLightning).
Microsoft Causica (https://bit.ly/MicrosoftCausica).
ENCO GitHub (https://bit.ly/EncoGitHub).
ABCI GitHub (https://bit.ly/ABCIGitHub).
CLeaR Group (CMU): maintainers of causal-learn.
Maschinensucher (TensorCell): traffic simulation.
Maahid Photos, Diego Ferrari, Max Avans: image credits (Pexels.com).
Pew Research Center (2008). Men or Women: Who's the Better Leader?
Zenger, J., & Folkman, J. (2020). Research: Women Are Better Leaders During a Crisis. HBR.
Kostis, J. B., & Dobrzynski, J. M. (2020). Limitations of Randomized Clinical Trials. AJC, 129, 109-115.
Harrell, F. (2023). Randomized Clinical Trials Do Not Mimic Clinical Practice. Statistical Thinking.
Senn, S. S. (2020, 2021). Randomization works.
Hall, N. S. (2007). R. A. Fisher and His Advocacy of Randomization. JHB, 40(2), 295-325.
Wu, S., et al. (2023). Causal Inference in Observational Data.
Muldoon, S., et al.: complex-network case study.
Smaldino, P.: work related to the replication crisis.
Kahneman, "Thinking, Fast and Slow" (cited in the text).
Lacerda, G., Spirtes, P. L., Ramsey, J., & Hoyer, P. O. (2008). Discovering Cyclic Causal Models by ICA. UAI 2008.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). Max-min Hill-climbing BN Structure Learning. Machine Learning, 65(1), 31-78.
Cressie, N., & Read, T. R. (1984). Multinomial Goodness-of-Fit Tests. JRSS B, 46(3), 440-464.
Colombo, D., & Maathuis, M. H. (2014). Order-Independent Constraint-Based Causal Structure Learning. JMLR, 15, 3741-3782.
Le, T. D., et al. (2015). A Fast PC Algorithm for High Dimensional Causal Discovery. IEEE/ACM TCBB, 16, 1483-1495.
Peters, J., & Bühlmann, P. (2015). Structural Intervention Distance. Neural Computation, 27(3), 771-799.
Uhler, C., Raskutti, G., Bühlmann, P., & Yu, B. (2013). Geometry of the Faithfulness Assumption. Annals of Statistics, 436-463.
Erdős, P., & Rényi, A. (1959). On Random Graphs I. Publicationes Mathematicae Debrecen, 6, 290-297.
Barabási, A. L. (2009). Scale-Free Networks. Science, 325(5939), 412-413.
Barabási, A. L., & Albert, R. (1999). Emergence of Scaling in Random Networks. Science, 286(5439), 509-512.
McLachlan, G. J., Lee, S. X., & Rathnayake, S. I. (2019). Finite Mixture Models. Annual Review of Statistics, 6(1), 355-378.
Kingma, D. P., et al. (2016). Improved Variational Inference with Inverse Autoregressive Flow. NeurIPS 29.
Goodfellow, I., et al. (2020). Generative Adversarial Networks. CACM, 63(11), 139-144.
Kalainathan, D., et al. (2022). Structural Agnostic Modeling. arXiv.
Goudet, O., et al. (2018). Causal Generative Neural Networks. arXiv.
Khemakhem, I., et al. (2021). Causal Autoregressive Flows. AISTATS 2021.
Zhang, K., et al. (2012). Kernel-based Conditional Independence Test. arXiv.
Blake, W. (1794/2009). The Tyger. Songs of Experience.
Grammer, K., & Thornhill, R. (1994). Human Facial Attractiveness and Sexual Selection. JCP, 108(3), 233-242.
Johnston, I. G., et al. (2022). Symmetry and Simplicity Spontaneously Emerge. PNAS, 119(11), e2113883119.
Enquist, M., & Arak, A. (1994). Symmetry, Beauty and Evolution. Nature, 372, 169-172.
Martín, F. M. (2009). The Thermodynamics of Human Reaction Times. arXiv:0908.3170.
Muenssinger, J., et al. (2013). Auditory Habituation in the Fetus and Neonate. Developmental Science, 16(2), 287-295.
Rosenberg, A., & McIntyre, L. (2020). Philosophy of Science: A Contemporary Introduction (4th ed.). Routledge.
Wikipedia, S. (2023). The Book of Why (work-in-progress reference).
Kim, J., et al. (2023). Causal Inference for Survival Analysis.
24 authors (PCA citation, not listed).
Nie, X., & Wager, S. (2021). Quasi-Oracle Estimation of Heterogeneous Treatment Effects. Biometrika, 108(2), 299-319.
Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian Additive Regression Trees. Annals of Applied Statistics, 4(1), 266-298.
Dorie, V., et al. (2019). Automated versus Do-It-Yourself Methods for Causal Inference. Statistical Science, 34(1), 97-118.
Baham, A., et al. (2023). Causal Inference with ML.
Fergusson, C. (2017). Causal Inference in Observational Studies.
O'Neill, B. (2021). Sequel to The Book of Why.
Wittgenstein, L. (1953). Philosophical Investigations (philosophical allusion in Ch 12).
Tufte, E. (2006). The Cognitive Style of PowerPoint (stylistic echo in Ch 11).
Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124-1131.
Bach, F. (2020). Learning Theory from First Principles (implicit citation in Ch 4).
Boyd, S., & Vandenberghe, L. (2004). Convex Optimization (cited for NOTEARS in Ch 13).

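Personal Reflections and Critical Analysis

This book is among the most engineering-friendly recent Python guides to causal inference and causal discovery. It makes distinctive contributions at three levels:

Methodological unification: Molak brings Pearl's graphical framework, the Neyman-Rubin potential-outcomes framework, and the causal discovery framework of Peters et al. together into an executable engineering workflow through the Python toolchain. This complements theory-first texts such as Pearl (2009) and Peters et al. (2017): the book focuses less on the "why" and more on the "how".
Systematic integration of the library ecosystem: DoWhy (four-step process), EconML (DML / GRF), gCastle (discovery), Causica (DECI), causal-learn (FCI), CausalBert (NLP). The book shows how these libraries are combined rather than how to pick a single one. In production: a team's causal toolbox should include all six and choose among them by scenario.
Real cases of business adoption: Geminos × Company M, causaLens, Playtika, and Spotify. The four cases span manufacturing, a general-purpose platform, digital entertainment, and streaming, and show that causal methods are engineering practice rather than a purely academic exercise. Production lesson: causal methods deliver real value in industrial optimization, as an alternative to A/B tests, and in CATE-based personalization.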

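Points worth discussing in more depth:

Molak's dual identity as engineer and philosopher: he is a practicing ML engineer on projects such as TensorCell and also thinks carefully about the philosophy of causal methods (citing Popper, Kahneman, and Tetlock). This sets him apart from purely academic authors such as Pearl and Peters: he cares more about building things than about proofs. In production: as an engineering-oriented text, this book is more practical than purely academic textbooks.
The five-step project method as the engineering gold standard: in Ch 15 Molak emphasizes question definition, experts, graph, identifiability, and falsification. The framework is consistent with Pearl's do-calculus framework but more engineering-friendly. Production advice: turn the five steps into a project onboarding SOP; this matters more than any single algorithm.
Making the "three key skills" operational: Molak emphasizes awareness of the Ladder of Causation, structural thinking, and CATE modeling. In my own practice: (a) state which rung (L1 / L2 / L3) the question sits on when a project starts; (b) reason about d-separation with DAGs; (c) add CATE analysis on top of A/B tests.
Hybrid methods as the engineering breakthrough (experts + algorithms + experiments): Molak returns to this in Ch 12, 13, 14, and 15. In production: (a) gCastle's PrioriKnowledge and Causica's ExpertGraphContainer are the strongest tools available; (b) every causal discovery project needs an expert graph as input; (c) multi-algorithm consensus is the gold standard.
The gap between real and synthetic data: in Ch 13, 14, and 15 Molak repeatedly highlights "anti-benchmark" work such as Reisach 2021. In production: (a) always validate on a holdout of real data; (b) do not trust reports of synthetic F1 > 0.9; (c) be prepared to accept a real-world result of F1 = 0.3.
LLMs combined with causality as a central direction for the next five years: in Ch 15 Molak predicts that agents + causality + experiments will lead toward an automated scientist. In production: (a) watch how agent frameworks such as LangChain and LlamaIndex integrate with DoWhy and EconML; (b) use LLMs for hypothesis generation and DoWhy / EconML for hypothesis validation; (c) never treat an LLM's "causal reasoning" output as the final answer; it must be grounded in real data.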
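Library selection strategy:
DoWhy + EconML: the default stack for causal inference.
gCastle: entry point for causal discovery (PC / GES / LiNGAM / NOTEARS).
Causica: DECI, end-to-end with deep learning.
causal-learn: FCI, for hidden confounding.
ENCO (standalone implementation): for interventional data.
ABCI (standalone implementation): for active experimentation.
CausalPy / CausalImpact: Bayesian synthetic control.
CATENets: deep learning for causality (SNet / FlexTENet).
transformers + CausalBert: causal inference with text.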
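Takeaways for my own research:
The five-step method combined with Six Sigma is the best language for explaining causality to industrial and clinical teams.
Causal data fusion suits the combination of animal experiments and clinical data, a core direction in vascular biomechanics.
Bayesian synthetic control suits single-case and small-cohort medical studies.
TARNet / SNet in medicine: individualized treatment effect estimation.
DECI's end-to-end approach combined with medical interventional data is a direction for clinical decision support systems.
The lesson of Willig 2023, "Causal Parrots": LLMs cannot replace experts, so clinical decisions must remain a human-machine collaboration.
Overall assessment: (a) one of the most practical engineering-oriented causal textbooks; (b) a very current choice of libraries (DoWhy / EconML / gCastle / Causica / causal-learn); (c) rich in business cases (four real ones); (d) forward-looking in its future directions (causal data fusion / agents / structure learning / imitation learning). Its main weakness: implementation details for deep-learning-based causal inference (TARNet / SNet / CausalBert) are thin, and Ch 11 stays mostly conceptual. As a combined introduction, practice guide, and roadmap, though, it is my first choice.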

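Key References

This section lists the core references for the full-book summary, grouped by topic; detailed citations are in the per-chapter files:

Theory textbooks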

[X1] Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). Cambridge University Press.
[X2] Pearl, J., & Mackenzie, D. (2019). The Book of Why. Penguin Books.
[X3] Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of Causal Inference. MIT Press.
[X4] Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.
[X5] Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search. MIT Press.
[X6] Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.

Classic papers

[X7] Neyman, J. (1923). On the Application of Probability Theory to Agricultural Experiments.
[X8] Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.
[X9] Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.
[X10] Rebane, G., & Pearl, J. (1987). The Recovery of Causal Poly-trees from Statistical Data.
[X11] Shimizu, S., et al. (2006). A Linear Non-Gaussian Acyclic Model for Causal Discovery.
[X12] Hoyer, P. O., et al. (2008). Nonlinear Causal Discovery with Additive Noise Models.
[X13] van der Laan, M. J., & Rubin, D. (2006). Targeted Maximum Likelihood Learning.
[X14] Chernozhukov, V., et al. (2016). Double/Debiased Machine Learning for Treatment and Causal Parameters.
[X15] Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests.
[X16] Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests.
[X17] Zheng, X., et al. (2018). DAGs with NO TEARS.
[X18] Künzel, S. R., et al. (2019). Meta-learners for Estimating Heterogeneous Treatment Effects.
[X19] Shalit, U., Johansson, F. D., & Sontag, D. (2017). Estimating Individual Treatment Effect: Generalization Bounds and Algorithms.
[X20] Geffner, T., et al. (2022). Deep End-to-end Causal Inference.

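Industry cases

[X21] Geminos × Company M industrial manufacturing case (Ch 15).
[X22] causaLens general-purpose platform (Ch 15).
[X23] Playtika uplift optimization (Ch 15).
[X24] Spotify synthetic control and hidden confounding (Jeunen et al. 2022).
[X25] Six Sigma / DMAIC process improvement framework.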

Libraries and tools

[X26] Sharma, A., & Kiciman, E. (2020). DoWhy.
[X27] Battocchi, K., et al. (2019). EconML.
[X28] Zhang, K., et al. (2021). gCastle.
[X29] Geffner, T., et al. (2022). Causica / DECI.
[X30] CLeaR Group. causal-learn.
[X31] Veitch, V., Sridhar, D., & Blei, D. M. (2020). CausalBert.
[X32] PyMC Labs. CausalPy.
[X33] Curth, A., & van der Schaar, M. (2021). CATENets.

Future directions

[X34] Bareinboim, E., & Pearl, J. (2016). Causal Data Fusion.
[X35] Schölkopf, B., et al. (2021). Toward Causal Representation Learning.
[X36] Lippe, P., et al. (2022). CITRIS.
[X37] Toth, C., et al. (2022). ABCI.
[X38] Kıcıman, E., et al. (2023). LLMs + Causality.
[X39] Willig, M., et al. (2023). Causal Parrots.

Five-step project method

[X40] Molak, A. (2023). Ch 15 of Causal Inference and Discovery in Python. Packt.