
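Causal Inference and Discovery in Python: Full-Book Summary

Author

Aleksander Molak (independent machine learning researcher, co-founder of Lespire.io); technical reviewer Nicole Königstein (founder of impactvise, advisor at Quantmate); published by Packt in April 2023. The book is one of the most popular hands-on texts on causal inference and causal discovery in Python from 2023–2024.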

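Overview

The book is a hands-on guide to causal inference and causal discovery, organized around Python libraries (DoWhy, EconML, gCastle, Causica, causal-learn, CausalBert, PyTorch) and split into three parts:

  • Part 1 (Ch 1–5), causal foundations: causation vs. association, Simpson's paradox, Pearl's Ladder of Causation, DAG representation, do-calculus, d-separation.
  • Part 2 (Ch 6–11), causal inference: (C)ATE estimation, double ML, synthetic control, meta-learners (S/T/X/R/DR-Learner), advanced estimators (DR / DML / TMLE / Causal Forest), deep learning for causal inference (TARNet / SNet / CausalBert).
  • Part 3 (Ch 12–15), causal discovery and wrap-up: three sources of causal graphs, four algorithm families (constraint / score / functional / gradient) with gCastle, DECI for deep-learning-based discovery, FCI for hidden confounding, ENCO / ABCI for interventional data, business cases, the five-step project method, and four future directions.

Core engineering takeaways: (1) the DoWhy four-step framework (Model → Identify → Estimate → Refute); (2) the five-step project method (question → experts → graph → identifiability → falsification); (3) hybrid methods (the trio of experts, algorithms, and experiments); (4) a unified way of using the six core libraries (DoWhy / EconML / gCastle / Causica / causal-learn / CausalBert).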

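Main thread of the book

Molak organizes the book as a chain of reasoning, with each chapter building on the previous one:

Part 1: What causality is (Ch 1–5)

  • Ch 1 (Causality – Hey, We Have ML): opens with Simpson's paradox. Traditional ML easily confuses association with causation, which motivates the need for causal methods.
  • Ch 2 (Judea Pearl and the Ladder of Causation): introduces Pearl's Ladder of Causation with its three rungs (Association / Intervention / Counterfactual) and the counterfactual formula \(Y_x(u) = Y_{M_x}(u)\).
  • Ch 3 (Regression, Observations, and Interventions): uses linear regression experiments to show the key difference between "controlling for a variable" and the do-operator.
  • Ch 4 (Graphical Models): DAG representation, adjacency matrices, and the graphical basics of chains, forks, and colliders.
  • Ch 5 (Forks, Chains, and Immoralities): d-separation, Markov factorization, faithfulness, and minimality.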
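A minimal simulation sketch of the chain and collider patterns from Ch 4 and Ch 5. It assumes linear Gaussian mechanisms with unit coefficients (an illustrative choice, not code from the book) and uses partial correlation as a stand-in for a conditional independence test. Conditioning on the middle node should remove the dependence in a chain and create one at a collider.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


rng = random.Random(0)
n = 20_000

# Chain: X -> Z -> Y (hypothetical unit coefficients).
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]
chain_marginal = corr(x, y)            # clearly non-zero
chain_given_z = partial_corr(x, y, z)  # close to zero

# Collider: X -> Z <- Y, with X and Y generated independently.
x2 = [rng.gauss(0, 1) for _ in range(n)]
y2 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [a + b + rng.gauss(0, 1) for a, b in zip(x2, y2)]
collider_marginal = corr(x2, y2)             # close to zero
collider_given_z = partial_corr(x2, y2, z2)  # clearly non-zero

print(f"chain:    corr={chain_marginal:.2f}, partial={chain_given_z:.2f}")
print(f"collider: corr={collider_marginal:.2f}, partial={collider_given_z:.2f}")
```

This asymmetry is what constraint-based discovery exploits when it orients colliders: the collider is the only three-node pattern where conditioning on the middle node opens a path.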

Part 2: How to compute causal effects (Ch 6–11)

  • Ch 6 (Cracking Open Causal Inference): the three rules of do-calculus plus the back-door and front-door criteria.
  • Ch 7 (The Four-Step Process of Causal Inference): the DoWhy four-step framework (Model → Identify → Estimate → Refute), with hands-on DoWhy and EconML API work.
  • Ch 8 (Causal Models – Assumptions and Challenges): four key assumptions (unconfoundedness / positivity / consistency / SUTVA) and the consequences of DAG misspecification.
  • Ch 9 (Causal Inference and ML – Matching to Meta-learners): five meta-learners (S/T/X/R/DR-Learner) with a hands-on example on the LaLonde NSW data.
  • Ch 10 (Causal Inference and ML – Advanced Estimators): advanced estimators such as DR / DML / TMLE / Causal Forest / GRF.
  • Ch 11 (Causal Inference and ML – Deep Learning, NLP): TARNet → SNet → CFR → CausalBert, plus Bayesian Synthetic Control (hands-on with CausalPy).

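Part 3: Where causal graphs come from (Ch 12–15)

  • Ch 12 (Can I Have a Causal Graph, Please?): three sources of causal knowledge (science / experience / algorithms) and the philosophy of hybrid methods.
  • Ch 13 (Causal Discovery and ML – from Assumptions to Applications): the four algorithm families, gCastle, and injecting expert knowledge.
  • Ch 14 (Causal Discovery and ML – Advanced Deep Learning and Beyond): DECI, FCI, ENCO, ABCI, and the challenges of real-world data.
  • Ch 15 (Epilogue): the five-step project method, four business cases, four future directions, and learning resources.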

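Key equations and concepts

Core mathematics of causal inference

  • Pearl's Ladder of Causation (three rungs):
  • L1 (Association): \(P(Y \mid X)\), observing / predicting.
  • L2 (Intervention): \(P(Y \mid do(X))\), intervening / causal effects.
  • L3 (Counterfactual): \(P(Y_x \mid X=x', Y=y')\), counterfactuals / individual-level causation.

  • Counterfactual formula (Pearl, Ch 2): \[Y_x(u) = Y_{M_x}(u)\] where \(M_x\) is the model \(M\) with the structural equation for \(X\) replaced by \(X = x\).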
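A small worked sketch of how \(Y_x(u)\) is evaluated in practice, using the standard abduction / action / prediction recipe. The structural equation \(Y := 2X + U\) and the observed values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def f_y(x, u):
    """Hypothetical structural equation: Y := 2 * X + U."""
    return 2.0 * x + u


def counterfactual_y(x_obs, y_obs, x_cf):
    """Compute Y_{x_cf}(u) for the unit that produced (x_obs, y_obs)."""
    # Abduction: recover this unit's noise term from the observation.
    u = y_obs - 2.0 * x_obs
    # Action: replace the mechanism for X by the constant x_cf (the model M_x).
    # Prediction: push the same u through the modified model.
    return f_y(x_cf, u)


# Observed: X = 1, Y = 3. What would Y have been had X been 0?
y_cf = counterfactual_y(x_obs=1.0, y_obs=3.0, x_cf=0.0)
print(y_cf)
```

The counterfactual keeps the unit's own noise \(u\) fixed, which is what separates rung 3 from the population-level interventional query on rung 2.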

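  • Intervention vs. observation (the core difference), for a confounder \(Z\) with \(Z \to X\) and \(Z \to Y\):

  • Observation: \(P(Y \mid X=x) = \sum_z P(Y \mid X=x, Z=z) P(Z=z \mid X=x)\)
  • Intervention: \(P(Y \mid do(X=x)) = \sum_z P(Y \mid X=x, Z=z) P(Z=z)\). The intervention cuts the edge \(Z \to X\) (the edges into \(X\)), so \(Z\) keeps its marginal distribution.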
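The two formulas above can be compared numerically. The sketch below assumes a hypothetical binary model with a single confounder \(Z\) (all probabilities are made up for illustration) and evaluates both expressions exactly.

```python
# Hypothetical binary model: Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,  # P(Y = 1 | X = x, Z = z)
                 (0, 1): 0.6, (1, 1): 0.8}


def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def p_y1_observational(x):
    """P(Y=1 | X=x): weights strata by P(Z=z | X=x)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(
        p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] / p_x for z in p_z
    )


def p_y1_interventional(x):
    """P(Y=1 | do(X=x)): weights strata by the marginal P(Z=z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


obs_diff = p_y1_observational(1) - p_y1_observational(0)
do_diff = p_y1_interventional(1) - p_y1_interventional(0)
print(f"observational difference: {obs_diff:.2f}")
print(f"interventional difference: {do_diff:.2f}")
```

In this toy model the observational contrast is 0.50 while the interventional contrast is 0.20; the gap is the confounding bias introduced by \(Z\).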

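  • The three rules of do-calculus (Pearl 1995, Ch 6). Here \(G_{\overline{X}}\) is the graph with all edges into \(X\) removed and \(G_{\underline{Z}}\) the graph with all edges out of \(Z\) removed:

  • Rule 1 (insertion / deletion of observations): \(P(y \mid do(x), z, w) = P(y \mid do(x), w)\) if \((Z \perp\!\!\!\perp Y \mid X, W)_{G_{\overline{X}}}\).
  • Rule 2 (action / observation exchange): \(P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)\) if \((Z \perp\!\!\!\perp Y \mid X, W)_{G_{\overline{X}\underline{Z}}}\).
  • Rule 3 (insertion / deletion of actions): \(P(y \mid do(x), do(z), w) = P(y \mid do(x), w)\) if \((Z \perp\!\!\!\perp Y \mid X, W)_{G_{\overline{X}, \overline{Z(W)}}}\), where \(Z(W)\) is the set of \(Z\)-nodes that are not ancestors of any \(W\)-node in \(G_{\overline{X}}\).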
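Each rule is licensed by a d-separation statement in a mutilated graph. The sketch below is one way to check such statements in plain Python, using the moralized ancestral graph criterion; the helper names (`d_separated`, `remove_outgoing`) and the three-node example graph are illustrative assumptions, not an API from any of the libraries discussed here.

```python
from itertools import combinations


def d_separated(edges, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `edges`."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. Restrict to the ancestral set of xs | ys | zs.
    keep = set(xs | ys | zs)
    frontier = list(keep)
    while frontier:
        node = frontier.pop()
        for u, v in edges:
            if v == node and u not in keep:
                keep.add(u)
                frontier.append(u)
    sub = [(u, v) for u, v in edges if u in keep and v in keep]
    # 2. Moralize: link parents that share a child, then drop directions.
    undirected = {frozenset(edge) for edge in sub}
    for child in keep:
        parents = [u for u, v in sub if v == child]
        for a, b in combinations(parents, 2):
            undirected.add(frozenset((a, b)))
    # 3. Delete the conditioning nodes and test reachability.
    reachable = set(xs)
    frontier = list(xs)
    while frontier:
        node = frontier.pop()
        for edge in undirected:
            if node in edge:
                (other,) = edge - {node}
                if other not in zs and other not in reachable:
                    reachable.add(other)
                    frontier.append(other)
    return not (reachable & ys)


def remove_outgoing(edges, nodes):
    """Mutilated graph with all edges out of `nodes` removed."""
    return [(u, v) for u, v in edges if u not in nodes]


# Hypothetical confounded graph: Z -> X, Z -> Y, X -> Y.
graph = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
g_under_x = remove_outgoing(graph, {"X"})

# Rule 2 with empty do-set: P(y | do(x), z) = P(y | x, z) needs X _||_ Y | Z.
rule2_given_z = d_separated(g_under_x, {"X"}, {"Y"}, {"Z"})
rule2_unconditional = d_separated(g_under_x, {"X"}, {"Y"}, set())
print(rule2_given_z, rule2_unconditional)
```

With \(Z\) in the conditioning set the check passes, so `do(x)` can be exchanged for observing `x`; without \(Z\) it fails, which matches the back-door intuition.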

  • Back-door criterion (adjusting for the non-causal paths into \(X\)): if \(Z\) blocks all back-door paths from \(X\) to \(Y\) and \(Z\) contains no descendant of \(X\), then \[P(Y \mid do(X=x)) = \sum_z P(Y \mid X=x, Z=z) P(Z=z)\]

  • Front-door criterion (using a mediator \(M\)): if (a) \(M\) intercepts all directed paths from \(X\) to \(Y\); (b) there is no unblocked back-door path from \(X\) to \(M\); (c) all back-door paths from \(M\) to \(Y\) are blocked by \(X\), then \[P(Y \mid do(X=x)) = \sum_m P(M=m \mid X=x) \sum_{x'} P(Y \mid M=m, X=x') P(X=x')\]
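The front-door formula can be sanity-checked on a toy model where the truth is known. The sketch below assumes a hypothetical binary SCM with a hidden confounder \(U\) (\(U \to X\), \(U \to Y\), \(X \to M \to Y\)); all probabilities are invented for illustration. The front-door estimate uses only the observed joint over \((X, M, Y)\), while the ground truth uses \(U\) directly.

```python
from itertools import product

# Hypothetical SCM parameters (U is hidden from the analyst).
p_u = {0: 0.5, 1: 0.5}
p_x1_u = {0: 0.2, 1: 0.8}                      # P(X=1 | U)
p_m1_x = {0: 0.1, 1: 0.9}                      # P(M=1 | X)
p_y1_mu = {(0, 0): 0.1, (1, 0): 0.5,           # P(Y=1 | M, U)
           (0, 1): 0.4, (1, 1): 0.9}


def bern(p1, value):
    return p1 if value == 1 else 1.0 - p1


# Observed joint P(x, m, y) with U marginalized out.
joint = {}
for x, m, y in product((0, 1), repeat=3):
    joint[(x, m, y)] = sum(
        p_u[u] * bern(p_x1_u[u], x) * bern(p_m1_x[x], m)
        * bern(p_y1_mu[(m, u)], y)
        for u in p_u
    )


def prob(**fixed):
    """Marginal probability of an assignment such as prob(x=1, m=0)."""
    total = 0.0
    for (x, m, y), p in joint.items():
        values = {"x": x, "m": m, "y": y}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total


def front_door(x):
    """P(Y=1 | do(X=x)) from observed quantities only."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = prob(x=x, m=m) / prob(x=x)
        inner = sum(
            prob(x=xp, m=m, y=1) / prob(x=xp, m=m) * prob(x=xp)
            for xp in (0, 1)
        )
        total += p_m_given_x * inner
    return total


def truth(x):
    """P(Y=1 | do(X=x)) computed with access to the hidden U."""
    return sum(
        bern(p_m1_x[x], m) * sum(p_y1_mu[(m, u)] * p_u[u] for u in p_u)
        for m in (0, 1)
    )


def naive(x):
    """Plain conditional P(Y=1 | X=x), biased by U."""
    return prob(x=x, y=1) / prob(x=x)


print(front_door(1), truth(1), naive(1))
```

In this model the front-door estimate matches the ground truth, whereas the plain conditional is noticeably off, even though \(U\) is never observed.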

Meta-learners (Ch 9)

  • S-Learner (Single): fits one model \(\mu(x, t) = \mathbb{E}[Y \mid X=x, T=t]\) and sets \(\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)\).
  • T-Learner (Two): fits \(\mu_0(x) = \mathbb{E}[Y \mid X=x, T=0]\) and \(\mu_1(x) = \mathbb{E}[Y \mid X=x, T=1]\) separately and sets \(\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)\).
  • X-Learner (Künzel et al. 2019): uses imputed treatment effects. With \(\hat{\mu}_0, \hat{\mu}_1\) fitted as in the T-Learner: \[\hat{\tau}_1(x) = \mathbb{E}[Y - \hat{\mu}_0(X) \mid X=x, T=1]\] \[\hat{\tau}_0(x) = \mathbb{E}[\hat{\mu}_1(X) - Y \mid X=x, T=0]\] \[\hat{\tau}(x) = g(x)\,\hat{\tau}_0(x) + (1-g(x))\,\hat{\tau}_1(x), \quad g(x) = P(T=1 \mid X=x)\]
  • R-Learner (Nie & Wager 2021): based on the Robinson decomposition \(Y_i - \hat{m}(X_i) = \tau(X_i) (T_i - \hat{e}(X_i)) + \epsilon_i\); \(\tau\) is obtained by minimizing the R-loss.
  • DR-Learner (AIPW-based, Ch 9): doubly robust, combining outcome regression with IPW. Build the pseudo-outcome \[\hat{\phi}_i = \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{T_i (Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} - \frac{(1-T_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)}\] and regress it on \(X\) to obtain \(\hat{\tau}(x)\); averaging \(\hat{\phi}_i\) over the sample gives the AIPW estimate of the ATE.
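A minimal sketch of the S-Learner and T-Learner on simulated data, assuming a randomized binary treatment, one covariate, and linear base learners fitted by ordinary least squares (all illustrative choices; the book uses library implementations). The simulated CATE is \(\tau(x) = 1 + 3x\). The S-Learner's base model deliberately has no interaction term, to show how a restrictive single model can flatten heterogeneity.

```python
import random


def ols(rows, targets):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    aug = [row + [b] for row, b in zip(xtx, xty)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coef, features):
    return sum(c * f for c, f in zip(coef, features))


rng = random.Random(0)
data = []
for _ in range(5000):
    x = rng.uniform(-1, 1)
    t = rng.randint(0, 1)                      # randomized treatment
    y = 1 + 2 * x + t * (1 + 3 * x) + rng.gauss(0, 0.5)
    data.append((x, t, y))

# S-Learner: one model on (1, x, t), with no interaction term.
s_coef = ols([[1.0, x, t] for x, t, _ in data], [y for _, _, y in data])

# T-Learner: one model per treatment arm on (1, x).
arm = {a: [(x, y) for x, t, y in data if t == a] for a in (0, 1)}
t_coef = {a: ols([[1.0, x] for x, _ in arm[a]], [y for _, y in arm[a]])
          for a in (0, 1)}


def s_learner_cate(x):
    return predict(s_coef, [1.0, x, 1]) - predict(s_coef, [1.0, x, 0])


def t_learner_cate(x):
    return predict(t_coef[1], [1.0, x]) - predict(t_coef[0], [1.0, x])


for x0 in (-0.5, 0.5):
    print(x0, round(s_learner_cate(x0), 2), round(t_learner_cate(x0), 2))
```

Here the T-Learner tracks \(1 + 3x\), while this particular S-Learner returns the same value for every \(x\) (roughly the average effect). With a flexible base learner the S-Learner can recover heterogeneity, so the flattening shown is a property of the chosen base model, not of the S-Learner in general.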

Advanced estimators (Ch 10)

  • DR (Doubly Robust): \(\hat{\tau}_{\text{DR}} = \frac{1}{n} \sum_i \left[ \frac{T_i (Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} - \frac{(1-T_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)} + \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]\)
  • DML (Double ML, Chernozhukov et al. 2016): estimates the nuisance functions with cross-fitting and orthogonalization to avoid regularization bias.
  • TMLE (Targeted Maximum Likelihood, van der Laan & Rubin 2006): first estimates an outcome model and a treatment model, then applies a targeting step to correct the bias of the initial estimate.
  • Causal Forest (Wager & Athey 2018) / GRF (Athey, Tibshirani & Wager 2019): heterogeneous treatment effect estimation built on honest random forests.
  • DRIV (Doubly Robust IV): uses an instrumental variable to handle unobserved confounding.
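The DR formula above can be tried on a small simulation. The sketch assumes a single binary confounder, so that the nuisance functions \(\hat{e}\), \(\hat{\mu}_0\), \(\hat{\mu}_1\) are plain stratum means; cross-fitting is omitted for brevity, and the data-generating process (true ATE of 1.0) is hypothetical.

```python
import random

rng = random.Random(1)
rows = []
for _ in range(20_000):
    z = int(rng.random() < 0.5)                      # confounder
    t = int(rng.random() < (0.8 if z else 0.2))      # treatment depends on z
    y = 1.0 * t + 2.0 * z + rng.gauss(0, 1)          # true ATE = 1.0
    rows.append((z, t, y))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Nuisance estimates as stratum means (valid because Z is binary).
e_hat = {z: mean(t for zz, t, _ in rows if zz == z) for z in (0, 1)}
mu_hat = {
    (t, z): mean(y for zz, tt, y in rows if zz == z and tt == t)
    for t in (0, 1)
    for z in (0, 1)
}


def aipw_score(z, t, y):
    """Per-unit doubly robust score: regression contrast plus IPW residuals."""
    mu1, mu0, e = mu_hat[(1, z)], mu_hat[(0, z)], e_hat[z]
    return mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)


ate_aipw = mean(aipw_score(z, t, y) for z, t, y in rows)
ate_naive = (mean(y for _, t, y in rows if t == 1)
             - mean(y for _, t, y in rows if t == 0))
print(f"AIPW ATE:  {ate_aipw:.2f}")
print(f"naive ATE: {ate_naive:.2f}")
```

The naive difference in means is pulled far above 1.0 by the confounder, while the doubly robust estimate stays close to it. In realistic settings the nuisance models would be ML regressors fitted with cross-fitting, as in DML.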

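Deep learning for causal inference (Ch 11)

  • TARNet (Shalit et al. 2017): a shared representation with separate outcome heads: \[\Phi: X \to \mathbb{R}^d, \quad h_t(\Phi(X)) \approx \mathbb{E}[Y \mid X, T=t], \quad t \in \{0, 1\}\]
  • CFR (Counterfactual Regression): adds an IPM (Integral Probability Metric) regularizer so that the distributions of \(\Phi(X) \mid T=0\) and \(\Phi(X) \mid T=1\) are approximately equal.
  • SNet / FlexTENet / OffsetNet (Curth & van der Schaar 2021): multi-task architectures with a shared vs. treatment-specific decomposition.
  • CausalBert (Veitch et al. 2020): adapts BERT text representations so that text can be adjusted for as a confounder when estimating treatment effects.
  • Bayesian Synthetic Control (CausalPy / PyMC Labs): estimates effects from a Bayesian time-series model that predicts the counterfactual.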

The four families of causal discovery (Ch 13)

  • PC algorithm (constraint-based), five steps: (1) start from a fully connected graph; (2) delete edges between unconditionally independent pairs; (3) iteratively delete edges between conditionally independent pairs; (4) orient colliders; (5) propagate orientations.
  • GES (score-based; Chickering 2003): two-phase greedy search (forward, then backward) using BIC / BDeu scores.
  • ANM (functional; Hoyer et al. 2008): \(Y = f(X) + \epsilon\) with \(\epsilon \perp\!\!\!\perp X\); independence is tested with HSIC.
  • LiNGAM (functional; Shimizu et al. 2006): \(Y = aX + \epsilon\) with non-Gaussian \(\epsilon\); ICA recovers the direction.
  • NOTEARS (gradient-based; Zheng et al. 2018): a smooth acyclicity constraint \[\mathcal{R}(A) = \mathrm{tr}(e^{A \odot A}) - d = 0 \iff A \text{ is a DAG}\] optimized with an augmented Lagrangian.
  • GOLEM (gradient-based; Ng et al. 2020): a likelihood objective with a soft DAG constraint.
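The NOTEARS acyclicity measure \(\mathrm{tr}(e^{A \odot A}) - d\) is easy to evaluate directly. The sketch below computes it with a truncated Taylor series in plain Python (a simplification; real implementations use a library matrix exponential), for two hypothetical 3-node weight matrices.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagness(adj, terms=30):
    """tr(exp(A * A)) - d, with * the elementwise product.

    Uses the series exp(M) = I + M + M^2/2! + ...; the trace of M^k counts
    weighted closed walks of length k, so it is zero for every k iff the
    graph has no cycles.
    """
    d = len(adj)
    squared = [[w * w for w in row] for row in adj]
    term = [[float(i == j) for j in range(d)] for i in range(d)]  # identity
    trace = float(d)
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, squared)]
        trace += sum(term[i][i] for i in range(d))
    return trace - d


# Hypothetical weighted adjacency matrices (row = source, column = target).
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]          # 0 -> 1 -> 2
cyclic = [[0.0, 1.5, 0.0],
          [0.0, 0.0, -2.0],
          [1.0, 0.0, 0.0]]       # adds 2 -> 0, closing a cycle

print(dagness(dag), dagness(cyclic))
```

Because the measure is differentiable in the edge weights, it can be driven to zero by continuous optimization, which is what turns structure learning into a constrained smooth problem.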

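Advanced causal discovery (Ch 14)

  • DECI (Geffner et al. 2022): end-to-end Bayesian causal discovery, with ICGNN to fit nonlinear mechanisms and Gumbel-Softmax for sampling discrete graph structure.
  • FCI (Spirtes et al. 2000): four edge types: directed (-->), o->, o-o, and bidirected (<->, indicating a hidden common cause).
  • ENCO (Lippe et al. 2022): parameterizes edge existence and edge orientation independently; convergence is guaranteed when interventions on all variables are available.
  • ABCI (Toth et al. 2022): active experimentation with sequential causal queries.
  • CCANM / CORTH: niche methods for cascade mediation and causal feature selection, respectively.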

Business and project method (Ch 15)

  • Five-step project method (Molak's engineering checklist):
  • Starting with a question: \(\mathcal{Q} = \{Q_1, \ldots, Q_K\}\), \(K \le 2\).
  • Obtaining expert knowledge: \(\mathcal{E} = \{(e_i, c_i)\}\), \(c_i \in [0,1]\).
  • Generating hypothetical graph(s): \(G = (V, E)\).
  • Checking identifiability: \(\Pr(\text{identifiable}) = 1\).
  • Falsifying hypotheses: Popperian falsification.
  • Failure probability (implicit): \(p_f(\mathcal{Q}) = 1 - \exp(-\alpha |\mathcal{Q}|)\).
  • CATE information gain: \(I_{\text{gain}} = \mathrm{Var}(\tau(X))\).

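Key conclusions

  • Causal methods complement ML rather than replace it: ML suits prediction, causal methods suit interventions and counterfactuals. Molak stresses repeatedly: do not treat ML as causal, and do not treat causal methods as ML.
  • The DoWhy four-step process combined with the EconML API is treated as a de facto standard for causal inference: model → identify → estimate → refute. In production: structure every causal project around this framework.
  • The five meta-learners each have their niche: S-Learner is simple but can be biased; T-Learner is a balanced middle ground; X-Learner suits imbalanced treatment groups; R-Learner has the most rigorous theory; DR-Learner is the most robust. Production advice: with imbalanced samples or high-dimensional features, use X-Learner or DR-Learner.
  • DR / DML / TMLE / Causal Forest are the four main advanced estimators: DR is simple and robust; DML is theoretically rigorous; TMLE is efficient; Causal Forest targets heterogeneity. Production advice: use DML as the baseline and Causal Forest for heterogeneity analysis.
  • TARNet / SNet / CausalBert are the workhorses of deep-learning-based causal inference: TARNet is the entry point; SNet is multi-task; CausalBert handles text confounders. Production advice: default to TARNet; use SNet for high-dimensional or heterogeneous tasks; use CausalBert when confounding is carried by text.
  • No family of causal discovery algorithms is universal: PC is simple, LiNGAM fits non-Gaussian settings, NOTEARS uses continuous optimization, DECI is end-to-end. In production, run multiple algorithms and keep the consensus edges.
  • Hybrid methods are the engineering gold standard: the trio of experts, algorithms, and experiments. gCastle's PrioriKnowledge and Causica's ExpertGraphContainer are the best current tools for this.
  • Handling hidden confounding: FCI is the constraint-based algorithm covered in the book that remains asymptotically correct when causal sufficiency is violated; DECI does not support hidden confounders; ENCO can be extended. In production: run FCI first to assess whether hidden confounding needs to be handled.
  • The five-step project method is the most practical engineering outcome: Molak stresses that problem definition plus iteration over the five steps is key to the success of a causal project.
  • Causal data fusion is an industrial-scale opportunity: combining multiple data sources for causal inference, especially in biomedicine.
  • LLMs are not "truly causal": Willig et al. 2023 ("Causal Parrots") warn that LLMs learn an associational meta-SCM. In production: LLM reasoning must be validated against interventional data.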
  • Core libraries (the six headline libraries plus CausalPy and CATENets):
  • DoWhy (Microsoft, Sharma & Kiciman 2020): the four-step framework with EconML integration.
  • EconML (Microsoft, Battocchi et al. 2019): DML / DML-IV / DRIV / CausalForestDML.
  • CausalPy (PyMC Labs) / CausalImpact (originally from Google): Bayesian Synthetic Control and related time-series methods.
  • CATENets (Curth & van der Schaar 2021): TARNet / SNet / FlexTENet / OffsetNet.
  • gCastle (Huawei Noah's Ark, Zhang et al. 2021): PC / GES / LiNGAM / NOTEARS / GOLEM.
  • causal-learn (CMU CLeaR Group): FCI / RFCI / tiered knowledge.
  • Causica (Microsoft, 2023): DECI and ExpertGraphContainer.
  • transformers (HuggingFace): CausalBert (Veitch et al. 2020).

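Cross-chapter themes and engineering insights

  • The fundamental difference between causation and association: the Pearl ladder that Molak keeps returning to, L1 (association) vs. L2 (intervention) vs. L3 (counterfactual). In practice: always ask first which rung the question sits on; a question posed on the wrong rung cannot be answered.
  • The DAG is the common language of causal reasoning: Ch 4–5 introduce DAGs and d-separation, Ch 6 do-calculus, Ch 7 DoWhy, Ch 13–14 causal discovery. In production: use the DAG as the shared diagram for team communication.
  • Checking assumptions is the quality assurance of a causal project: the four assumptions in Ch 8, the DML assumptions in Ch 10, faithfulness in Ch 13, causal sufficiency in Ch 14. In production: turn assumption checks into a checklist.
  • No algorithm is a silver bullet; combining methods is the gold standard: multiple meta-learners in Ch 9, multiple estimators in Ch 10, multiple discovery algorithms in Ch 13. Production advice: run 2–3 algorithms for any causal estimate and report mean ± std.
  • Expert knowledge plus data-driven methods is the engineering breakthrough: gCastle PrioriKnowledge in Ch 13, DECI ExpertGraphContainer in Ch 14. Production advice: compare the expert graph with the data-driven graph; the differences are the key review points.
  • The gap between real and synthetic data: Reisach 2021 in Ch 13, real benchmark cases in Ch 14. In production: always validate on a real-data holdout and do not trust synthetic F1.
  • The five-step project method is the core at the level of principle: Molak repeatedly stresses problem definition and iteration. In production: turn the five steps into a team SOP.
  • LLMs plus causality is a core direction for the next five years: Kıcıman 2023 / Willig 2023 / "Causally aware imitation learning".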
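The "multiple algorithms plus consensus edges" and "expert graph vs. data graph" advice can be sketched with plain set operations. The edge lists below are hypothetical stand-ins for the outputs of three discovery algorithms and an expert-drawn graph; the variable names and the two-vote threshold are illustrative assumptions.

```python
from collections import Counter


def consensus_edges(edge_sets, min_votes):
    """Directed edges proposed by at least `min_votes` of the edge sets."""
    votes = Counter(edge for edges in edge_sets for edge in set(edges))
    return {edge for edge, count in votes.items() if count >= min_votes}


# Hypothetical outputs of three discovery runs (source, target).
pc_edges = {("price", "demand"), ("ads", "demand"), ("season", "price")}
lingam_edges = {("price", "demand"), ("ads", "demand"), ("demand", "season")}
notears_edges = {("price", "demand"), ("season", "price"), ("ads", "price")}

# Hypothetical graph elicited from domain experts.
expert_edges = {("price", "demand"), ("season", "price"), ("season", "demand")}

consensus = consensus_edges([pc_edges, lingam_edges, notears_edges], min_votes=2)

# Disagreements are the review points to take back to the experts.
expert_only = expert_edges - consensus   # claimed by experts, not by the data
data_only = consensus - expert_edges     # found in the data, not by experts

print("consensus:  ", sorted(consensus))
print("expert only:", sorted(expert_only))
print("data only:  ", sorted(data_only))
```

A vote threshold is only one possible aggregation rule; edges from equivalence-class outputs (for example undirected CPDAG edges) would need a convention of their own before being counted.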

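Challenges and open problems (across chapters)

  • Falsifiability of the causal sufficiency assumption: most causal discovery algorithms assume no hidden confounding, yet no statistical test can reliably detect a violation. Production fallback: sensitivity analysis as in Cinelli & Hazlett 2020.
  • Observational equivalence (CPDAG vs. DAG): PC and GES return only a Markov equivalence class (MEC); obtaining a unique DAG requires interventional data. In production: combine active experimentation with algorithms.
  • Hyperparameter sensitivity: the Arctic sea ice experiments in Huang et al. 2021 show that NOTEARS and DAG-GNN are sensitive to hyperparameters. Production advice: run multiple seeds and report mean ± std.
  • The trade-off between sample size and dimensionality: with more than 50 variables (D > 50), the computation for PC and GES blows up; CCANM needs 5000–6000 observations. In production: do feature selection and bootstrapping first.
  • Mixed variable types: apart from ENCO, most algorithms do not support mixed variables. In production: one-hot encoding distorts categorical variables (one variable becomes several graph nodes), so use algorithms designed for mixed data.
  • Too few real-data benchmarks: Tu 2019 / Huang 2021 / Shen 2020 are among the few public cases. In production: encourage teams to contribute case studies.
  • The risk of "fake causality" from LLMs: Willig 2023 warns that an LLM's do-inference deviates from the true distribution. In production: ground it in real interventional data.
  • The "fully automated scientist" has not arrived yet: in Ch 15 Molak stresses that "we still haven't reached the stage of a fully automated scientist"; agents plus experiments plus causal reasoning is a direction for the next 5–10 years.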

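Connections to my own research

  • The three skills Molak emphasizes (Ch 15): awareness of the Ladder of Causation, structured thinking, and CATE modeling. These are highly relevant to engineering work in biomechanics and clinical decision-making.
  • The five-step project method plus Six Sigma is an "industrial-grade communication language": it can be used to explain causality to clinicians and engineers.
  • Causal data fusion suits the fusion of animal experiments with clinical data, a core scenario in vascular biomechanics.
  • Synthetic control (hands-on with CausalPy, Ch 11) suits single-case and small-cohort medical studies, which relates directly to my research.
  • TARNet / SNet in medicine: usable as tools for individualized treatment effect estimation.
  • DECI's end-to-end framework combined with medical interventional data is a direction for clinical decision support systems over the next five years.
  • The lesson of Willig 2023 ("Causal Parrots"): LLMs cannot replace experts, and clinical decisions must be made through human-machine collaboration.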

Key references (combined)

Part 1: Causal foundations

  • Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). Cambridge University Press.
  • Pearl, J., & Mackenzie, D. (2019). The Book of Why. Penguin Books.
  • Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of Causal Inference. MIT Press.
  • Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.
  • Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search. MIT Press.

Part 2: Causal inference

  • Neyman, J. (1923). On the Application of Probability Theory to Agricultural Experiments. Statistical Science, 5(4), 465-480.
  • Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. JEP, 66(5), 688-701.
  • Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. JASA, 89(427), 846-866.
  • Chernozhukov, V., et al. (2016). Double/Debiased Machine Learning for Treatment and Causal Parameters. arXiv.
  • Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests. JASA, 113(523), 1228-1242.
  • Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests. Annals of Statistics, 47(2), 1148-1178.
  • van der Laan, M. J., & Rubin, D. (2006). Targeted Maximum Likelihood Learning. U.C. Berkeley.
  • Künzel, S. R., et al. (2019). Meta-learners for Estimating Heterogeneous Treatment Effects. PNAS.
  • Shalit, U., Johansson, F. D., & Sontag, D. (2017). Estimating Individual Treatment Effect: Generalization Bounds and Algorithms. ICML.
  • Curth, A., & van der Schaar, M. (2021). Nonparametric Estimation of Heterogeneous Treatment Effects. AISTATS / NeurIPS.
  • Veitch, V., Sridhar, D., & Blei, D. M. (2020). Adapting Text Embeddings for Causal Inference. UAI.
  • Abadie, A., & Gardeazabal, J. (2003). The Economic Costs of Conflict. AER.
  • Abadie, A. (2021). Using Synthetic Controls. AEA.
  • Kıcıman, E., Ness, R., Sharma, A., & Tan, C. (2023). Causal Reasoning and Large Language Models. arXiv.
  • Sharma, A., & Kiciman, E. (2020). DoWhy: An End-to-End Library for Causal Inference. arXiv.
  • Battocchi, K., et al. (2019). EconML: A Python Package for Causal Machine Learning. arXiv.

Part 3: Causal discovery

  • Rebane, G., & Pearl, J. (1987). The Recovery of Causal Poly-trees from Statistical Data. IJAR.
  • Verma, T., & Pearl, J. (1990). Equivalence and Synthesis of Causal Models. UAI 1990.
  • Heckerman, D., Geiger, D., & Chickering, D. M. (1995). Learning Bayesian Networks. Machine Learning, 20, 197-243.
  • Chickering, D. M. (2003). Optimal Structure Identification with Greedy Search. JMLR, 3, 507-554.
  • Shimizu, S., et al. (2006). A Linear Non-Gaussian Acyclic Model for Causal Discovery. JMLR, 7, 2003-2030.
  • Shimizu, S., et al. (2011). DirectLiNGAM. JMLR, 12, 1225-1248.
  • Hoyer, P. O., et al. (2008). Nonlinear Causal Discovery with Additive Noise Models. NeurIPS 21.
  • Hyvärinen, A., Karhunen, J., & Oja, E. (2001). Independent Component Analysis. Wiley.
  • Zheng, X., Aragam, B., Ravikumar, P., & Xing, E. P. (2018). DAGs with NO TEARS. NeurIPS 2018.
  • Ng, I., Ghassami, A., & Zhang, K. (2020). On the Role of Sparsity and DAG Constraints for Learning Linear DAGs. arXiv.
  • Geffner, T., et al. (2022). Deep End-to-end Causal Inference. arXiv.
  • Spirtes, P., et al. (2013). Causal Inference in the Presence of Latent Variables and Selection Bias. arXiv.
  • Lippe, P., Cohen, T., & Gavves, E. (2022). Efficient Neural Causal Discovery without Acyclicity Constraints. arXiv.
  • Toth, C., et al. (2022). Active Bayesian Causal Inference. arXiv.
  • Zhang, K., et al. (2021). gCastle: A Python Toolbox for Causal Discovery. arXiv.
  • Park, J., Song, C., & Park, J. (2022). Input Convex Graph Neural Networks. OpenReview.
  • Reisach, A. G., Seiler, C., & Weichwald, S. (2021). Beware of the Simulated DAG! Varsortability in Additive Noise Models. arXiv.
  • Kaiser, M., & Sipos, M. (2021). Unsuitability of NOTEARS for Causal Graph Discovery. arXiv.
  • Cai, R., et al. (2021). Causal Discovery with Cascade Nonlinear Additive Noise Models. ACM TIST.
  • Soleymani, A., et al. (2022). Causal Feature Selection via Orthogonal Search. TMLR.
  • Tu, R., et al. (2019). Neuropathic Pain Diagnosis Simulator. NeurIPS 32.
  • Huang, Y., et al. (2021). Benchmarking of Data-Driven Causality Discovery Approaches. Frontiers in Big Data, 4.
  • Shen, X., et al. (2020). Causal Discovery Algorithms: Application to Alzheimer's Pathophysiology. Scientific Reports, 10(1), 2975.
  • Andrews, B., Spirtes, P., & Cooper, G. F. (2020). On the Completeness of Causal Discovery with Tiered Background Knowledge. AISTATS.
  • Kaddour, J., et al. (2022). Causal Machine Learning: A Survey and Open Problems. arXiv.
  • Deng, Z., et al. (2022). Deep Causal Learning. arXiv.
  • Vowels, M. J., et al. (2022). D'ya Like DAGs? ACM Computing Surveys, 55(4), 1-36.
  • Berrevoets, J., et al. (2023). Causal Deep Learning. arXiv.
  • Bareinboim, E., & Pearl, J. (2016). Causal Inference and the Data-Fusion Problem. PNAS, 113(27), 7345-7352.
  • Hünermund, P., & Bareinboim, E. (2023). Causal Inference and Data Fusion in Econometrics. arXiv.
  • Schölkopf, B., et al. (2021). Toward Causal Representation Learning. Proceedings of the IEEE, 109(5), 612-634.
  • Willig, M., et al. (2023). Causal Parrots: Large Language Models May Talk Causality But Are Not Causal. ACM preprint.
  • Curth, A., et al. (2021). Really Doing Great at Estimating CATE? NeurIPS Datasets & Benchmarks.
  • Jeunen, O., et al. (2022). Disentangling Causal Effects from Sets of Interventions. NeurIPS 35.
  • Becker, J. M. (2016). The Book of Why (critique of statistical control).
  • Becker, M., et al. (2021). RealCause: Realistic Causal Inference Benchmarking. arXiv.
  • Cinelli, C., & Hazlett, C. (2020). Making Sense of Sensitivity. JRSS B, 82(1), 39-67.
  • Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
  • Fisher, R. A. (1921). On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. Metron, 1, 1-32.
  • Popper, K. (1959). The Logic of Scientific Discovery. Basic Books.
  • Popper, K. (1971). Conjectural Knowledge. Revue Internationale de Philosophie, 25(95/96), 167-197.
  • Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
  • Tetlock, P. E. (2005). Expert Political Judgment. Princeton University Press.
  • Tetlock, P. E., & Gardner, D. (2015). Superforecasting. Crown.
  • Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
  • LaLonde, R. J. (1986). Evaluating the Econometric Evaluations of Training Programs. AER.
  • Hill, J. L. (2011). Bayesian Nonparametric Modeling for Causal Inference. JCGS, 20(1), 217-240.
  • Hernán, M. A., & Robins, J. M. (2006). Instruments for Causal Inference. Epidemiology, 17(4), 360-372.
  • Imbens, G. W. (2004). Nonparametric Estimation of Average Treatment Effects. RES, 86(1), 4-29.
  • Hurwitz, J., & Thompson, J. K. (2023). Causal Artificial Intelligence. (Q4 2023).
  • Bareinboim, E. (2020). Causal Reinforcement Learning tutorial. (https://bit.ly/CausalRL).
  • Lippe, P., et al. (2022). CITRIS: Causal Identifiability from Temporal Intervened Sequences. ICML 2022.
  • Chau, S. L., et al. (2021). BayesIMP: Uncertainty Quantification for Causal Data Fusion. NeurIPS 34.
  • Gallea, Q. The Causal Mindset (in preparation).
  • Stahl, A. E., & Feigenson, L. (2015). Observing the Unexpected Enhances Infants' Learning. Science, 348(6230), 91-94.
  • Gopnik, A. (2009). The Philosophical Baby. Farrar, Straus and Giroux.
  • Glymour, C., Zhang, K., & Spirtes, P. (2019). Review of Causal Discovery Methods Based on Graphical Models. Frontiers in Genetics, 10, 524.
  • Spirtes, P. (2010). Introduction to Causal Inference. JMLR, 11, 1643-1662.
  • Bareinboim, E., Correa, J. D., Ibeling, D., & Icard, T. (2020). On Pearl's Hierarchy and the Foundations of Causal Inference. ACM TEAC.
  • Cheng, L., et al. (2022). Data Fusion via Hard and Soft Constraints.
  • Wager, S. (2020). Causal Inference: A Statistical Learning Approach (textbook).
  • Brady Neal's video on faithfulness (https://bit.ly/BradyFaithfulness).
  • Causal Discovery Toolbox (CDT) Python package (https://bit.ly/CDTMetricsDocs).
  • Neo4j, Amazon Neptune (graph databases).
  • grapl-causal library (https://bit.ly/GRAPLCausal).
  • Playtika uplift-analysis library (https://bit.ly/PlaytikaUplift).
  • Spotify technical blog (https://bit.ly/SpotifySynthControl, https://bit.ly/SpotifyHiddenBlog).
  • Six Sigma / DMAIC framework.
  • TensorCell (https://bit.ly/TensorCell): traffic simulation case.
  • Ch 11 Bayesian Structural Time Series (BSTS) — Scott & Varian 2014.
  • 8 schools example — Rubin 1981.
  • CLeaR 2023's Call for Causal Datasets (https://bit.ly/CLeaRDatasets).
  • Diagnosis Simulator (https://bit.ly/DiagnosisSimulator).
  • Causal Python newsletter (https://bit.ly/CausalPython).
  • Six books blog (https://bit.ly/SixBooksBlog).
  • Augmented Lagrangian method (Nemirovski 1999).
  • Frobenius norm (https://bit.ly/MatrixNorm).
  • AugmentedLagrangian optimizer (https://bit.ly/AugmentedLagrangian).
  • PyTorch Lightning (https://bit.ly/IntroToLightning).
  • Microsoft Causica (https://bit.ly/MicrosoftCausica).
  • ENCO GitHub (https://bit.ly/EncoGitHub).
  • ABCI GitHub (https://bit.ly/ABCIGitHub).
  • CLeaR Group (CMU): maintainers of causal-learn.
  • Maschinensucher (TensorCell): traffic simulation.
  • Maahid Photos, Diego Ferrari, Max Avans: image credits (Pexels.com).
  • Pew Research Center (2008). Men or Women: Who's the Better Leader?
  • Zenger, J., & Folkman, J. (2020). Research: Women Are Better Leaders During a Crisis. HBR.
  • Kostis, J. B., & Dobrzynski, J. M. (2020). Limitations of Randomized Clinical Trials. AJC, 129, 109-115.
  • Harrell, F. (2023). Randomized Clinical Trials Do Not Mimic Clinical Practice. Statistical Thinking.
  • Senn, S. S. (2020, 2021). Randomization works.
  • Hall, N. S. (2007). R. A. Fisher and His Advocacy of Randomization. JHB, 40(2), 295-325.
  • Wu, S., et al. (2023). Causal Inference in Observational Data.
  • Muldoon, S., et al.: complex networks case.
  • Smaldino, P.: work related to the replication crisis.
  • Lacerda, G., Spirtes, P. L., Ramsey, J., & Hoyer, P. O. (2008). Discovering Cyclic Causal Models by ICA. UAI 2008.
  • Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). Max-min Hill-climbing BN Structure Learning. Machine Learning, 65(1), 31-78.
  • Cressie, N., & Read, T. R. (1984). Multinomial Goodness-of-Fit Tests. JRSS B, 46(3), 440-464.
  • Colombo, D., & Maathuis, M. H. (2014). Order-Independent Constraint-Based Causal Structure Learning. JMLR, 15, 3741-3782.
  • Le, T. D., et al. (2019). A Fast PC Algorithm for High Dimensional Causal Discovery. IEEE/ACM TCBB, 16, 1483-1495.
  • Peters, J., & Bühlmann, P. (2015). Structural Intervention Distance. Neural Computation, 27(3), 771-799.
  • Uhler, C., Raskutti, G., Bühlmann, P., & Yu, B. (2013). Geometry of the Faithfulness Assumption. Annals of Statistics, 436-463.
  • Erdős, P., & Rényi, A. (1959). On Random Graphs I. Publicationes Mathematicae Debrecen, 6, 290-297.
  • Barabási, A. L. (2009). Scale-Free Networks. Science, 325(5939), 412-413.
  • Barabási, A. L., & Albert, R. (1999). Emergence of Scaling in Random Networks. Science, 286(5439), 509-512.
  • McLachlan, G. J., Lee, S. X., & Rathnayake, S. I. (2019). Finite Mixture Models. Annual Review of Statistics, 6(1), 355-378.
  • Kingma, D. P., et al. (2016). Improved Variational Inference with Inverse Autoregressive Flow. NeurIPS 29.
  • Goodfellow, I., et al. (2020). Generative Adversarial Networks. CACM, 63(11), 139-144.
  • Kalainathan, D., et al. (2022). Structural Agnostic Modeling. arXiv.
  • Goudet, O., et al. (2018). Causal Generative Neural Networks. arXiv.
  • Khemakhem, I., et al. (2021). Causal Autoregressive Flows. AISTATS 2021.
  • Zhang, K., et al. (2012). Kernel-based Conditional Independence Test. arXiv.
  • Blake, W. (1794/2009). The Tyger. Songs of Experience.
  • Grammer, K., & Thornhill, R. (1994). Human Facial Attractiveness and Sexual Selection. JCP, 108(3), 233-242.
  • Johnston, I. G., et al. (2022). Symmetry and Simplicity Spontaneously Emerge. PNAS, 119(11), e2113883119.
  • Enquist, M., & Arak, A. (1994). Symmetry, Beauty and Evolution. Nature, 372, 169-172.
  • Martín, F. M. (2009). The Thermodynamics of Human Reaction Times. arXiv:0908.3170.
  • Muenssinger, J., et al. (2013). Auditory Habituation in the Fetus and Neonate. Developmental Science, 16(2), 287-295.
  • Rosenberg, A., & McIntyre, L. (2020). Philosophy of Science: A Contemporary Introduction (4th ed.). Routledge.
  • Wikipedia, S. (2023). The Book of Why (work-in-progress reference).
  • Kim, J., et al. (2023). Causal Inference for Survival Analysis.
  • 24 authors (PCA citation, not listed).
  • Nie, X., & Wager, S. (2021). Quasi-Oracle Estimation of Heterogeneous Treatment Effects. Biometrika, 108(2), 299-319.
  • Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian Additive Regression Trees. Annals of Applied Statistics, 4(1), 266-298.
  • Dorie, V., et al. (2019). Automated versus Do-It-Yourself Methods for Causal Inference. Statistical Science, 34(1), 97-118.
  • Baham, A., et al. (2023). Causal Inference with ML.
  • Fergusson, C. (2017). Causal Inference in Observational Studies.
  • O'Neill, B. (2021). Follow-up to The Book of Why.
  • Wittgenstein, L. (1953). Philosophical Investigations (philosophical allusion in Ch 12).
  • Tufte, E. (2006). The Cognitive Style of PowerPoint (stylistic echo in Ch 11).
  • Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124-1131.
  • Bach, F. (2020). Learning Theory from First Principles (implicit reference in Ch 4).
  • Boyd, S., & Vandenberghe, L. (2004). Convex Optimization (cited for NOTEARS in Ch 13).

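Personal reflections and critical analysis

This book is among the most engineering-friendly hands-on Python guides to causal inference and causal discovery in recent years. It makes distinctive contributions at three levels:

  • Methodological unification: Molak brings Pearl's graphical framework, the Neyman-Rubin potential outcomes framework, and the causal discovery framework of Peters et al. together into an executable engineering workflow through the Python toolchain. This complements theoretical texts such as Pearl (2009) and Peters et al. (2017): the book focuses less on the "why" and more on the "how".
  • Systematic integration of the library ecosystem: DoWhy (four-step process), EconML (DML/GRF), gCastle (discovery), Causica (DECI), causal-learn (FCI), CausalBert (NLP). The book shows how these libraries are combined rather than picked one at a time. In production: a team's causal toolbox should include all six libraries, chosen per scenario.
  • Real cases of business adoption: Geminos × Company M, causaLens, Playtika, Spotify. The four cases span manufacturing, a general-purpose platform, digital entertainment, and streaming, showing that causal methods are engineering practice and not only academic work. Production takeaway: causal methods deliver real value in industrial optimization, as a substitute for A/B tests, and in CATE-based personalization.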

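Points worth discussing in more depth:

  • Molak's dual identity as engineer and philosopher: he is a practicing ML engineer on projects such as TensorCell, and he also thinks deeply about causal methods at a philosophical level (citing Popper, Kahneman, Tetlock). This differs markedly from purely academic authors such as Pearl or Peters: he cares more about building things than about proving them. In production: as an engineering text, this book is more practical than a purely academic one.
  • The five-step project method as an engineering gold standard: in Ch 15 Molak stresses problem definition, experts, graph, identifiability, and falsification. The framework is consistent with Pearl's do-calculus-based approach but more engineering-friendly. Production advice: turn the five steps into a project onboarding SOP; this matters more than any single algorithm.
  • Making the three skills operational: Molak emphasizes awareness of the Ladder of Causation, structured thinking, and CATE modeling. My practice: (a) at project kickoff, state which rung the question sits on (L1/L2/L3); (b) reason about d-separation with DAGs; (c) add CATE analysis on top of A/B tests.
  • Hybrid methods as the engineering breakthrough: experts, algorithms, and experiments, which Molak stresses repeatedly in Ch 12, 13, 14, and 15. In production: (a) gCastle PrioriKnowledge and Causica ExpertGraphContainer are the strongest tools for this; (b) every causal discovery project should take an expert graph as input; (c) multi-algorithm consensus is the gold standard.
  • The gap between real and synthetic data: in Ch 13, 14, and 15 Molak repeatedly highlights "anti-benchmark" work such as Reisach 2021. In production: (a) always validate on a real-data holdout; (b) do not trust reports of synthetic F1 > 0.9; (c) be prepared to accept a realistic result of F1 = 0.3.
  • LLMs plus causality as a core direction for the next five years: in Ch 15 Molak predicts that agents plus causality plus experiments lead toward an automated scientist. In production: (a) watch the integration of agent frameworks such as LangChain / LlamaIndex with DoWhy / EconML; (b) use LLMs for hypothesis generation and DoWhy / EconML for hypothesis validation; (c) never take an LLM's "causal reasoning" output as the final answer; it must be grounded in real data.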
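  • Library selection strategy:
  • DoWhy + EconML: the default stack for causal inference.
  • gCastle: entry point for causal discovery (PC / GES / LiNGAM / NOTEARS).
  • Causica: DECI (end-to-end, deep learning).
  • causal-learn: FCI (hidden confounding).
  • ENCO (self-implemented): interventional data.
  • ABCI (self-implemented): active experimentation.
  • CausalPy / CausalImpact: Bayesian Synthetic Control.
  • CATENets: deep learning for causal inference (SNet / FlexTENet).
  • transformers + CausalBert: causal inference with NLP.
  • Takeaways for my own research:
  • The five-step method plus Six Sigma is the best language for explaining causality to industrial and clinical teams.
  • Causal data fusion suits the fusion of animal experiments with clinical data, a core direction in vascular biomechanics.
  • Bayesian Synthetic Control suits single-case and small-cohort medical studies.
  • TARNet / SNet in medicine: individualized treatment effect estimation.
  • DECI end-to-end combined with medical interventional data is a direction for clinical decision support systems.
  • The lesson of Willig 2023 ("Causal Parrots"): LLMs cannot replace experts, and clinical decisions must be made through human-machine collaboration.
  • Overall assessment of the book: (a) one of the most practical engineering-oriented causal texts; (b) a modern choice of libraries (DoWhy / EconML / gCastle / Causica / causal-learn); (c) rich in business cases (four real ones); (d) forward-looking directions (causal data fusion / agents / structure learning / imitation learning). The main weakness: implementation detail on deep learning for causal inference (TARNet / SNet / CausalBert) is thin, and Ch 11 stays fairly conceptual. As a text that combines introduction, practice, and direction, however, it is a first choice.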

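Key references

This section lists the core references for the full-book summary, grouped by topic; detailed citations are in the per-chapter files:

Theory textbooks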

  • [X1] Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). Cambridge University Press.
  • [X2] Pearl, J., & Mackenzie, D. (2019). The Book of Why. Penguin Books.
  • [X3] Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of Causal Inference. MIT Press.
  • [X4] Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.
  • [X5] Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search. MIT Press.
  • [X6] Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.

Classic papers

  • [X7] Neyman, J. (1923). On the Application of Probability Theory to Agricultural Experiments.
  • [X8] Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.
  • [X9] Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.
  • [X10] Rebane, G., & Pearl, J. (1987). The Recovery of Causal Poly-trees from Statistical Data.
  • [X11] Shimizu, S., et al. (2006). A Linear Non-Gaussian Acyclic Model for Causal Discovery.
  • [X12] Hoyer, P. O., et al. (2008). Nonlinear Causal Discovery with Additive Noise Models.
  • [X13] van der Laan, M. J., & Rubin, D. (2006). Targeted Maximum Likelihood Learning.
  • [X14] Chernozhukov, V., et al. (2016). Double/Debiased Machine Learning for Treatment and Causal Parameters.
  • [X15] Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests.
  • [X16] Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests.
  • [X17] Zheng, X., et al. (2018). DAGs with NO TEARS.
  • [X18] Künzel, S. R., et al. (2019). Meta-learners for Estimating Heterogeneous Treatment Effects.
  • [X19] Shalit, U., Johansson, F. D., & Sontag, D. (2017). Estimating Individual Treatment Effect: Generalization Bounds and Algorithms.
  • [X20] Geffner, T., et al. (2022). Deep End-to-end Causal Inference.

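Industry cases

  • [X21] Geminos × Company M industrial manufacturing case (Ch 15).
  • [X22] causaLens general-purpose platform (Ch 15).
  • [X23] Playtika uplift optimization (Ch 15).
  • [X24] Spotify synthetic control and hidden confounding (Jeunen et al. 2022).
  • [X25] Six Sigma / DMAIC process improvement framework.

Libraries and tools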

  • [X26] Sharma, A., & Kiciman, E. (2020). DoWhy.
  • [X27] Battocchi, K., et al. (2019). EconML.
  • [X28] Zhang, K., et al. (2021). gCastle.
  • [X29] Geffner, T., et al. (2022). Causica / DECI.
  • [X30] CLeaR Group. causal-learn.
  • [X31] Veitch, V., Sridhar, D., & Blei, D. M. (2020). CausalBert.
  • [X32] PyMC Labs. CausalPy.
  • [X33] Curth, A., & van der Schaar, M. (2021). CATENets.

Future directions

  • [X34] Bareinboim, E., & Pearl, J. (2016). Causal Data Fusion.
  • [X35] Schölkopf, B., et al. (2021). Toward Causal Representation Learning.
  • [X36] Lippe, P., et al. (2022). CITRIS.
  • [X37] Toth, C., et al. (2022). ABCI.
  • [X38] Kıcıman, E., et al. (2023). LLMs + Causality.
  • [X39] Willig, M., et al. (2023). Causal Parrots.

Five-step project method

  • [X40] Molak, A. (2023). Ch 15 of Causal Inference and Discovery in Python. Packt.