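Chapter 10: Causal Inference and Machine Learning – Advanced Estimators, Experiments, Evaluations, and More

Author

This chapter was written by Aleksander Molak. Its core methods draw on the original DML paper (Chernozhukov et al. 2016), the Long Story Short bias bounds (Chernozhukov et al. 2022), the original Causal Forests paper (Wager & Athey 2018), the original TMLE paper (van der Laan & Rubin 2006), and the origins of the doubly robust idea (Robins et al. 1994). The chapter is the "estimator toolbox" of Part 2: it upgrades the S/T/X meta-learners of Ch 9 to industrial-grade methods of the kind that can be published in medical journals.

Overview

This chapter brings the Part 2 toolbox together: the theory and practice of four advanced estimators. It has six parts: (1) the Doubly Robust (DR) estimator, which combines a treatment model and an outcome model and is consistent as long as either one is correctly specified (bias ∝ product of the two model errors); (2) TMLE (Targeted Maximum Likelihood Estimation): semi-parametric, with a targeting step and valid confidence intervals; (3) DML (Double Machine Learning, Chernozhukov et al. 2016): the Frisch-Waugh-Lovell theorem plus cross-fitting and orthogonalization, giving $\sqrt{N}$-consistency, and widely treated as the current "gold standard" of causal inference; (4) Causal Forests (Wager & Athey 2018): honest splitting, causal tree splits, and good behavior in high dimensions; (5) multi-level treatments and uplift modeling on experimental data, together with a MAPE comparison of 7 estimators on the machine learning earnings interaction dataset; (6) counterfactual explanations, which apply the counterfactual capabilities of causal models to ML explanation. The author's best result: Linear DML with tuning reaches MAPE = 0.17% on this benchmark, roughly 30 times lower than the S-Learner's 5.02%.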

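Core Equations and Concepts

Graphical motivation for the DR estimator (Figure 10.1): the back-door criterion can be satisfied by controlling for $X$, for $e(X)$, or for both. The three options are equivalent in the graph but differ in estimation error. DR uses both models at once, so its error stays small whenever either of the two models is accurate.
DR estimator formula (Robins et al. 1994; earlier version in Cassel et al. 1976): $$\hat{\tau}_{\text{ATE}}^{\text{DR}} = \frac{1}{N} \sum_i \left[ \frac{T_i(Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} + \hat{\mu}_1(X_i) \right] - \frac{1}{N} \sum_i \left[ \frac{(1 - T_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)} + \hat{\mu}_0(X_i) \right]$$ where $\hat{\mu}_t(X)$ estimates the outcome model $\mu_t(X) = \mathbb{E}[Y | X, T=t]$ and $\hat{e}(X)$ is the estimated propensity model. Key property: the bias is proportional to (treatment model error) × (outcome model error), so the estimator is consistent if either model is correct.
Failure modes of DR (Kang & Schafer 2007; Li 2021):
In finite samples, a severely misspecified outcome model can still leave DR with noticeable bias even when the treatment model is correct.
When both models are moderately misspecified, DR can show both high bias and high variance (the Kang-Schafer simulations).
DR is not bulletproof, but it is more robust than relying on a single model.
The "small variance" advantage of DR (Li 2021): when both models are correct, DR has lower variance than IPW and the S-Learner. Theory (Courthoud 2022): DR = S-Learner + an additional correction term.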
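The double robustness property can be checked numerically. The following is a minimal NumPy sketch, not the author's code: the data-generating process, the function name `aipw_ate`, and the deliberately wrong nuisance inputs are all assumptions made for illustration. Nuisance predictions are passed in directly (oracle or deliberately misspecified) rather than fitted, to isolate the behavior of the formula itself.

```python
import numpy as np

def aipw_ate(y, t, mu0, mu1, e):
    """Doubly robust (AIPW) ATE computed from nuisance predictions."""
    treated_term = t * (y - mu1) / e + mu1
    control_term = (1 - t) * (y - mu0) / (1 - e) + mu0
    return float(np.mean(treated_term - control_term))

# Hypothetical simulation: x confounds treatment and outcome, true ATE = 2
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-x))            # true propensity score
t = rng.binomial(1, e_true)
y = 2.0 * t + x + rng.normal(size=n)

# Case 1: correct outcome model, deliberately wrong (constant) propensity
ate_wrong_propensity = aipw_ate(y, t, mu0=x, mu1=x + 2.0, e=np.full(n, 0.5))
# Case 2: deliberately wrong (all-zero) outcome model, correct propensity
ate_wrong_outcome = aipw_ate(y, t, mu0=np.zeros(n), mu1=np.zeros(n), e=e_true)
# Naive difference in means, confounded by x
naive = float(y[t == 1].mean() - y[t == 0].mean())

print(ate_wrong_propensity, ate_wrong_outcome, naive)
```

In this setup both DR variants should land near the true value of 2, while the naive contrast is biased upward. Case 2 reduces to plain IPW, which is why it is the noisier of the two.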
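LinearDRLearner vs DRLearner (EconML):
LinearDRLearner: the final model defaults to linear regression, so it can be paired with nonlinear nuisance models such as LightGBM while keeping the causal parameter linear. MAPE = 0.62% on the author's data.
DRLearner: the final model can be any regressor (the author also uses LGBM here). MAPE = 7.6% on the author's data; the extra flexibility hurts.
SparseLinearDRLearner: L1-penalized debiased lasso (Bühlmann & van de Geer 2011). Suited to high-dimensional settings, with automatic feature selection.
ForestDRLearner: uses a Causal Forest as the final stage. High-dimensional and non-parametric, with bootstrap-based confidence intervals (computationally expensive).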
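TMLE (van der Laan & Rubin 2006): a semi-parametric method that lets ML algorithms estimate the nuisance functions while keeping valid confidence intervals. Implementation in 8 steps:
1. Fit the outcome model $\mu$: $Y \sim X + T$.
2. Predict $\hat{y}_t = \mu(X, T=t)$ (at the observed treatment), $\hat{y}_0 = \mu(X, T=0)$, and $\hat{y}_1 = \mu(X, T=1)$.
3. Estimate $\hat{e}(X) = P(T=1|X)$.
4. Compute the clever covariate $H(T, X) = \frac{\mathbf{1}\{T=1\}}{\hat{e}(X)} - \frac{\mathbf{1}\{T=0\}}{1 - \hat{e}(X)}$.
5. Fit the fluctuation logistic regression $Y \sim [-1 + \text{logit}(\hat{y}_t)] + H(T, X)$, where $-1$ means "no intercept" and $\text{logit}(\hat{y}_t)$ enters as a fixed offset, so only the coefficient on $H$ is estimated.
6. Update the predictions using the fluctuation parameter and the clever covariate.
7. Compute the ATE / ATT / CATE.
8. Construct valid CIs.
This version applies to binary outcomes only (it can be extended to continuous outcomes, Gruber & van der Laan 2010).
EconML does not support TMLE directly; use IBM causallib or zEpid.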
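The eight steps above can be sketched end to end in NumPy for a binary outcome. This is an illustrative sketch under stated assumptions, not the implementation found in causallib or zEpid: the simulated data, the helper `fit_logistic` (a small Newton-Raphson solver), and the use of plain logistic regressions for both nuisance models are choices made here for brevity. The confidence interval uses the standard influence-curve variance estimate.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=25):
    """Newton-Raphson logistic regression (no intercept is added automatically)."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical data: binary treatment and binary outcome, confounded by x
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
t = rng.binomial(1, expit(0.5 * x))
y = rng.binomial(1, expit(-0.5 + t + x))
true_ate = float(np.mean(expit(0.5 + x) - expit(-0.5 + x)))

ones, zeros = np.ones(n), np.zeros(n)
# Steps 1-2: outcome model Y ~ X + T and predictions at observed T, T=0, T=1
beta_q = fit_logistic(np.column_stack([ones, x, t]), y)
q_obs = expit(np.column_stack([ones, x, t]) @ beta_q)
q0 = expit(np.column_stack([ones, x, zeros]) @ beta_q)
q1 = expit(np.column_stack([ones, x, ones]) @ beta_q)
# Step 3: propensity model
design_e = np.column_stack([ones, x])
e_hat = expit(design_e @ fit_logistic(design_e, t))
# Step 4: clever covariate at observed T and at T=1 / T=0
h_obs = t / e_hat - (1 - t) / (1 - e_hat)
h1, h0 = 1 / e_hat, -1 / (1 - e_hat)
# Step 5: fluctuation fit with logit(q_obs) as fixed offset, no intercept
eps = float(fit_logistic(h_obs[:, None], y, offset=logit(q_obs))[0])
# Step 6: targeted (updated) predictions
q_obs_star = expit(logit(q_obs) + eps * h_obs)
q1_star = expit(logit(q1) + eps * h1)
q0_star = expit(logit(q0) + eps * h0)
# Step 7: ATE
ate_tmle = float(np.mean(q1_star - q0_star))
# Step 8: influence-curve based standard error and 95% CI
ic = h_obs * (y - q_obs_star) + (q1_star - q0_star) - ate_tmle
se = float(np.std(ic, ddof=1) / np.sqrt(n))
ci_low, ci_high = ate_tmle - 1.96 * se, ate_tmle + 1.96 * se
```

Because both nuisance models are correctly specified in this toy setup, the fluctuation parameter `eps` should come out close to zero. With misspecified or ML-based nuisance models, the targeting step is where the correction happens.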
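DML (Double Machine Learning, Chernozhukov et al. 2016):
Core idea: combine flexible ML estimators of the nuisance functions with classical semi-parametric inference, retaining $\sqrt{N}$-consistency and asymptotic normality, so that valid confidence intervals can be constructed.
Two key mechanisms:
1. Cross-fitting: split the data into $K$ folds ($K=4$ or 5); train the nuisance models ($\mu$, $e$) on $K-1$ folds and compute residuals from predictions on the held-out fold. This avoids overfitting bias: fitting and predicting on the same full data leaks information.
2. Orthogonalization: inspired by the Frisch-Waugh-Lovell theorem. Regress $Y$ on $X$ to get the residual $\tilde{Y}$, regress $T$ on $X$ to get the residual $\tilde{T}$, then regress $\tilde{Y}$ on $\tilde{T}$. Residualizing separates the causal signal from the nuisance components and makes the final estimate first-order insensitive to nuisance model errors: they enter only through second-order (product) terms.
Mathematical skeleton (here $\hat{\mu}(X)$ estimates $\mathbb{E}[Y|X]$, without conditioning on $T$): $$\hat{\mu}(X), \hat{e}(X) \leftarrow \text{cross-fitted nuisance models}$$ $$\tilde{Y}_i = Y_i - \hat{\mu}(X_i), \quad \tilde{T}_i = T_i - \hat{e}(X_i)$$ $$\hat{\theta} = \frac{\sum_i \tilde{T}_i \tilde{Y}_i}{\sum_i \tilde{T}_i^2}$$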
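The skeleton above can be written out in a few lines. The sketch below is a simplified illustration, not EconML's implementation: the nuisance learner is a degree-5 polynomial least-squares fit (a stand-in for an ML model), the treatment is continuous, and the simulated data and the helper names `poly_features` and `cross_fit_residuals` are assumptions made here.

```python
import numpy as np

def poly_features(x, degree=5):
    # Columns 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def cross_fit_residuals(x, target, n_folds=4, seed=0):
    """Out-of-fold residuals: fit on K-1 folds, predict on the held-out fold."""
    fold_of = np.random.default_rng(seed).permutation(len(x)) % n_folds
    residuals = np.empty(len(x))
    for k in range(n_folds):
        train, test = fold_of != k, fold_of == k
        coef, *_ = np.linalg.lstsq(poly_features(x[train]), target[train], rcond=None)
        residuals[test] = target[test] - poly_features(x[test]) @ coef
    return residuals

# Hypothetical partially linear model with true effect theta = 1.5
rng = np.random.default_rng(1)
n = 5_000
x = rng.uniform(-2, 2, size=n)
t = np.sin(x) + rng.normal(size=n)
y = 1.5 * t + 2.0 * np.sin(x) + x ** 2 + rng.normal(size=n)

y_res = cross_fit_residuals(x, y)        # Y minus cross-fitted E[Y|X]
t_res = cross_fit_residuals(x, t)        # T minus cross-fitted E[T|X]
theta_hat = float(np.sum(t_res * y_res) / np.sum(t_res ** 2))

# Standard error from the orthogonal score
psi = t_res * (y_res - theta_hat * t_res)
se = float(np.sqrt(np.mean(psi ** 2)) / np.mean(t_res ** 2) / np.sqrt(n))

# Regressing y on t alone ignores x and is biased
naive = float(np.cov(t, y)[0, 1] / np.var(t, ddof=1))
print(theta_hat, se, naive)
```

The residual-on-residual estimate should sit close to 1.5, while the unadjusted slope is pulled upward by the shared dependence on `x`.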
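DML in EconML (the strings below are DoWhy method_name values; the classes themselves live in econml.dml):
- backdoor.econml.dml.LinearDML: final model = linear regression; can be paired with nonlinear $\mu$ and $e$ models.
- backdoor.econml.dml.CausalForestDML: Causal Forest as the final stage.
- backdoor.econml.dml.SparseLinearDML: L1 penalty, high-dimension friendly.
Hyperparameter tuning for DML (on the author's benchmark data):
Wrap model_y and model_t in sklearn.model_selection.GridSearchCV.
Note that GridSearchCV's cv (number of folds for hyperparameter tuning) and DML's cv (number of folds for cross-fitting) are two different things and are easy to confuse.
Author's configuration: 10-fold GridSearchCV, DML cv=4, LGBMRegressor/LGBMClassifier, discrete_treatment=True.
Result: MAPE = 0.17%, about 20 times lower than the X-Learner's 3.6% and about 3.5 times lower than LinearDRLearner's 0.62%.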
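To make the difference between the two cv settings concrete, the sketch below nests a tuning loop inside a cross-fitting loop using scikit-learn only. It is an illustration under assumptions, not the author's configuration: a small RandomForestRegressor grid replaces LightGBM, the treatment is continuous, the data are simulated, and `TUNING_CV`, `CROSSFIT_CV`, and `tuned_model` are names invented here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_predict

TUNING_CV = 3      # folds used inside GridSearchCV to choose hyperparameters
CROSSFIT_CV = 4    # folds used to produce out-of-fold nuisance predictions

# Hypothetical data with true effect 2.0
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
t = X[:, 0] + rng.normal(size=n)
y = 2.0 * t + np.sin(X[:, 0]) + X[:, 1] + rng.normal(size=n)

def tuned_model():
    # Inner loop: hyperparameter search with its own cross-validation
    return GridSearchCV(
        RandomForestRegressor(n_estimators=25, random_state=0),
        param_grid={"max_depth": [3, 6], "min_samples_leaf": [5, 20]},
        cv=TUNING_CV,
    )

# Outer loop: cross-fitting; the search is re-run within each training split
y_hat = cross_val_predict(tuned_model(), X, y, cv=CROSSFIT_CV)
t_hat = cross_val_predict(tuned_model(), X, t, cv=CROSSFIT_CV)

y_res, t_res = y - y_hat, t - t_hat
theta_hat = float(np.sum(t_res * y_res) / np.sum(t_res ** 2))
print(theta_hat)
```

Keeping the two constants separately named makes it harder to mix up the tuning folds with the cross-fitting folds.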
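Twyman's law (Kohavi et al. 2020): "any figure that looks interesting or different is usually wrong" (figure in the sense of a number or statistic). The author uses it as a reminder not to be overconfident in a single benchmark.
DML is not a silver bullet:
Hidden confounding still causes severe bias (Hünermund et al. 2022), with a bias magnitude comparable to that of simpler methods such as LASSO.
Gordon et al. (2022), a large-scale Facebook Ads benchmark: DML did not significantly outperform traditional methods on high-dimensional non-experimental data.
DML is still affected by bad control schemes (Cinelli et al. 2022): adding arbitrary features does not necessarily help and can open non-causal paths.
The ML intuition of "add more samples and more features" fails in causal settings: causal bias comes from structural misspecification, not from a shortage of data.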
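Causal Forest (Wager & Athey 2018; Athey et al. 2019):
Core: similar to Random Forest (resampling, predictor subsetting, averaging), but the split criterion is based on the estimated treatment effect.
Honest splitting: one part of the data is used to choose the splits and another, held-out part to estimate the leaf values. The logic resembles cross-fitting in DML.
Non-parametric, high-dimension friendly, with valid confidence intervals.
EconML classes: CausalForestDML (DML orthogonalization combined with a Causal Forest) is the recommended starting point; CausalForest (the raw version in the grf module) does not estimate nuisance parameters, so its performance may be suboptimal.
Author's data: Causal Forest MAPE = 4.6% (comparable to the S- and X-Learners, but untuned).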
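Honest splitting can be illustrated with a single-split "causal stump". This is a toy sketch, not the algorithm used by grf or EconML: the split criterion (squared gap between child effects, weighted by child sizes), the randomized treatment, and the function names are assumptions chosen to keep the example short.

```python
import numpy as np

def diff_in_means(y, t):
    return y[t == 1].mean() - y[t == 0].mean()

def honest_causal_stump(x, t, y, candidates, seed=0):
    """Choose the split on one half of the data, estimate leaf effects on the other."""
    rng = np.random.default_rng(seed)
    in_split_half = rng.permutation(len(x)) < len(x) // 2
    xs, ts, ys = x[in_split_half], t[in_split_half], y[in_split_half]
    xe, te, ye = x[~in_split_half], t[~in_split_half], y[~in_split_half]

    def heterogeneity(c):
        left, right = xs <= c, xs > c
        gap = diff_in_means(ys[left], ts[left]) - diff_in_means(ys[right], ts[right])
        # Reward large effect differences between reasonably sized children
        return gap ** 2 * left.mean() * right.mean()

    best = float(max(candidates, key=heterogeneity))
    left, right = xe <= best, xe > best
    # Leaf effects come only from the held-out estimation half
    return best, diff_in_means(ye[left], te[left]), diff_in_means(ye[right], te[right])

# Hypothetical randomized data: effect is 1 for x <= 0.6 and 3 for x > 0.6
rng = np.random.default_rng(0)
n = 4_000
x = rng.uniform(0, 1, size=n)
t = rng.binomial(1, 0.5, size=n)
tau = np.where(x > 0.6, 3.0, 1.0)
y = tau * t + x + rng.normal(size=n)

split, effect_left, effect_right = honest_causal_stump(x, t, y, np.linspace(0.1, 0.9, 17))
print(split, effect_left, effect_right)
```

Because the leaf estimates never see the data that selected the split, they are not inflated by the search over candidate thresholds.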
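Multi-level treatment: the author uses EconML's CausalForestDML on multi-arm data with 3 treatment levels (a control plus 2 active treatments). With multiple treatments, the CATE output contains one effect per treatment, each measured against the control.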
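The "one effect per treatment versus control" output can be illustrated with a plain dummy-coded regression. This sketch is not the CausalForestDML workflow: it assumes a randomized three-arm experiment with homogeneous effects, and the data and variable names are invented for illustration.

```python
import numpy as np

# Hypothetical randomized experiment: arm 0 = control, arms 1 and 2 = active treatments
rng = np.random.default_rng(0)
n = 30_000
arm = rng.integers(0, 3, size=n)
x = rng.normal(size=n)
true_effects = np.array([0.0, 0.5, 1.5])
y = true_effects[arm] + x + rng.normal(size=n)

# Dummy-code K = 3 arms into K - 1 = 2 indicator columns (control is the reference)
dummies = np.column_stack([(arm == k).astype(float) for k in (1, 2)])
design = np.column_stack([np.ones(n), dummies, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

effect_arm1, effect_arm2 = float(coef[1]), float(coef[2])
print(effect_arm1, effect_arm2)
```

Each dummy coefficient is the contrast of that arm against control, which mirrors the shape of the multi-treatment effect output described above.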
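Evaluation metrics for uplift models on experimental data:
Qini coefficient (Radcliffe 2007): summarizes the cumulative incremental gain obtained by targeting units in the order of the model's uplift ranking.
AUUC (Area Under the Uplift Curve): analogous to AUROC.
DR evaluation: checks whether, in the "high-uplift" subgroup formed by the top 30% of the uplift ranking, the realized treatment uplift is significantly higher than under random selection.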
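A Qini curve can be computed directly from randomized data. Several normalizations of the Qini coefficient are in use, so the version below is one common convention rather than a canonical definition: the curve is treated outcomes minus control outcomes rescaled to the treated count, and the coefficient is the mean gap to the straight "random targeting" line. The function names and the simulated data are assumptions.

```python
import numpy as np

def qini_curve(uplift_score, t, y):
    """Cumulative incremental outcomes when targeting by descending predicted uplift."""
    order = np.argsort(-uplift_score)
    t_sorted, y_sorted = t[order], y[order]
    n_t = np.cumsum(t_sorted)
    n_c = np.cumsum(1 - t_sorted)
    y_t = np.cumsum(y_sorted * t_sorted)
    y_c = np.cumsum(y_sorted * (1 - t_sorted))
    ratio = np.divide(n_t, n_c, out=np.zeros(len(t), dtype=float), where=n_c > 0)
    return y_t - y_c * ratio

def qini_coefficient(uplift_score, t, y):
    curve = qini_curve(uplift_score, t, y)
    random_line = np.linspace(curve[-1] / len(curve), curve[-1], len(curve))
    return float(np.mean(curve - random_line))

# Hypothetical experiment: the true uplift grows with x
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0, 1, size=n)
t = rng.binomial(1, 0.5, size=n)
y = rng.binomial(1, 0.2 + 0.5 * x * t).astype(float)

qini_good = qini_coefficient(x, t, y)                      # ranks by the true driver
qini_bad = qini_coefficient(-x, t, y)                      # reversed ranking
qini_random = qini_coefficient(rng.uniform(size=n), t, y)  # uninformative ranking
print(qini_good, qini_bad, qini_random)
```

A ranking aligned with the true uplift yields a clearly positive value, the reversed ranking a clearly negative one, and a random ranking a value near zero relative to those two.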
Summary of the CATE estimator benchmark (author's data, Table 10.1):

Estimator	MAPE
S-Learner	5.02%
T-Learner	8.13%
X-Learner	3.63%
Linear DR-Learner	0.62%
Linear DML	1.25%
Linear DML (tuning)	0.17%
Causal Forest	4.60%

Note: the untuned DR-Learner is slightly better than untuned DML (0.62% vs 1.25%), but DML with tuning beats everything (0.17%). The T-Learner is worst (a data-efficiency problem).
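For reference, MAPE on effect estimates can be computed as below. This is a generic sketch of the metric, assuming the true effects are known (as in a simulation) and nonzero; the numbers in `effect_true` and `effect_pred` are made up, and only the two percentages in the final ratio come from the table.

```python
import numpy as np

def mape(effect_true, effect_pred):
    """Mean absolute percentage error; undefined if any true effect is zero."""
    effect_true = np.asarray(effect_true, dtype=float)
    effect_pred = np.asarray(effect_pred, dtype=float)
    return float(np.mean(np.abs((effect_true - effect_pred) / effect_true)))

# Made-up example: every prediction is off by exactly 1%
effect_true = np.array([100.0, 200.0, 400.0])
effect_pred = np.array([101.0, 198.0, 404.0])
score = mape(effect_true, effect_pred)

# Ratio behind the "roughly 30 times" comparison in the table
s_learner_vs_tuned_dml = 5.02 / 0.17
print(f"{score:.2%}", round(s_learner_vs_tuned_dml, 1))
```

Since MAPE divides by the true effect, it becomes unstable when true effects are close to zero, which is worth keeping in mind when reading such tables.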

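Engineering choice between DR and DML (Huang et al. 2020, EconML team):
DML advantages: handles continuous treatments well, works well in high-dimensional sparse settings, and has lower variance when positivity is weakly violated.
DR-Learner advantages: more stable when the outcome model is severely misspecified, and well suited to categorical treatments.
TMLE: asymptotic properties similar to the DR-Learner, with valid CIs that do not require the bootstrap.
Author's advice: start with the S-Learner (computationally cheap); use DML or the DR-Learner when resources allow; always keep a simple model as a benchmark.
Counterfactual explanations (Wachter et al. 2017): use the counterfactual capability of causal models to explain ML predictions ("if $X$ were changed to $X'$, the predicted $Y$ would become $Y'$"). They can serve as a tool for algorithmic explainability.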
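For a linear classifier the "nearest counterfactual" has a closed form, which makes the idea easy to see. The sketch below is a special case chosen for illustration, not the optimization procedure of Wachter et al.: the classifier weights, the input point, and the function name are hypothetical, and distance is plain Euclidean.

```python
import numpy as np

def nearest_counterfactual(x, w, b, margin=1e-3):
    """Smallest L2 change to x that moves a linear score just across zero."""
    score = w @ x + b
    target = -np.sign(score) * margin        # a point just past the decision boundary
    return x + (target - score) / (w @ w) * w

# Hypothetical linear classifier: predict class 1 when w @ x + b > 0
w = np.array([1.0, -2.0])
b = -0.5
x = np.array([0.0, 1.0])                     # score = -2.5, predicted class 0

x_cf = nearest_counterfactual(x, w, b)
score_before = float(w @ x + b)
score_after = float(w @ x_cf + b)
print(x_cf, score_before, score_after)
```

The result says how the model's inputs would have to change for the prediction to flip. It is a statement about the model, and it does not by itself say what would happen if those features were changed in the real world.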

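Key Conclusions

The "double robustness" of DR is an asymptotic property: as long as either the outcome model or the treatment model is correct, DR is consistent (bias ∝ product of errors). However, when both models are moderately misspecified, as is common in real data, the bias of DR can be amplified (the Kang-Schafer trap).
TMLE is semi-parametric: it can use ML for the nuisance functions while keeping the tools of asymptotic inference (valid CIs, hypothesis tests). The 8-step procedure with its clever covariate and fluctuation step is its core innovation.
DML is the current "gold standard" of causal ML: cross-fitting and orthogonalization address the overfitting bias of plain ML estimators and the impact of high-dimensional nuisance errors. The theory in Chernozhukov et al. 2016 guarantees $\sqrt{N}$-consistency.
DML is not a silver bullet: Hünermund 2022 and Gordon 2022 find that DML does not significantly outperform simpler methods on large real-world data. The core reason is that causal bias comes from structural misspecification and does not depend on sample size. In production, DML should be paired with sensitivity analysis (Chernozhukov et al. 2022).
Linear DML with tuning is the author's best practice: GridSearchCV around the outcome and treatment models, DML cross-fitting, and LGBM give a MAPE of 0.17%.
Causal Forest is a first choice for high-dimensional data: non-parametric, valid CIs (bootstrap-based), and no reliance on linearity. It is computationally expensive, though, especially with many features and large samples.
Multi-level treatments are the reality of medical settings (multi-line cancer therapy, drug dose response), and EconML's CausalForestDML handles them directly.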
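The conclusion that causal bias does not depend on sample size can be checked with a small simulation. The sketch below uses ordinary least squares rather than DML to keep it short; the same logic applies to any estimator that adjusts only for observed covariates. The data-generating process and the function name are assumptions made for illustration.

```python
import numpy as np

def slope_without_hidden_confounder(n, seed):
    """Estimate the effect of t on y while adjusting for x but not for the hidden u."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                    # hidden confounder, never observed
    x = rng.normal(size=n)                    # observed covariate
    t = u + x + rng.normal(size=n)
    y = 1.0 * t + 2.0 * u + x + rng.normal(size=n)   # true effect of t is 1
    design = np.column_stack([np.ones(n), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])

small = slope_without_hidden_confounder(1_000, seed=0)
large = slope_without_hidden_confounder(200_000, seed=0)
print(small, large)
```

Both estimates settle near 2 instead of the true value 1. Adding data only makes the estimate more precisely wrong, which is why sensitivity analysis is recommended alongside the estimator.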

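Challenges and Open Questions

Stability of DML with high-dimensional confounders: when the number of nuisance-model features $p$ approaches or exceeds $N$, the bias remaining after orthogonalization can still blow up. The sensitivity analysis of Chernozhukov et al. (2022) is a partial remedy.
The "over-tuning" risk of the DR-Learner: the author's DRLearner with LGBMRegressor + LGBMRegressor scores 7.6%, about 12 times worse than LinearDRLearner's 0.62%. Complex models plus small data lead to overfitting. In production, start with LinearDRLearner.
TMLE is missing from EconML: the author states explicitly that TMLE is not in EconML, so IBM causallib or zEpid is needed. This limits "one-stop" use of TMLE within the DoWhy + EconML ecosystem.
Computational cost of bootstrap CIs for Causal Forest: with many features and large samples, a single estimate with hundreds of bootstrap replicates can take hours. In production, limit max_depth and n_estimators and reduce the number of cv folds.
Meta-learners for multiple treatments and CATE: the author notes that the methods extend to multiple treatments, but the multi-treatment variant of the X-Learner (Künzel et al. 2019) needs 6 or more models, so engineering complexity explodes. Lopez & Gutman (2017) give a survey of multiple treatments.
"Empirical selection" between the DR-Learner and DML: the author offers some heuristics, but there is no gold standard on real data. Best practice: run several methods and report uncertainty (confidence intervals plus sensitivity bounds).
"Stability" of counterfactual explanations in production: the "nearest counterfactual" of Wachter et al. 2017 is highly sensitive to data perturbations; the counterfactual explanation of the same prediction can change substantially after small changes in the data.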

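Personal Reflections and Critical Analysis

This chapter is the "toolbox" of Part 2: DR, TMLE, DML, and Causal Forest each come with their own theoretical guarantees and practical constraints. Several aspects are worth discussing:

"DR = S-Learner + correction term" is an important insight from Courthoud 2022: the author cites this result, but it is rarely emphasized in teaching. DR is not a simple weighted average of two models; it is a targeted bias correction. Understanding this helps when choosing between the DR-Learner and the S-Learner: if the outcome model is at high risk of misspecification, the correction term leans on the treatment model; if both models are trustworthy, DR converges to the S-Learner estimate with reduced variance.
The "theoretical guarantees" of DML are often overestimated in practice: the $\sqrt{N}$-consistency of Chernozhukov et al. 2016 requires (i) the true confounder set; (ii) proper cross-fitting; (iii) nuisance-model errors that vanish fast enough to be negligible at the $\sqrt{N}$ rate. When (i) fails, i.e. an unobserved confounder exists, DML still consistently estimates a statistical quantity (the association adjusted for the chosen control set), but that quantity may be meaningless as an estimate of the true causal effect. The large-scale studies of Hünermund et al. 2022 and Gordon et al. 2022 clearly show the gap between the theoretical elegance of DML and engineering reality. The lesson for production teams: do not skip the causal identification steps (drawing the graph, identifying the estimand, sensitivity analysis) just because DML is theoretically complete.
The 0.17% MAPE of LinearDML with tuning is overly optimistic: the author obtained it on synthetic data, and the advantage of DML on real data is usually smaller. The author also invokes Twyman's law as a caution. In production, run multiple simulations and multiple sensitivity analyses rather than letting a single benchmark decide the choice of estimator.
The overfitting vs bias trade-off of the DR-Learner: the author's DRLearner with LGBMRegressor + LGBMRegressor performs far worse than LinearDRLearner, which contradicts the ML intuition that more complex models are better. The reason is that the final model of DRLearner is trained on doubly robust pseudo-outcomes, which are noisy because they contain inverse-propensity-weighted residuals, so LGBM easily memorizes small-sample noise. In production, prefer LinearDRLearner with complex nuisance models (model_y, model_propensity) and keep the final stage simple.
Causal Forest performs acceptably on the author's data without tuning (4.6% MAPE), close to the X-Learner of Künzel et al. 2019 (3.6%). Its strength is automatic discovery of heterogeneity: tree splits find subgroups with differing treatment effects, with no need to specify effect modifiers by hand. In production, Causal Forest works well as an exploratory CATE tool: use it first to see in which subgroups the treatment effect differs, then refine with a meta-learner.
The "double CV" trap in DML hyperparameter tuning: GridSearchCV's cv is the number of tuning folds and DML's cv is the number of cross-fitting folds; the two are independent. In production, name them explicitly to avoid confusion (e.g. tuning_cv=10, dml_cv=4). A possible alternative is to tune the nuisance models separately with Optuna / Hyperopt (Bayesian optimization) and then pass the best models to DML. These tools still need an explicit validation scheme, so the two roles of cross-validation remain distinct.
The multi-treatment extension matters in practice: the author's data has 3 treatment levels, and in medicine (e.g. multi-line cancer therapy) multiple treatments are the norm. EconML's CausalForestDML with discrete_treatment=True supports discrete multi-valued treatments; multiple continuous treatments remain a frontier topic. A practical trick for multiple treatments: use a multinomial propensity model plus dummy-coded treatments, turning a K-treatment problem into K-1 binary treatment problems.
The engineering neglect of TMLE contrasts with its theoretical completeness: EconML does not integrate it, and the TMLE implementations in causallib / zEpid sit outside the DoWhy + EconML workflow. In production, the DR-Learner or DML is usually used instead; their asymptotic properties are similar to those of TMLE, and they are much easier to engineer with.
The engineering weakness of counterfactual explanations: the method of Wachter et al. 2017 is highly sensitive to model instability in production; the counterfactual for the same prediction can differ substantially after small changes in the data. One direction for improvement is DiCE (Mothilal et al. 2020), which returns several counterfactuals using a diversity constraint and a proximity loss and is more stable. These are still not causal statements: a counterfactual explanation is not an intervention (the core distinction of Ch 2).
Implications for my own research: in my work on vascular biomechanics, DML is the most practical causal estimator; it handles high-dimensional radiomics features ($p > 100$) and provides valid CIs. The first step is still to draw the causal graph and validate the causal structure, because DML does not "automatically" correct a misspecified graph. Suggested production workflow: (1) draw the causal graph (expert knowledge plus causal discovery where needed); (2) apply back-door / IV identification to obtain the estimand; (3) run LinearDML with hyperparameter tuning; (4) run the Chernozhukov sensitivity analysis; (5) discuss the clinical meaning of the results with domain experts.
The author's honesty about the optimistic CATE benchmark table: the author states explicitly that "we did not tune hyperparameters for the remaining methods", which is an important disclaimer for Table 10.1. In production, report the results of several estimators with tuning and sensitivity analysis rather than a single "winner".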

Key References

[X1] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2016). Double/Debiased Machine Learning for Treatment and Causal Parameters. arXiv:1608.00060. The founding DML paper.
[X2] Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. JASA, 113(523), 1228–1242. The original Causal Forest paper.
[X3] van der Laan, M. J., & Rubin, D. (2006). Targeted Maximum Likelihood Learning. The International Journal of Biostatistics, 2(1). The original TMLE paper.
[X4] Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors are not Always Observed. JASA, 89(427), 846–866. The modern origin of the DR idea (1994).
[X5] Cassel, C. M., Särndal, C. E., & Wretman, J. H. (1976). Some Results on Generalized Difference Estimation and Generalized Regression Estimation for Finite Populations. Biometrika, 63(3), 615–620. An earlier origin of the DR idea (1976).
[X6] Chernozhukov, V., Cinelli, C., Newey, W., Sharma, A., & Syrgkanis, V. (2022). Long Story Short: Omitted Variable Bias in Causal Machine Learning (NBER WP 30302). Bias bounds for DML under unobserved confounding.
[X7] Hünermund, P., Kaminski, J., & Schmitt, C. (2022). Causal Machine Learning and Its Use Cases. Causal AI Academy working paper. Empirical study of DML's unremarkable performance under hidden confounding.
[X8] Gordon, B. R., Moakler, R., & Zettelmeyer, F. (2022). Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement. arXiv:2201.07055. Large-scale benchmark of DML on Facebook Ads data.
[X9] Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical Science, 22(4), 523–539. Simulation study showing high bias and high variance of DR.
[X10] Li, F. (2021). Statistical Methods for Causal Inference in Observational Studies with Random Forests. PhD thesis, Harvard University. Evidence of noticeable DR bias under outcome model misspecification.
[X11] Huang, J., et al. (2020). EconML User Guide. Microsoft Research. Original source of the DRLearner vs DML selection advice.
[X12] Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests. Annals of Statistics, 47(2), 1148–1178. Generalization of Causal Forests.
[X13] Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. Statistical background for DR and DML.
[X14] Cinelli, C., & Hazlett, C. (2020). Making Sense of Sensitivity: Extending Omitted Variable Bias. JRSS B, 82(1), 39–67. Sensitivity analysis for omitted variable bias.
[X15] Gruber, S., & van der Laan, M. J. (2010). A Targeted Maximum Likelihood Estimator of a Causal Effect on a Bounded Continuous Outcome. Int. J. Biostatistics, 6(1). Extension of TMLE to continuous outcomes.
[X16] Bühlmann, P., & van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer. Theory behind the debiased lasso.
[X17] van de Geer, S., Bühlmann, P., Ritov, Y., & Dezeure, R. (2014). On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models. Annals of Statistics, 42(3). Theory of high-dimensional CIs.
[X18] Wachter, S., Mittelstadt, B., & Russell, C. (2017). Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. Harvard Journal of Law & Technology, 31(2). Introduces counterfactual explanations in the context of the GDPR.
[X19] Porter, K. E., Gruber, S., van der Laan, M. J., & Sekhon, J. S. (2011). The Relative Performance of Targeted Maximum Likelihood Estimators. Int. J. Biostatistics, 7(1). Small-sample comparison of TMLE and IPW.
[X20] Facure, M. (2020). Causal Inference for The Brave and True. https://matheusfacure.github.io/python-causality-handbook/. Detailed derivation of the DR estimator.
[X21] Courthoud, M. (2022). Doubly Robust Estimation. Source of "DR = S-Learner + correction term".
[X22] Kohavi, R., et al. (2020). Twyman's Law and Other Lessons from a Career in Data and Decision Making. KDD 2020 keynote. Naming of Twyman's law.
[X23] Tan, Z., et al. (2022). Distributional Sensitivity Analysis for Causal Inference. JASA. Recent work on DR in sensitivity analysis.
[X24] Balestriero, R., Pesenti, J., & LeCun, Y. (2021). Learning in High Dimension Always Amounts to Extrapolation. arXiv:2110.09485. Extrapolation risk of DML in high dimensions.
[X25] Xu, K., et al. (2021). How Neural Networks Extrapolate. arXiv:2009.11848. Limits of neural network extrapolation.
[X26] White, M., & Green, A. (2023). The Causal Forest Book. A modern textbook on Causal Forests.
[X27] Ding, P., & Li, F. (2018). Causal Inference: A Missing Data Perspective. Statistical Science, 33(2), 214–237. Causal inference viewed as a missing data problem.
[X28] Mohan, K., & Pearl, J. (2021). Graphical Models for Processing Missing Data. JASA, 116(534), 1023–1037. Bridges missing data and causal inference.
[X29] Robins, J. M. (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Comm. Stat. Theory Methods, 23(8), 2379–2412. Early development of the DR idea with instrumental variables.
[X30] van der Laan, M. J., & Hejazi, N. (2019). TMLE for Causal Inference: A Practical Guide. Practical guide to TMLE.
[X31] Díaz, I. (2020). Statistical Methods for Causal Inference in Observational Studies: From Simple to Advanced. Springer. Textbook comparing DML and TMLE.
[X32] Kohavi, R., et al. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press. Twyman's law and the pitfalls of A/B testing.
[X33] Mothilal, R. K., Sharma, A., & Tan, C. (2020). Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations. FAccT 2020. DiCE: diverse counterfactual explanations.
[X34] Facure, M. (2020). Ch 12. Doubly Robust Estimators. https://bit.ly/DoublyRobust. Detailed DR derivation.
[X35] Radcliffe, N. (2007). Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models. Direct Marketing Analytics Journal. Original source of the Qini coefficient.
[X36] Wikipedia. Twyman's law. https://en.wikipedia.org/wiki/Twyman%27s_law. Encyclopedia entry on Twyman's law.