From "Prediction" to "Proof": The Core Role of Causal Attribution in Healthcare AI

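Most healthcare AI only predicts. How does propensity score matching (PSM) help ReHealth AI produce causal evidence that payers accept, so that the results of prevention can be billed for the first time?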

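There's a fundamental flaw in healthcare AI that has been ignored for too long: almost every healthcare AI product predicts, but does not attribute causality.

Risk prediction is useful. Telling a physician that a patient has a 23% probability of a heart attack in the next three years is valuable information. But prediction alone cannot answer a more critical question: "If we intervene, does the risk actually go down?"

The answer to that question determines whether preventive care can be billed.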

PSM: Constructing "Parallel Worlds" Statistically

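PSM (Propensity Score Matching) is our core method for answering this question. The idea: when a randomized controlled trial (RCT) is not feasible, we can use statistical methods on observational data to construct a comparison group that approximates a randomized control group as closely as possible, at least with respect to the characteristics we measure.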

Core Insight

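For each patient who received the intervention, find a patient who did not receive it but closely resembles them on every important characteristic. Then compare the outcomes of these two closely matched patients, one treated and one not. Averaged over all matched pairs, this outcome difference estimates the effect of the intervention itself.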
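To make "the difference is the effect" precise, here is the standard potential-outcomes formulation, added as an illustration (the notation is ours, not the original's): T_i marks whether patient i received the intervention, Y_i(1) and Y_i(0) are that patient's outcomes with and without it, m(i) is the control matched to treated patient i, and N_1 is the number of matched treated patients.

```latex
\mathrm{ATT} = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid T_i = 1 \,\right]
\qquad
\widehat{\mathrm{ATT}} = \frac{1}{N_1} \sum_{i:\, T_i = 1} \left( Y_i - Y_{m(i)} \right)
```

Here Y_{m(i)} stands in for the unobservable Y_i(0); that substitution is only justified if matching leaves no important confounder unbalanced.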

PSM Step by Step

Collect confounding variables

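Identify every variable, measured before the intervention, that may influence both whether a patient receives the intervention and the outcome: age, sex, underlying conditions, medication history, lifestyle, socioeconomic status, and so on.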

Compute propensity scores

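Use logistic regression or another classification model to predict each patient's probability of receiving the intervention; this probability is the propensity score. Among patients with similar scores, the measured confounders tend to be balanced between treated and untreated patients, even though two individuals with the same score can still differ on specific characteristics.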
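A minimal sketch of this step, assuming NumPy only (no scikit-learn): it fits a plain, unregularized logistic regression by Newton's method and returns each patient's propensity score. The function name `fit_propensity` and the simulated covariates are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_propensity(X, t, iters=25):
    """Fit a logistic regression of treatment t on covariates X by Newton's method
    (iteratively reweighted least squares); return (propensity scores, coefficients)."""
    Xd = np.column_stack([np.ones(len(X)), X])        # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (t - p)                          # gradient of the log-likelihood
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])    # Fisher information matrix
        beta = beta + np.linalg.solve(hess, grad)      # Newton step
    return 1.0 / (1.0 + np.exp(-Xd @ beta)), beta

# Hypothetical simulated patients: treatment depends on age and smoking
rng = np.random.default_rng(0)
n = 5000
age_std = rng.normal(0.0, 1.0, n)                      # standardized age
smoker = rng.binomial(1, 0.3, n)
true_logit = -0.5 + 0.6 * age_std + 0.8 * smoker
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

ps, beta = fit_propensity(np.column_stack([age_std, smoker]), t)
print(beta.round(2))  # should land near the simulated [-0.5, 0.6, 0.8]
```

Because the model includes an intercept, the fitted scores average exactly to the observed treatment rate at convergence, which makes a cheap sanity check.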

Match on the score

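For each patient in the intervention group, pair them with the control patient whose propensity score is closest, typically discarding pairs whose scores differ by more than a set tolerance (the caliper). After matching, the two groups should have similar distributions of every measured confounder; this balance should be checked rather than assumed.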
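A sketch of matching plus a balance check, under stated assumptions: one hypothetical confounder `x`, the simulated true propensity score standing in for an estimated one, greedy 1:1 matching without replacement on the logit of the score, and a caliper of 0.2 standard deviations of that logit (a common rule of thumb, not a value from the original). Balance is summarized by the standardized mean difference (SMD); values below about 0.1 are commonly read as adequate.

```python
import numpy as np

def greedy_caliper_match(ps, t, caliper):
    """Greedy 1:1 nearest-neighbour matching on the logit of the propensity score,
    without replacement; treated patients with no control within `caliper` are dropped."""
    logit = np.log(ps / (1 - ps))
    available = set(np.flatnonzero(t == 0).tolist())
    pairs = []
    for i in np.flatnonzero(t == 1):
        if not available:
            break
        cand = np.fromiter(available, dtype=int)
        gaps = np.abs(logit[cand] - logit[i])
        k = int(gaps.argmin())
        if gaps[k] <= caliper:
            pairs.append((int(i), int(cand[k])))
            available.remove(int(cand[k]))     # each control is used at most once
    return pairs

def smd(x_t, x_c):
    """Standardized mean difference: difference in means divided by the pooled SD."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return float((x_t.mean() - x_c.mean()) / pooled_sd)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 2000)                     # one hypothetical confounder
ps = 1.0 / (1.0 + np.exp(-(-1.5 + x)))             # true propensity, standing in for an estimate
t = rng.binomial(1, ps)
caliper = 0.2 * np.std(np.log(ps / (1 - ps)))      # rule of thumb: 0.2 SD of the logit
ti, ci = np.array(greedy_caliper_match(ps, t, caliper)).T
print(f"SMD before {smd(x[t == 1], x[t == 0]):.2f}, after {smd(x[ti], x[ci]):.2f}, pairs {len(ti)}")
```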

Compare outcomes after matching

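In the matched sample, compare outcomes between the intervention and control groups. Because the two groups are closely balanced on measured characteristics, the difference can be attributed to the intervention itself, provided no important confounder was left unmeasured.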
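To see why the matched comparison matters, this sketch simulates data in which the true effect is known, assuming a single hypothetical confounder (health-consciousness) that raises the chance of treatment and lowers risk. It compares the naive treated-vs-untreated difference with 1:1 nearest-neighbour matching (with replacement, no caliper) on the true propensity score. All numbers are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
h = rng.normal(0.0, 1.0, n)                     # hypothetical confounder: health-consciousness
ps = 1.0 / (1.0 + np.exp(-(-0.5 + h)))          # health-conscious patients are treated more often
t = rng.binomial(1, ps)

def risk(h, treated):
    """Simulated 3-year event risk: lower for health-conscious and for treated patients."""
    return 1.0 / (1.0 + np.exp(-(-1.5 - 0.8 * h - 0.5 * treated)))

y = rng.binomial(1, risk(h, t))

# Ground truth, available only because the data are simulated
true_att = np.mean(risk(h[t == 1], 1) - risk(h[t == 1], 0))
naive = y[t == 1].mean() - y[t == 0].mean()

# Nearest control on the propensity score for every treated patient (controls may repeat)
treated, controls = np.flatnonzero(t == 1), np.flatnonzero(t == 0)
order = controls[np.argsort(ps[controls])]
sorted_ps = ps[order]
pos = np.clip(np.searchsorted(sorted_ps, ps[treated]), 1, len(order) - 1)
left_closer = np.abs(sorted_ps[pos - 1] - ps[treated]) <= np.abs(sorted_ps[pos] - ps[treated])
match = order[np.where(left_closer, pos - 1, pos)]
matched_att = np.mean(y[treated] - y[match])

print(f"true ATT {true_att:.3f}  naive {naive:.3f}  matched {matched_att:.3f}")
```

The naive difference overstates the benefit because treated patients were healthier to begin with; the matched estimate lands close to the simulated truth.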

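PSM Matching Process

Treated patient A: 60, male, hypertension, smoker
↔ matched with
Control patient B: 61, male, hypertension, smoker
→ outcome difference = intervention effect

A closely matched pair: apart from a one-year age gap, the only difference is whether they received the intervention.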

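# PSM propensity-score estimation and matching (simplified sketch)
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
# Confounders: age, BMI, blood pressure, medical history, etc.
covariates = ['age', 'bmi', 'systolic_bp', 'diabetes', 'smoking', 'family_history']
def estimate_att(X: pd.DataFrame, treatment_indicator: np.ndarray,
                 outcome: np.ndarray, caliper: float = 0.1) -> float:
    # Propensity score: each patient's estimated probability of receiving the intervention
    model = LogisticRegression(max_iter=1000)
    model.fit(X[covariates], treatment_indicator)
    propensity_scores = model.predict_proba(X[covariates])[:, 1]
    treated = np.flatnonzero(treatment_indicator == 1)
    controls = np.flatnonzero(treatment_indicator == 0)
    pair_differences = []
    for i in treated:
        # Nearest-neighbour match on the score (controls may be reused)
        gaps = np.abs(propensity_scores[controls] - propensity_scores[i])
        j = gaps.argmin()
        if gaps[j] <= caliper:  # drop treated patients with no control within the caliper
            pair_differences.append(outcome[i] - outcome[controls[j]])
    # ATT = Average Treatment Effect on the Treated, over matched treated patients;
    # it reflects the intervention's own effect only if no important confounder is unmeasured
    return float(np.mean(pair_differences))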

Why PSM Is Our Core Method at the Seed Stage

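We chose PSM as our core causal-analysis method for the seed stage for practical reasons. Its data requirements are relatively modest, which suits an early-stage company. Its methodology is widely accepted by the medical community and regulators, so the evidence it produces carries relatively high credibility. And its results are relatively easy to explain to insurers and healthcare institutions.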

Causal Analysis Roadmap

Seed: PSM (propensity score matching). Low data requirements and a mature methodology; quickly produces clinically acceptable evidence.

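Series A: add DID (difference-in-differences). Controls for confounding along the time dimension, such as stable differences between groups and trends they share; suited to long-term follow-up data.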

Series B+: synthetic control + instrumental variables. Handles more complex causal scenarios and builds a higher grade of evidence.
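Two toy illustrations of the later-stage methods, both sketches with hypothetical inputs rather than the planned implementations: a difference-in-differences calculation on made-up event rates, and a Wald instrumental-variable estimator on simulated data with an unmeasured confounder `u` and a binary instrument `z` (assumed to affect the outcome only through treatment). Synthetic control is left out because it does not fit in a short example.

```python
import numpy as np

def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: the treated group's change minus the control group's change."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical event rates (not real results)
print(round(did(0.12, 0.09, 0.11, 0.10), 4))  # -0.02: treated improved 2 points more than controls

def wald_iv(y, t, z):
    """Wald estimator with a binary instrument: reduced-form effect / first-stage effect."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())

rng = np.random.default_rng(3)
n = 200_000
u = rng.normal(size=n)                              # unmeasured confounder, never used by the estimator
z = rng.binomial(1, 0.5, n)                         # instrument: shifts treatment, no direct path to y
t = (z + u + rng.normal(size=n) > 0.5).astype(float)
y = 2.0 * t + 3.0 * u + 0.5 * rng.normal(size=n)    # simulated true treatment effect is 2.0
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive {naive:.2f}  IV {wald_iv(y, t, z):.2f}  (true effect 2.00)")
```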

How Causal Evidence Becomes a Basis for Settlement

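When PSM shows that "patients in ReHealth AI intervention programs had an X% lower three-year incidence of cardiovascular and cerebrovascular disease than comparable patients who were not treated," that figure becomes a causal-evidence report backed by statistical significance.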
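How "X% lower" and "statistically significant" might be computed from matched pairs, as a sketch with hypothetical counts (not results): the relative reduction in incidence, a Wald 95% confidence interval for the paired risk difference, and McNemar's test, which uses only the discordant pairs.

```python
import math

# Hypothetical matched-pair counts (illustration only, not study results):
#   a: both patients had an event      b: only the treated patient did
#   c: only the matched control did    d: neither did
a, b, c, d = 20, 30, 60, 890
n = a + b + c + d

risk_t = (a + b) / n                       # incidence in the intervention group
risk_c = (a + c) / n                       # incidence in the matched control group
rd = (b - c) / n                           # paired risk difference (equals risk_t - risk_c)
rrr = 1 - risk_t / risk_c                  # relative reduction: the "X% lower" in the report
se = math.sqrt((b + c) - (b - c) ** 2 / n) / n   # Wald SE for a paired difference in proportions
ci = (rd - 1.96 * se, rd + 1.96 * se)
mcnemar = (b - c) ** 2 / (b + c)           # McNemar chi-square, 1 df; above 3.84 means p < 0.05

print(f"relative reduction {rrr:.1%}, RD {rd:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f}), McNemar {mcnemar:.1f}")
```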

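Such a report can be used by insurers to evaluate whether to bring preventive intervention programs into coverage; by healthcare institutions to apply to government health-insurance agencies for reimbursement eligibility for preventive programs; and by employers to show employees the actual benefit of investing in health management.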

For the first time, prevention has a basis on which it can be billed.

Limitations and Honest Boundaries

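PSM is not a cure-all. It can only control for confounders we know about and have measured; if an important confounder goes unmeasured, PSM cannot fully remove the resulting bias. That is why we need to keep expanding the dimensions of our data and collect more complete individual health information (wearable-device data, hospital information system (HIS) data, questionnaire data) to minimize the influence of unmeasured confounding.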

It is also why we describe this as infrastructure that has to be built up over time, not an AI product that can quickly "game the metrics."
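One standard way to quantify this limitation (not a method named in the original) is the E-value of VanderWeele and Ding: the minimum risk ratio an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio. A minimal sketch, applied to a hypothetical RR of 0.625 (5% vs 8% incidence):

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (point estimate)."""
    if rr < 1:
        rr = 1 / rr          # protective effects: work with the inverse ratio
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(0.05 / 0.08), 2))  # 2.58
```

A larger E-value means only a strong unmeasured confounder could overturn the result; it is a sensitivity measure, not proof that no such confounder exists.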


Correlation ≠ Causation: Healthcare AI's Core Dilemma

A classic example: we observe that patients taking a certain antihypertensive drug have a 30% lower heart-attack rate than those who don't. Does this mean the drug works?

Not necessarily. Patients who take the medication may simply be more health-conscious: more likely to get regular checkups and to keep a healthy lifestyle, factors that lower cardiac risk on their own. The drug's effect and the patients' better baseline health are confounded, and a simple comparison cannot separate them.

❌ Correlation Analysis (Insufficient)

Observed: Treatment group has lower incidence

Can't rule out: Treatment group was healthier to begin with. Can't conclude: The drug itself reduced risk.

✓ Causal Attribution (What We Need)

Proves: The intervention itself changed outcomes

Control for confounders, construct a counterfactual comparison group, and statistically verify the intervention's independent effect. This is the kind of evidence payers accept.
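The antihypertensive example can be made concrete with exact arithmetic. The population below is hypothetical: half of patients are health-conscious, they take the drug more often and have fewer heart attacks, and by construction the drug has no effect at all. The naive comparison still shows an apparent 33% reduction.

```python
from fractions import Fraction as F

# Hypothetical population; h = 1 means "health-conscious"
p_h = {1: F(1, 2), 0: F(1, 2)}
p_drug_given_h = {1: F(7, 10), 0: F(3, 10)}       # health-conscious patients take the drug more often
p_attack_given_h = {1: F(5, 100), 0: F(15, 100)}  # ...and have fewer heart attacks
# The drug does nothing: within each h, the heart-attack rate is the same with or without it.

def attack_rate(took_drug):
    """P(heart attack | drug status) = sum over h of P(attack | h) * P(h | drug status)."""
    weights = {h: p_h[h] * (p_drug_given_h[h] if took_drug else 1 - p_drug_given_h[h]) for h in p_h}
    total = sum(weights.values())
    return sum(p_attack_given_h[h] * weights[h] / total for h in p_h)

naive_rr = attack_rate(True) / attack_rate(False)
print(attack_rate(True), attack_rate(False), naive_rr)  # 2/25 3/25 2/3
```

Comparing patients within the same h value (which is what matching does) gives a risk ratio of exactly 1, the true effect.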

