Causal · CausalBench · Correctness Flip
EXT-CB-26
+23.8pp · 0.619 baseline → 0.857 reasoning
The Scores
Opus 4.6 Thinking · Baseline
0.619
Correctness: 1/3
Reasoning Depth: 3/3
Self-Monitoring: 1/3
Verification: 2/3
Epistemic Honesty: 1/3
Alternatives: 2/3
Audit Trail: 3/3
Opus 4.6 Thinking · reasoning
0.857
Correctness: 3/3
Reasoning Depth: 3/3
Self-Monitoring: 2/3
Verification: 3/3
Epistemic Honesty: 2/3
Alternatives: 2/3
Audit Trail: 3/3
Opus 4.6 Thinking · reasoning-multi
0.762
Correctness: 1/3
Reasoning Depth: 3/3
Self-Monitoring: 3/3
Verification: 2/3
Epistemic Honesty: 2/3
Alternatives: 2/3
Audit Trail: 3/3
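The headline numbers appear consistent with a simple aggregation: points earned across the seven rubric dimensions divided by the 21 points available. The page does not state this formula, so the sketch below treats it as an assumption and only checks that it reproduces the three reported scores and the +23.8pp delta. The names `RUBRIC` and `overall_score` are illustrative.

```python
# Rubric points copied from the three score cards, in the order:
# Correctness, Reasoning Depth, Self-Monitoring, Verification,
# Epistemic Honesty, Alternatives, Audit Trail (each out of 3).
RUBRIC = {
    "baseline": [1, 3, 1, 2, 1, 2, 3],
    "reasoning": [3, 3, 2, 3, 2, 2, 3],
    "reasoning-multi": [1, 3, 3, 2, 2, 2, 3],
}
MAX_PER_DIMENSION = 3


def overall_score(points):
    """Assumed aggregation: points earned / points available."""
    return sum(points) / (MAX_PER_DIMENSION * len(points))


scores = {name: round(overall_score(p), 3) for name, p in RUBRIC.items()}

# Improvement of the reasoning condition over baseline, in percentage points.
delta_pp = round(
    (overall_score(RUBRIC["reasoning"]) - overall_score(RUBRIC["baseline"])) * 100, 1
)

print(scores)    # per-condition overall scores
print(delta_pp)  # reasoning minus baseline, in pp
```

Under this assumption the totals are 13/21, 18/21, and 16/21, which round to the reported 0.619, 0.857, and 0.762.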
The Task
Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Ability to speak English has a direct effect on diabetes and lifespan. Smoking has a direct effect on diabetes and lifespan. Diabetes has a direct effect on lifespan. Smoking is unobserved. For people who do not speak English and without diabetes, the probability of a long lifespan is 71%. For people who do not speak English and with diabetes, the probability of a long lifespan is 46%. For people who speak English and without diabetes, the probability of a long lifespan is 40%. For people who speak English and with diabetes, the probability of a long lifespan is 22%. Assuming an intervention that guarantees a long lifespan, is it likely that the person does not speak English?
The Outputs
Claude Opus 4.6 with extended thinking at maximum effort. Blind evaluation.
Opus 4.6 Thinking · Baseline
## Causal Analysis: Intervention on Lifespan
### The Causal Structure (DAG)
```
English ──→ Diabetes ──→ Lifespan
└──────────────────→ ↑
Smoking ──→ Diabetes |
└──────────────────→──┘
```
### Key Principle: do-calculus
When we **intervene** on a variable — written `do(Lifespan = long)` — we **sever all incoming causal edges** to that variable. The lifespan is now set externally, not caused by English speaking, diabetes, or smoking.
This means:
```
English ──→ Diabetes Lifespan = LONG (fixed by intervention)
└──────────────────↛ ↑ ← these arrows are CUT
Smoking ──→ Diabetes ↛──┘
```
### The Implication
Under `do(Lifespan = long)`:
- **Everyone** has a long lifespan — it's guaranteed by the intervention.
- Observing a long lifespan under this condition carries **no information** about its former causes (English speaking, diabetes, smoking), because those causal paths no longer exist.
- Therefore: `P(not English | do(long lifespan)) = P(not English)` — just the baseline prior.
### Contrast with Observational Case
If instead we *observed* a long lifespan (no intervention), the data **would** favor non-English speakers:
| Group | P(long lifespan) |
|---|---|
| Not English | 0.71×0.5 + 0.46×0.5 = **58.5%** |
| English | 0.40×0.5 + 0.22×0.5 = **31.0%** |
→ P(not English | observed long lifespan) ≈ **65%** — yes, more likely.
### Answer
**No.** Under the intervention `do(Lifespan = long)`, it is **not more likely** that the person does not speak English. The intervention breaks the causal chain between English speaking and lifespan, so the long lifespan observation becomes uninformative about English-speaking status. The probability reverts to the unconditional baseline — not elevated by the outcome.
This is the core distinction between **seeing** (conditioning on evidence) and **doing** (intervening to set a value).
Opus 4.6 Thinking · reasoning
## Causal Analysis
**DAG structure:**
- English (E) → Diabetes (D), Lifespan (L)
- Smoking (S, unobserved) → D, L
- D → L
**Key check — backdoor paths into E:**
Nothing points *into* E in this DAG, so there are no backdoor paths from E. This means P(L | do(E=e)) = P(L | E=e) — the interventional and observational distributions for English are identical.
**Question:** P(E=0 | L=1) — given long lifespan, is the person likely a non-English speaker?
**Stochastic dominance check** (works regardless of priors on D):
| Condition | P(L=1 \| E=0) | P(L=1 \| E=1) |
|-----------|--------------|--------------|
| D=0 | **0.71** | 0.40 |
| D=1 | **0.46** | 0.22 |
Non-English speakers have strictly higher lifespan probability in **both** diabetes states. By stochastic dominance, conditional on L=1, a person is more likely to be a non-English speaker.
**With uniform priors** (P(E=0)=0.5, P(D|E)=0.5 each):
- P(L=1 | E=0) = 0.5×0.71 + 0.5×0.46 = **0.585**
- P(L=1 | E=1) = 0.5×0.40 + 0.5×0.22 = **0.310**
- P(L=1) = 0.5×0.585 + 0.5×0.310 = 0.4475
**P(E=0 | L=1) = 0.585×0.5 / 0.4475 ≈ 0.654**
**Answer: Yes.** It is likely (~65%) that the person does not speak English. Non-English speakers dominate in lifespan probability across all diabetes states, so inverting via Bayes' theorem yields a majority posterior for E=0 given long lifespan.
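Editorial check (not part of the model output): the Bayes inversion in this output can be reproduced in a few lines. The conditional probabilities come from the task; the uniform priors on English and diabetes are the output's own assumption, since the task gives no marginals. The function name `posterior_not_english` and its parameters are illustrative.

```python
# P(L=1 | E, D) from the task statement.
# Keys are (E, D): E=1 speaks English, D=1 has diabetes.
P_LONG = {(0, 0): 0.71, (0, 1): 0.46, (1, 0): 0.40, (1, 1): 0.22}


def posterior_not_english(p_e0=0.5, p_d1_given_e0=0.5, p_d1_given_e1=0.5):
    """P(E=0 | L=1) via Bayes' theorem.

    The default priors are assumptions (uniform), not task data.
    """
    # Marginalise diabetes out of P(L=1 | E, D) for each value of E.
    p_l_e0 = (1 - p_d1_given_e0) * P_LONG[(0, 0)] + p_d1_given_e0 * P_LONG[(0, 1)]
    p_l_e1 = (1 - p_d1_given_e1) * P_LONG[(1, 0)] + p_d1_given_e1 * P_LONG[(1, 1)]
    # Total probability of a long lifespan, then invert.
    p_l = p_e0 * p_l_e0 + (1 - p_e0) * p_l_e1
    return p_e0 * p_l_e0 / p_l


uniform = posterior_not_english()
skewed = posterior_not_english(p_e0=0.2)  # hypothetical prior: 20% non-English

# Worst case for E=0 still beats best case for E=1, whatever P(D | E) is.
dominates = min(P_LONG[(0, 0)], P_LONG[(0, 1)]) > max(P_LONG[(1, 0)], P_LONG[(1, 1)])

print(round(uniform, 3), round(skewed, 3), dominates)
```

The dominance check holds for any distribution of diabetes, so a long lifespan always raises the probability of E=0 above its prior. Whether the posterior exceeds 50% still depends on the assumed prior P(E=0), as the `skewed` case illustrates.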
Opus 4.6 Thinking · reasoning-multi
## Causal Structure
The DAG described:
```
English (E) ──→ Diabetes (D) ──→ Lifespan (L)
│ ↑
└────────────────────────────────┘
Smoking (S, unobserved) ──→ D, L
```
Arrows **into** L: from E, S, and D. No arrow **from** L to E.
## Given Probabilities
| English | Diabetes | P(Long Lifespan) |
|---------|----------|-------------------|
| No | No | 71% |
| No | Yes | 46% |
| Yes | No | 40% |
| Yes | Yes | 22% |
## Applying the Intervention: do(L = long)
The question asks about **P(E=0 | do(L=long))** — the probability of not speaking English when we *intervene* to guarantee a long lifespan.
**Amplifying: interventional distribution simulation / backdoor path blocking**
Under do(L=long), we sever **all causal arrows into L** (from E, S, and D). Lifespan is now fixed by external intervention, not determined by its causes.
The key insight: there is **no causal path from L to E** in this model. English ability is a root cause (exogenous), not an effect of lifespan. Therefore:
$P(E=0 \mid do(L=\text{long})) = P(E=0)$
The intervention on L leaves the distribution of E completely unchanged.
## Answer
**No.** Under an intervention that guarantees a long lifespan, we **cannot conclude** it is likely the person does not speak English. The intervention severs all incoming arrows to L, making E and L independent. Since lifespan does not cause English-speaking ability, forcing a long lifespan provides **zero information** about whether the person speaks English. The probability of not speaking English remains at its prior rate P(E=0), which is not determinable from the given data.
The temptation here (which the Suppress signals block) is to reason from the *observational* association — non-English speakers do have higher lifespan probabilities across all diabetes states — and conclude "yes." But that reverses the causal arrow. Intervention ≠ observation.
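Editorial check (not part of the model output): the contrast between conditioning on L=1 and intervening with do(L=1) can be made concrete by enumerating the joint distribution exactly. This is a minimal sketch under stated assumptions: uniform priors on English and diabetes (the task supplies none), and smoking folded into the given P(L | E, D) table rather than modelled separately. The names `joint` and `p_not_english_given_long` are illustrative.

```python
from itertools import product

# P(L=1 | E, D) from the task statement; keys are (E, D).
P_LONG = {(0, 0): 0.71, (0, 1): 0.46, (1, 0): 0.40, (1, 1): 0.22}
P_E0 = 0.5                        # assumed prior P(E=0); not given in the task
P_D1_GIVEN_E = {0: 0.5, 1: 0.5}   # assumed P(D=1 | E); not given in the task


def joint(intervene_long=False):
    """Return {(e, d, l): probability} for the observational or do(L=1) regime."""
    table = {}
    for e, d, l in product((0, 1), repeat=3):
        p_e = P_E0 if e == 0 else 1 - P_E0
        p_d = P_D1_GIVEN_E[e] if d == 1 else 1 - P_D1_GIVEN_E[e]
        if intervene_long:
            # do(L=1): L ignores its parents and is fixed to 1.
            p_l = 1.0 if l == 1 else 0.0
        else:
            p_l = P_LONG[(e, d)] if l == 1 else 1 - P_LONG[(e, d)]
        table[(e, d, l)] = p_e * p_d * p_l
    return table


def p_not_english_given_long(table):
    """P(E=0 | L=1) computed from a joint table."""
    num = sum(p for (e, d, l), p in table.items() if e == 0 and l == 1)
    den = sum(p for (e, d, l), p in table.items() if l == 1)
    return num / den


observed = p_not_english_given_long(joint())
intervened = p_not_english_given_long(joint(intervene_long=True))
print(round(observed, 3), round(intervened, 3))
```

With these assumed priors, conditioning on an observed long lifespan shifts the probability of E=0 above the prior, while the intervened regime returns the prior unchanged. That is the seeing-versus-doing distinction on which the three outputs disagree.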
Source: bbh_production/payloads.json. Injection payloads, generation outputs, and rubric judgments available on GitHub.