Causal · CausalBench · Correctness Flip

EXT-CB-26

+23.8pp

0.619 baseline → 0.857 reasoning

The Scores

Opus 4.6 Thinking · Baseline

0.619

Correctness: 1/3

Reasoning Depth: 3/3

Self-Monitoring: 1/3

Verification: 2/3

Epistemic Honesty: 1/3

Alternatives: 2/3

Audit Trail: 3/3

Opus 4.6 Thinking · reasoning

0.857

Correctness: 3/3

Reasoning Depth: 3/3

Self-Monitoring: 2/3

Verification: 3/3

Epistemic Honesty: 2/3

Alternatives: 2/3

Audit Trail: 3/3

Opus 4.6 Thinking · reasoning-multi

0.762

Correctness: 1/3

Reasoning Depth: 3/3

Self-Monitoring: 3/3

Verification: 2/3

Epistemic Honesty: 2/3

Alternatives: 2/3

Audit Trail: 3/3
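The page does not say how the headline scores are derived from the rubric. The sketch below tests one plausible reading, which is an assumption rather than a documented formula: total rubric points divided by the maximum possible (7 dimensions × 3 points = 21). The names `SCORES`, `MAX_PER_DIMENSION`, and `headline_score` are hypothetical.

```python
# Rubric scores transcribed from the score cards above.
# Dimension order: Correctness, Reasoning Depth, Self-Monitoring,
# Verification, Epistemic Honesty, Alternatives, Audit Trail.
SCORES = {
    "Baseline": [1, 3, 1, 2, 1, 2, 3],
    "reasoning": [3, 3, 2, 3, 2, 2, 3],
    "reasoning-multi": [1, 3, 3, 2, 2, 2, 3],
}
MAX_PER_DIMENSION = 3  # every dimension is scored out of 3


def headline_score(dimension_scores):
    """Assumed aggregation: points earned divided by points available."""
    return sum(dimension_scores) / (MAX_PER_DIMENSION * len(dimension_scores))


for condition, dimension_scores in SCORES.items():
    print(f"{condition}: {headline_score(dimension_scores):.3f}")

# Difference between the reasoning and Baseline conditions, in percentage points.
gain_pp = 100 * (headline_score(SCORES["reasoning"]) - headline_score(SCORES["Baseline"]))
print(f"reasoning vs. Baseline: {gain_pp:+.1f}pp")
```

Under this assumed formula the three conditions come out at 0.619, 0.857, and 0.762, and the gain at +23.8pp, matching the figures shown on the page.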

The Task

Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Ability to speak English has a direct effect on diabetes and lifespan. Smoking has a direct effect on diabetes and lifespan. Diabetes has a direct effect on lifespan. Smoking is unobserved. For people who do not speak English and without diabetes, the probability of a long lifespan is 71%. For people who do not speak English and with diabetes, the probability of a long lifespan is 46%. For people who speak English and without diabetes, the probability of a long lifespan is 40%. For people who speak English and with diabetes, the probability of a long lifespan is 22%. Assuming an intervention that guarantees a long lifespan, is it likely that the person does not speak English?

The Outputs

Claude Opus 4.6 with extended thinking at maximum effort. Blind evaluation.

Opus 4.6 Thinking · Baseline

## Causal Analysis: Intervention on Lifespan

### The Causal Structure (DAG)

```
English ──→ Diabetes ──→ Lifespan
   └──────────────────→ ↑
Smoking ──→ Diabetes     |
   └──────────────────→──┘
```

### Key Principle: do-calculus

When we **intervene** on a variable, written `do(Lifespan = long)`, we **sever all incoming causal edges** to that variable. The lifespan is now set externally, not caused by English speaking, diabetes, or smoking.

This means:

```
English ──→ Diabetes     Lifespan = LONG (fixed by intervention)
   └──────────────────↛ ↑   ← these arrows are CUT
Smoking ──→ Diabetes ↛──┘
```

### The Implication

Under `do(Lifespan = long)`:

- **Everyone** has a long lifespan: it's guaranteed by the intervention.
- Observing a long lifespan under this condition carries **no information** about its former causes (English speaking, diabetes, smoking), because those causal paths no longer exist.
- Therefore: `P(not English | do(long lifespan)) = P(not English)`, just the baseline prior.

### Contrast with Observational Case

If instead we *observed* a long lifespan (no intervention), the data **would** favor non-English speakers:

| Group | P(long lifespan) |
|---|---|
| Not English | 0.71×0.5 + 0.46×0.5 = **58.5%** |
| English | 0.40×0.5 + 0.22×0.5 = **31.0%** |

→ P(not English | observed long lifespan) ≈ **65%**: yes, more likely.

### Answer

**No.** Under the intervention `do(Lifespan = long)`, it is **not more likely** that the person does not speak English. The intervention breaks the causal chain between English speaking and lifespan, so the long lifespan observation becomes uninformative about English-speaking status. The probability reverts to the unconditional baseline, not elevated by the outcome.

This is the core distinction between **seeing** (conditioning on evidence) and **doing** (intervening to set a value).

Opus 4.6 Thinking · reasoning

## Causal Analysis

**DAG structure:**

- English (E) → Diabetes (D), Lifespan (L)
- Smoking (S, unobserved) → D, L
- D → L

**Key check: backdoor paths into E.** Nothing points *into* E in this DAG, so there are no backdoor paths from E. This means P(L | do(E=e)) = P(L | E=e): the interventional and observational distributions for English are identical.

**Question:** P(E=0 | L=1). Given long lifespan, is the person likely a non-English speaker?

**Stochastic dominance check** (works regardless of priors on D):

| Condition | P(L=1 \| E=0) | P(L=1 \| E=1) |
|-----------|--------------|--------------|
| D=0 | **0.71** | 0.40 |
| D=1 | **0.46** | 0.22 |

Non-English speakers have strictly higher lifespan probability in **both** diabetes states. By stochastic dominance, conditional on L=1, a person is more likely to be a non-English speaker.

**With uniform priors** (P(E=0)=0.5, P(D|E)=0.5 each):

- P(L=1 | E=0) = 0.5×0.71 + 0.5×0.46 = **0.585**
- P(L=1 | E=1) = 0.5×0.40 + 0.5×0.22 = **0.310**
- P(L=1) = 0.5×0.585 + 0.5×0.310 = 0.4475

**P(E=0 | L=1) = 0.585×0.5 / 0.4475 ≈ 0.654**

**Answer: Yes.** It is likely (~65%) that the person does not speak English. Non-English speakers dominate in lifespan probability across all diabetes states, so inverting via Bayes' theorem yields a majority posterior for E=0 given long lifespan.
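The uniform-prior arithmetic in the output above can be checked with a short sketch. The four conditional probabilities come from the task statement. The priors `p_e0` and `p_d1_given_e` are not given in the task and are assumptions, defaulted here to the 0.5 values the output uses. The function name `posterior_not_english` is hypothetical.

```python
# P(long lifespan | English, Diabetes) from the task, keyed as (english, diabetes).
P_LONG = {(0, 0): 0.71, (0, 1): 0.46, (1, 0): 0.40, (1, 1): 0.22}


def posterior_not_english(p_e0=0.5, p_d1_given_e=(0.5, 0.5)):
    """Return ([P(L=1|E=0), P(L=1|E=1)], P(E=0|L=1)) under the assumed priors."""
    p_long_given_e = []
    for e in (0, 1):
        p_d1 = p_d1_given_e[e]
        # Marginalize diabetes out of P(L=1 | E=e, D).
        p_long_given_e.append((1 - p_d1) * P_LONG[(e, 0)] + p_d1 * P_LONG[(e, 1)])
    numerator = p_e0 * p_long_given_e[0]
    evidence = numerator + (1 - p_e0) * p_long_given_e[1]
    return p_long_given_e, numerator / evidence  # Bayes' theorem


marginals, posterior = posterior_not_english()
print(f"P(L=1|E=0)={marginals[0]:.3f}  P(L=1|E=1)={marginals[1]:.3f}")
print(f"P(E=0|L=1)={posterior:.3f}")

# Dominance claim: with P(E=0)=0.5, the posterior stays above 0.5
# for every diabetes prior on a coarse grid.
grid = [i / 10 for i in range(11)]
dominates = all(
    posterior_not_english(0.5, (d0, d1))[1] > 0.5 for d0 in grid for d1 in grid
)
print("posterior > 0.5 for all diabetes priors on the grid:", dominates)
```

With the default priors this reproduces 0.585, 0.310, and a posterior of about 0.654. The grid check holds because the lowest value for non-English speakers (0.46) exceeds the highest value for English speakers (0.40). The conclusion still depends on the assumed P(E=0).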

Opus 4.6 Thinking · reasoning-multi

## Causal Structure

The DAG described:

```
English (E) ──→ Diabetes (D) ──→ Lifespan (L)
    │                                ↑
    └────────────────────────────────┘
Smoking (S, unobserved) ──→ D, L
```

Arrows **into** L: from E, S, and D. No arrow **from** L to E.

## Given Probabilities

| English | Diabetes | P(Long Lifespan) |
|---------|----------|-------------------|
| No | No | 71% |
| No | Yes | 46% |
| Yes | No | 40% |
| Yes | Yes | 22% |

## Applying the Intervention: do(L = long)

The question asks about **P(E=0 | do(L=long))**: the probability of not speaking English when we *intervene* to guarantee a long lifespan.

**Amplifying: interventional distribution simulation / backdoor path blocking**

Under do(L=long), we sever **all causal arrows into L** (from E, S, and D). Lifespan is now fixed by external intervention, not determined by its causes.

The key insight: there is **no causal path from L to E** in this model. English ability is a root cause (exogenous), not an effect of lifespan.

Therefore:

$P(E=0 \mid do(L=\text{long})) = P(E=0)$

The intervention on L leaves the distribution of E completely unchanged.

## Answer

**No.** Under an intervention that guarantees a long lifespan, we **cannot conclude** it is likely the person does not speak English.

The intervention severs all incoming arrows to L, making E and L independent. Since lifespan does not cause English-speaking ability, forcing a long lifespan provides **zero information** about whether the person speaks English. The probability of not speaking English remains at its prior rate P(E=0), which is not determinable from the given data.

The temptation here (which the Suppress signals block) is to reason from the *observational* association (non-English speakers do have higher lifespan probabilities across all diabetes states) and conclude "yes." But that reverses the causal arrow. Intervention ≠ observation.
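The graph-surgery argument in the output above can be illustrated by exact enumeration over a small structural model. This is a sketch only. Every number in `P_E1`, `P_S1`, `P_D1`, and `P_L1` is a hypothetical parameter chosen for illustration, not a value from the task, and the model is not calibrated to the task's four conditional probabilities. It shows the mechanics of seeing versus doing and does not settle which reading the rubric rewards.

```python
from itertools import product

# Hypothetical parameters (assumptions; none are given in the task).
P_E1 = 0.5  # P(English = 1)
P_S1 = 0.3  # P(Smoking = 1)
P_D1 = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.6}  # P(D=1 | E, S)
P_L1 = {  # P(L=1 | E, D, S)
    (0, 0, 0): 0.80, (0, 0, 1): 0.60, (0, 1, 0): 0.55, (0, 1, 1): 0.35,
    (1, 0, 0): 0.50, (1, 0, 1): 0.30, (1, 1, 0): 0.30, (1, 1, 1): 0.15,
}


def bern(p_one, value):
    """Probability that a Bernoulli(p_one) variable takes `value`."""
    return p_one if value == 1 else 1 - p_one


def joint(e, s, d, l, do_l=None):
    """Joint probability; with do_l set, L's own mechanism is replaced."""
    p = bern(P_E1, e) * bern(P_S1, s) * bern(P_D1[(e, s)], d)
    if do_l is None:
        return p * bern(P_L1[(e, d, s)], l)  # observational factorization
    return p if l == do_l else 0.0           # truncated factorization under do(L)


def p_not_english_given_long(do_l=None):
    """P(E=0 | L=1) observationally, or P(E=0 | do(L=1)) when do_l=1."""
    numerator = sum(joint(0, s, d, 1, do_l) for s, d in product((0, 1), repeat=2))
    evidence = sum(joint(e, s, d, 1, do_l) for e, s, d in product((0, 1), repeat=3))
    return numerator / evidence


observational = p_not_english_given_long()
interventional = p_not_english_given_long(do_l=1)
print(f"seeing: P(E=0 | L=1)     = {observational:.3f}")
print(f"doing:  P(E=0 | do(L=1)) = {interventional:.3f}")
```

With these assumed parameters the interventional value equals the prior P(E=0) = 0.5, while the observational posterior is above 0.5. This is the distinction the output draws between conditioning on a long lifespan and forcing one.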

Source: bbh_production/payloads.json. Injection payloads, generation outputs, and rubric judgments are available on GitHub.
