CA · Causal · CausalBench · Correctness Flip

EXT-CB-15

+33.3pp
0.476 baseline → 0.809 reasoning-multi

The Scores

Opus 4.6 Thinking · Baseline

0.476
Correctness: 0/3
Reasoning Depth: 2/3
Self-Monitoring: 1/3
Verification: 1/3
Epistemic Honesty: 2/3
Alternatives: 2/3
Audit Trail: 2/3

Opus 4.6 Thinking · reasoning

0.809
Correctness: 1/3
Reasoning Depth: 3/3
Self-Monitoring: 2/3
Verification: 2/3
Epistemic Honesty: 3/3
Alternatives: 3/3
Audit Trail: 3/3

Opus 4.6 Thinking · reasoning-multi

0.809
Correctness: 3/3
Reasoning Depth: 3/3
Self-Monitoring: 2/3
Verification: 2/3
Epistemic Honesty: 2/3
Alternatives: 2/3
Audit Trail: 3/3
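The headline numbers are consistent with an unweighted mean of the seven rubric dimensions, each scored out of 3. The source does not state the aggregation formula, so the sketch below is an inference from the displayed values. The names `RUBRIC`, `aggregate`, and `truncate3` are hypothetical. Note that 17/21 ≈ 0.8095, which matches the displayed 0.809 only if the page truncates to three decimals rather than rounding; that too is an assumption.

```python
import math
from fractions import Fraction

# Rubric scores copied from the three score cards (each dimension is out of 3).
# Order: Correctness, Reasoning Depth, Self-Monitoring, Verification,
# Epistemic Honesty, Alternatives, Audit Trail.
RUBRIC = {
    "baseline": [0, 2, 1, 1, 2, 2, 2],
    "reasoning": [1, 3, 2, 2, 3, 3, 3],
    "reasoning-multi": [3, 3, 2, 2, 2, 2, 3],
}
MAX_PER_DIMENSION = 3


def aggregate(scores):
    """Assumed formula: sum of scores divided by the maximum attainable total."""
    return Fraction(sum(scores), MAX_PER_DIMENSION * len(scores))


def truncate3(value):
    """Truncate (not round) to three decimals, as the page appears to do."""
    return Fraction(math.floor(value * 1000), 1000)


totals = {name: aggregate(scores) for name, scores in RUBRIC.items()}
for name, value in totals.items():
    print(f"{name}: {value} -> {float(truncate3(value)):.3f}")

delta = totals["reasoning-multi"] - totals["baseline"]
print(f"delta: {float(delta) * 100:+.1f}pp")
```

Under this reading, the reasoning and reasoning-multi runs tie on the aggregate (17/21 each) while differing on Correctness, which is the flip the card highlights.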

The Task

Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Tanning salon treatment has a direct effect on skin. Going to the beach has a direct effect on skin. Method 1: We look directly at how tanning salon treatment correlates with skin in general. Method 2: We look at this correlation case by case according to going to the beach. If skin health varies significantly, is it likely due to tanning salon treatment?
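The difference between the two methods can be checked with a small exact calculation. This is a minimal sketch under assumptions the task does not state: both causes are binary, skin is a linear additive function of them, the effect sizes are arbitrary, and a difference in means stands in for "correlation". All names (`joint`, `effect_of_tanning`, `EFFECT_TAN`, `EFFECT_BEACH`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed effect sizes; the task only says both effects exist.
EFFECT_TAN = Fraction(2)
EFFECT_BEACH = Fraction(3)


def joint(p_tan, p_beach_given_tan):
    """Return {(t, b): probability}; p_beach_given_tan maps t to P(B=1 | T=t)."""
    cells = {}
    for t, b in product((0, 1), repeat=2):
        p_t = p_tan if t else 1 - p_tan
        p_b = p_beach_given_tan[t] if b else 1 - p_beach_given_tan[t]
        cells[(t, b)] = p_t * p_b
    return cells


def skin(t, b):
    """Assumed linear additive outcome model."""
    return EFFECT_TAN * t + EFFECT_BEACH * b


def effect_of_tanning(cells, beach=None):
    """E[S | T=1] - E[S | T=0], optionally restricted to one beach stratum."""
    def mean(t):
        selected = {
            (ti, bi): p for (ti, bi), p in cells.items()
            if ti == t and (beach is None or bi == beach)
        }
        total = sum(selected.values())
        return sum(p * skin(ti, bi) for (ti, bi), p in selected.items()) / total
    return mean(1) - mean(0)


def method_1(cells):
    return effect_of_tanning(cells)


def method_2(cells):
    return {b: effect_of_tanning(cells, beach=b) for b in (0, 1)}


half = Fraction(1, 2)
# The task's world: beach-going is independent of tanning.
independent = joint(half, {0: half, 1: half})
# A contrast case OUTSIDE the task's world: tanners go to the beach more often.
dependent = joint(half, {0: Fraction(1, 5), 1: Fraction(4, 5)})

for label, cells in (("independent", independent), ("dependent", dependent)):
    print(label, "method 1:", method_1(cells), "method 2:", method_2(cells))
```

In the independent case both methods recover the assumed tanning effect. They diverge only in the contrast case, where tanning and beach-going are associated, and the task's closed world excludes that.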


The Outputs

Claude Opus 4.6 with extended thinking at maximum effort. Blind evaluation.

Opus 4.6 Thinking · Baseline

## Causal Analysis of Skin Variation

### The Structure of This World

```
Tanning Salon Treatment ──┐
                          ├──► Skin
Going to the Beach ───────┘
```

Both variables are **independent causes** with direct effects on skin. No other relationships exist.

### Evaluating the Two Methods

**Method 1 (Overall correlation):** Observing a correlation between tanning salon treatment and skin variation is **confounded** by beach exposure. People who use tanning salons may also go to the beach more (or less) — and since beach independently affects skin, the raw correlation mixes the two signals together. You cannot cleanly attribute observed variation to tanning salon treatment alone.

**Method 2 (Stratified by beach):** By examining the tanning salon ↔ skin relationship *separately* for beach-goers and non-beach-goers, you **isolate** the tanning salon effect from the beach effect. This is the more rigorous approach.

### Answer to the Question

**No — not necessarily.** If skin health varies significantly, it is *not safe to conclude* the variation is likely due to tanning salon treatment, because:

1. **Beach exposure is an equally valid cause.** In this world, both variables have direct, independent effects on skin.
2. **Method 1 is insufficient** — it cannot distinguish between the two causes.
3. **Only Method 2** controls for beach exposure and allows a cleaner inference about tanning salon treatment's specific contribution.

> The variation in skin could be driven by beach exposure, tanning salon treatment, or both. Attributing it to tanning salon treatment without stratifying by beach (Method 2) would be an error in causal reasoning.

**Method 2 is the correct approach** for making any reliable causal claim about tanning salon treatment.


Opus 4.6 Thinking · reasoning

## Analysis

### Causal Structure

The world described has this DAG:

```
Tanning Salon (T) ──→ Skin (S)
Beach (B) ──────────→ Skin (S)
T ⊥ B (independent — no causal link between them)
```

Two independent causes, one shared outcome. No common cause between T and B.

### Evaluating the Two Methods

**Method 1 — Raw correlation of T with S (overall)**

Since T and B are causally independent (no backdoor path from T to S through B), there is no confounding. Method 1 gives a valid estimate of T's causal effect on S.

**Method 2 — Stratify by B, then correlate T with S**

Because B is not a cause of T and there is no common cause of T and B, conditioning on B does not open any spurious path. Method 2 is redundant here — it doesn't add or remove bias. It splits the sample unnecessarily.

### Is significant skin variation *likely due to* tanning salon treatment?

**No — not without isolation.** Even though T has a direct causal effect on S, B *also* has a direct causal effect on S. Observed variation in skin health could come from:

- Tanning salon treatment alone
- Beach-going alone
- Both simultaneously

Attributing the variation to tanning salon treatment based on correlation alone (the

### Verdict

| | Method 1 | Method 2 |
|---|---|---|
| Validity | Valid (no confounding) | Also valid, but unnecessary |
| Preferred? | Yes — simpler, sufficient | No — adds no value here |

Skin variation **may** be partly due to tanning salon treatment, but the variation cannot be *attributed* to it without confirming that beach-going's contribution has been accounted for. Method 1 is the correct approach in this structure — but even using it, the answer is that skin variation reflects both causes, not tanning salon treatment alone.


Opus 4.6 Thinking · reasoning-multi

## Reasoning with RA²R Injection

## Analysis

**Step 1 — State the causal structure.** The world has exactly two independent causes and one effect:

```
Tanning Salon ──→ Skin
                   ↑
Going to Beach ────┘
```

There is **no arrow** between Tanning and Beach. They are independent causes. No confounders, no mediators, no common causes — the problem explicitly rules them out.

**Step 2 — Check for backdoor paths** (amplifying: backdoor path blocking). For Tanning → Skin, the only potential confounder would be Beach if there were a path Tanning ← ??? → Beach. There is none. **All backdoor paths are trivially blocked** because no path connects Tanning and Beach except through Skin (a collider).

**Step 3 — Evaluate the two methods.**

- **Method 1** (raw correlation): Since Tanning and Beach are independent, the raw correlation between Tanning and Skin already isolates tanning's causal effect. No confounding biases the estimate.
- **Method 2** (stratify by Beach): Also valid — conditioning on Beach won't introduce collider bias here because we're stratifying on a co-cause, not a descendant.

Both methods should yield **the same result**.

**Step 4 — Answer the question** (suppressing: correlation-to-causation leap). Normally I'd flag that correlation ≠ causation. But in this closed world with known structure and no confounders:

> **Yes — if skin health varies significantly with tanning salon treatment, it is likely due to tanning salon treatment.**

Both methods converge on the same conclusion because there is no confounding between the two causes. The variation in skin attributable to tanning is real and causal, not spurious. Beach-going is an independent contributor — it adds variance to skin outcomes but does not confound the tanning–skin relationship.
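Editorial note, not part of the model output: the backdoor check described in Step 2 of the reasoning-multi output can be sketched as a small path enumeration. This is an illustrative sketch only. The function `backdoor_paths` and the edge-list encoding are hypothetical, and the code lists candidate backdoor paths (paths that begin with an arrow into the treatment) without testing whether a given adjustment set blocks them.

```python
def backdoor_paths(edges, treatment, outcome):
    """List simple paths from treatment to outcome whose first edge points INTO treatment.

    `edges` is a list of (parent, child) pairs describing a DAG.
    """
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)
    parents_of_treatment = {p for p, c in edges if c == treatment}

    paths = []

    def walk(path):
        node = path[-1]
        if node == outcome:
            paths.append(path)
            return
        # Walk the undirected skeleton, never revisiting a node.
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:
                walk(path + [nxt])

    for parent in sorted(parents_of_treatment):
        walk([treatment, parent])
    return paths


# The task's world: two arrows into Skin, nothing else.
TASK_DAG = [("Tanning", "Skin"), ("Beach", "Skin")]
# A contrast DAG OUTSIDE the task's world, where Beach also causes Tanning.
CONFOUNDED_DAG = TASK_DAG + [("Beach", "Tanning")]

print(backdoor_paths(TASK_DAG, "Tanning", "Skin"))
print(backdoor_paths(CONFOUNDED_DAG, "Tanning", "Skin"))
```

In the task's DAG, Tanning has no parents, so no backdoor path exists and the list is empty. The contrast DAG yields the single path Tanning ← Beach → Skin, the confounding structure the baseline output assumed.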


Source: bbh_production/payloads.json. Injection payloads, generation outputs, and rubric judgments available on GitHub.