CA Causal · Ejentum
CA-V2-10
+53.8pp · 0.429 baseline → 0.967 reasoning
The Scores
Opus 4.6 Thinking · Baseline
0.429
Correctness: 3/3
Reasoning Depth: 2/3
Self-Monitoring: 0/3
Verification: 0/3
Epistemic Honesty: 2/3
Alternatives: 0/3
Audit Trail: 2/3
Opus 4.6 Thinking · reasoning
0.967
Correctness: 3/3
Reasoning Depth: 3/3
Self-Monitoring: 3/3
Verification: 3/3
Epistemic Honesty: 3/3
Alternatives: 3/3
Audit Trail: 3/3
Opus 4.6 Thinking · reasoning-multi
0.900
Correctness: 3/3
Reasoning Depth: 3/3
Self-Monitoring: 2/3
Verification: 3/3
Epistemic Honesty: 3/3
Alternatives: 2/3
Audit Trail: 3/3
The Task
A ride-sharing company finds that drivers who complete a voluntary advanced training course have 40% fewer accidents. They consider making the course mandatory. A data scientist identifies three potential confounds: (1) Safer drivers self-select into training. (2) Trained drivers get assigned to safer routes by the algorithm (it prioritizes certified drivers for highway routes, which have lower accident rates per mile than city routes). (3) Drivers who complete training have been with the company longer on average (training is offered after 6 months), and accident rates naturally decline with experience. If confound (2) is the dominant effect, making training mandatory would have what impact on overall accident rates?
Options:
(A) Accidents would decrease by approximately 40% as all drivers gain safety skills
(B) Accidents would stay the same or increase, because the training itself has no causal effect on driving skill
(C) Accidents would decrease by exactly 40% because mandatory training removes the self-selection confound
(D) Accidents would decrease, but much less than 40%, because the route-assignment benefit disappears when everyone is certified
The Outputs
Claude Opus 4.6 with extended thinking at maximum effort. Blind evaluation.
Opus 4.6 Thinking · Baseline
**D**
When everyone is certified, the algorithm can no longer preferentially assign certified drivers to safer highway routes — the distinguishing signal disappears. The 40% gap was largely driven by *where* trained drivers were routed, not by the training improving their skill. Making training mandatory eliminates that sorting mechanism, so the route-safety advantage evaporates. Any genuine skill transfer from the course would still produce some reduction, but far less than 40%.
Opus 4.6 Thinking · reasoning
The claim is: "training causes 40% fewer accidents." This is observational — drivers self-select. I need to reason about what happens when we intervene (make training mandatory) specifically under the assumption that **confound (2) is dominant**.
**Confound (2) analysis:** The algorithm assigns certified drivers to safer highway routes. So the 40% gap isn't (primarily) because training makes drivers safer — it's because trained drivers get routed to lower-risk roads.
**What happens if training becomes mandatory?** All drivers become certified. The algorithm can no longer differentiate — everyone qualifies for highway routes. The preferential routing advantage disappears because the system can't assign *everyone* to the safer routes; the same total mix of city and highway routes must still be covered. The distribution of drivers across routes returns to roughly uniform.
Result: the mechanism producing the 40% gap (route selection) is eliminated. Training itself (under this assumption) contributes little to no causal safety improvement. Accident rates stay essentially the same.
**The answer is (D).**
Wait — let me re-check (B) vs (D). If confound (2) is the **dominant** effect (not the sole effect), training may still have some small causal benefit. But the route-assignment mechanism — which was driving most of the observed 40% — vanishes when everyone is certified. So accidents might decrease slightly (from whatever minor real training effect exists), but far less than 40%.
If it were the **sole** effect, (B) would be correct. But "dominant" implies other factors contribute marginally, so some small decrease is plausible.
**The answer is (D): Accidents would decrease, but much less than 40%, because the route-assignment benefit disappears when everyone is certified.**
When all drivers complete training, the algorithm loses its basis for preferentially routing certified drivers to safer highways. The same dangerous city routes still need to be covered, so the observed 40% gap — driven primarily by route selection, not skill — largely evaporates.
Opus 4.6 Thinking · reasoning-multi
**Step 1 — State the causal claim and classify the data**
Claim: Training → 40% fewer accidents. Data source: observational. This means we cannot accept the 40% as the true causal effect without blocking all backdoor paths.
**Step 2 — Model the causal structure under confound (2) as dominant**
The routing mechanism is:
```
Certification → Algorithm assigns highway routes → Fewer accidents/mile
```
This is a **mediated spurious path**, not a skill improvement. The 40% observed difference reflects: *certified drivers get safer routes*, not *certified drivers drive more safely*.
**Step 3 — Simulate the intervention (mandatory training)**
When training is mandatory, all drivers become certified. Now run the counterfactual:
- The algorithm **can no longer distinguish** certified from uncertified drivers — the entire pool is certified
- The preferential highway assignment mechanism **collapses** — there's no longer a certification signal to sort on
- Route distribution reverts to other criteria (geography, shift, demand)
- The routing advantage that was generating the 40% gap **disappears**
**Step 4 — What happens to overall accident rates?**
| Effect | With voluntary training | With mandatory training |
|---|---|---|
| Routing advantage (dominant) | Trained drivers → highways | Mechanism gone — routes mixed |
| True skill effect (non-dominant) | Small | Small, distributed across all drivers |
| Net observed benefit | ~40% (inflated by routing) | Much less than 40% |
Confound (2) is dominant → the observed 40% is mostly a routing artifact. Making training mandatory removes exactly the mechanism that produced it.
**The answer is (D).**
When everyone is certified, the algorithm's preferential route assignment can no longer sort drivers by certification status. The 40% figure was primarily measuring the accident rate difference between highway and city routes — not a training skill effect. That advantage evaporates under mandatory training, leaving only whatever small direct causal effect training actually has. Accidents may decrease slightly (non-dominant effects still operate), but far less than 40%.
The key causal insight: **an intervention that removes the sorting mechanism also removes the benefit that depended on that sorting.** Making the credential universal destroys its value as a routing signal.
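The routing argument that all three outputs rely on can be checked with a small expected-value model. The sketch below is illustrative only and is not part of the benchmark: every number in it (accident rates per mile, the highway share of miles, the fraction of trained drivers, the 5% skill effect) is a hypothetical assumption, chosen so that the observed gap between trained and untrained drivers comes out near 40%. It also assumes every driver covers the same number of miles and that the total mix of highway and city miles is fixed by demand.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    # All values are hypothetical assumptions, not figures from the task.
    highway_rate: float = 0.35           # accidents per mile on highways
    city_rate: float = 1.00              # accidents per mile in the city
    highway_share: float = 0.5           # fraction of all miles that are highway
    trained_frac: float = 0.3            # fraction of drivers trained (voluntary regime)
    trained_highway_share: float = 0.8   # highway share the algorithm gives certified drivers
    skill_multiplier: float = 0.95       # causal effect of training (1.0 means none)


def route_rate(s: Scenario, hw_share: float) -> float:
    """Accident rate per mile for a driver with the given highway share."""
    return hw_share * s.highway_rate + (1 - hw_share) * s.city_rate


def voluntary(s: Scenario) -> tuple[float, float, float]:
    """Return (trained, untrained, overall) accident rates under voluntary training."""
    # Untrained drivers cover whatever highway miles the trained drivers leave over.
    untrained_hw = (
        s.highway_share - s.trained_frac * s.trained_highway_share
    ) / (1 - s.trained_frac)
    trained = s.skill_multiplier * route_rate(s, s.trained_highway_share)
    untrained = route_rate(s, untrained_hw)
    overall = s.trained_frac * trained + (1 - s.trained_frac) * untrained
    return trained, untrained, overall


def mandatory(s: Scenario) -> float:
    """Overall rate when everyone is certified and shares the same route mix."""
    return s.skill_multiplier * route_rate(s, s.highway_share)


base = Scenario()
trained, untrained, overall_before = voluntary(base)
observed_gap = 1 - trained / untrained                 # what the company measures
overall_change = 1 - mandatory(base) / overall_before  # what the mandate delivers

print(f"observed gap, trained vs untrained: {observed_gap:.1%}")
print(f"overall decrease under mandate:     {overall_change:.1%}")

# Limiting case: routing is the sole effect, so training has no causal benefit.
no_skill = replace(base, skill_multiplier=1.0)
sole_effect_change = 1 - mandatory(no_skill) / voluntary(no_skill)[2]
print(f"decrease if routing is the sole effect: {sole_effect_change:.1%}")
```

Under these assumptions the observed gap is close to 40%, while the mandate lowers the overall rate by only a few percent, which matches option (D). Setting `skill_multiplier` to 1.0 reproduces the limiting case discussed in the reasoning output: the same routes still have to be covered, so the overall rate does not move, which corresponds to option (B).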
Source: ejbench_production/payloads.json. Injection payloads, generation outputs, and rubric judgments are available on GitHub.