CA Causal · Ejentum
CA-V2-18
+54.8pp (0.286 baseline → 0.833 Haki)
The Scores
Opus 4.6 Thinking · Baseline
0.286
Correctness: 3/3
Reasoning Depth: 2/3
Self-Monitoring: 0/3
Verification: 0/3
Epistemic Honesty: 0/3
Alternatives: 0/3
Audit Trail: 1/3
Opus 4.6 Thinking · Ki
0.700
Correctness: 3/3
Reasoning Depth: 2/3
Self-Monitoring: 2/3
Verification: 2/3
Epistemic Honesty: 1/3
Alternatives: 1/3
Audit Trail: 2/3
Opus 4.6 Thinking · Haki
0.833
Correctness: 3/3
Reasoning Depth: 3/3
Self-Monitoring: 2/3
Verification: 2/3
Epistemic Honesty: 2/3
Alternatives: 3/3
Audit Trail: 3/3
The Task
A social media company finds that users who receive more 'likes' post more frequently. Product managers interpret this as: likes motivate posting (forward causality). They implement a feature to artificially boost likes on new users' posts to increase engagement. After 3 months, the boosted-likes group posts 5% more (far less than the expected 40% increase predicted from the observational correlation). A data scientist explains that the weak effect suggests the dominant causal direction is reverse: prolific posters generate more content, more content gets more total likes, and the likes-per-post rate is actually LOWER for prolific posters. If the reverse-causal model is correct, what should the relationship between posting frequency and likes-PER-POST look like?
Options:
(A) Positive — more posts means more skill, each post gets more likes
(B) Negative — each additional post dilutes quality, reducing likes per post
(C) No relationship — likes per post is random regardless of posting volume
(D) U-shaped — very low and very high posters get more likes per post than moderate posters
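The reverse-causal model described in the task can be illustrated with a toy simulation. This is only a sketch and not part of the benchmark: the uniform distribution of posting frequency, the `20 / sqrt(n_posts)` dilution rule, the noise level, and the helper names `pearson` and `simulate_users` are all assumptions chosen for illustration. In this toy world, posting frequency is an intrinsic trait and likes never influence posting, yet total likes still correlate positively with posting frequency while likes per post correlate negatively.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_users(n_users=2000, seed=0):
    """Toy reverse-causal world: posting frequency is an intrinsic trait,
    and likes are a consequence of posting, never a cause of it."""
    rng = random.Random(seed)
    posts, total_likes, likes_per_post = [], [], []
    for _ in range(n_users):
        # Intrinsic posting tendency (assumed uniform between 1 and 50 posts).
        n_posts = rng.randint(1, 50)
        # Assumed dilution rule: expected likes per post shrink as 1/sqrt(volume).
        mean_rate = 20.0 / math.sqrt(n_posts)
        # Noisy per-post like rate, clipped at zero.
        rate = max(0.0, rng.gauss(mean_rate, 1.0))
        posts.append(n_posts)
        likes_per_post.append(rate)
        total_likes.append(rate * n_posts)
    return posts, total_likes, likes_per_post


posts, total_likes, likes_per_post = simulate_users()
r_total = pearson(posts, total_likes)
r_rate = pearson(posts, likes_per_post)
print(f"corr(posts, total likes)    = {r_total:+.2f}")
print(f"corr(posts, likes per post) = {r_rate:+.2f}")
```

Because likes have no effect on posting in this sketch, an intervention that boosts likes would leave posting frequency unchanged here; the observational correlation with total likes arises from volume alone.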
The Outputs
All outputs were generated by Claude Opus 4.6 with extended thinking at maximum effort and scored under blind evaluation.
Opus 4.6 Thinking · Baseline
**The answer is (B) Negative.**
Under the reverse-causal model, prolific posters produce more content but each additional post dilutes quality/attention. So as posting frequency increases, likes-per-post should **decrease** — a negative relationship.
This is exactly what the data scientist is pointing out: the observed correlation between total likes and posting frequency is driven by volume, not by each post being more liked. When you normalize by post count, the relationship inverts.
Opus 4.6 Thinking · Ki
Let me work through the causal structure carefully, suppressing linear one-way thinking.
**The reverse-causal model states:**
- Prolific posters produce more content → more total content attracts more total likes (aggregate)
- But the *rate* (likes per post) is **lower** for prolific posters
**Why?** Each additional post from a prolific poster dilutes quality — there's a finite amount of high-quality, engaging content any person can produce. As volume increases, average quality per post drops, and audience attention per post drops. More posts compete for the same follower base's limited attention.
This is a negative feedback loop on per-unit quality: higher volume → lower per-unit engagement, which is precisely why the artificial-likes intervention failed — the real driver was content volume, and likes-per-post actually *decreases* with frequency.
**The answer is (B) — Negative.**
Each additional post dilutes quality and splits audience attention, so likes-per-post decreases as posting frequency increases. This is exactly what the data scientist's reverse-causal explanation predicts: prolific posters accumulate more *total* likes (because they have more posts), but each individual post receives *fewer* likes on average. The observational correlation between total likes and posting frequency was driven by volume, not by per-post engagement — which is why artificially boosting likes had almost no effect on posting behavior.
Opus 4.6 Thinking · Haki
**The answer is (B) — Negative.**
**Reasoning:**
The reverse-causal model, as explicitly stated by the data scientist, works like this:
```
Prolific poster (intrinsic trait)
→ posts more content
→ more content accumulates more *total* likes
→ but each individual post competes for audience attention
→ likes-per-post is LOWER for high-frequency posters
```
The data scientist even states this directly: *"the likes-per-post rate is actually LOWER for prolific posters."* The reverse-causal model predicts posting frequency ↑ → likes/post ↓ — a **negative relationship**.
Why the other options fail under this model:
- **(A)** would require a skill/quality mechanism — but the model is about volume dilution, not skill accumulation
- **(C)** would mean no structural relationship exists — contradicts the model's core mechanism
- **(D)** requires a special non-monotonic structure with no basis in the stated model
The intervention result (only 5% lift vs. 40% predicted) is the empirical fingerprint: artificially inflating likes barely moves posting because it's not likes that drive posting — it's the user's inherent tendency to post that drives both frequency *and* total likes. Each additional post from a prolific poster dilutes per-post engagement, making **(B)** the signature prediction of the reverse-causal model.
Source: ejbench_production/payloads.json. Injection payloads, generation outputs, and rubric judgments are available on GitHub.