MC · Metacognitive · Ejentum · Correctness Flip

MC-V2-22

+21.9pp

0.714 baseline → 0.933 reasoning-multi

The Scores

Opus 4.6 Thinking · Baseline

0.714

Correctness: 1/3

Reasoning Depth: 3/3

Self-Monitoring: 2/3

Verification: 2/3

Epistemic Honesty: 1/3

Alternatives: 3/3

Audit Trail: 3/3

Opus 4.6 Thinking · reasoning

0.867

Correctness: 1/3

Reasoning Depth: 3/3

Self-Monitoring: 3/3

Verification: 3/3

Epistemic Honesty: 2/3

Alternatives: 3/3

Audit Trail: 3/3

Opus 4.6 Thinking · reasoning-multi

0.933

Correctness: 3/3

Reasoning Depth: 3/3

Self-Monitoring: 3/3

Verification: 3/3

Epistemic Honesty: 3/3

Alternatives: 3/3

Audit Trail: 3/3

The Task

A tech company’s diversity report states: ‘We hired 1,000 engineers this year. Our hiring pipeline was: 10,000 applications → 3,000 phone screens → 1,000 on-sites → 1,000 offers → 1,000 hires. Women represented 25% of applications, 24% of phone screens, 23% of on-sites, 22% of offers, and 22% of hires. Since the gender ratio remained nearly constant at each stage (within 3 percentage points of the application pool), our hiring process shows no significant gender bias.’

Does the near-constant ratio across stages prove the absence of gender bias in the hiring process?

(A) Yes, consistent ratios at every stage prove the process treats genders equally
(B) Yes, a 3-percentage-point variation is within normal statistical fluctuation
(C) No, because small percentage drops at each stage compound to a meaningful overall difference
(D) No, because the initial application pool of 25% women may itself reflect bias in the company’s recruiting practices
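The percentages in the report only become comparable once they are turned into headcounts. The following is a minimal Python sketch of that step, assuming the reported percentages are exact (the report has evidently rounded them). The names `STAGES` and `headcounts` are illustrative and are not part of the benchmark.

```python
# Pipeline totals and women's share (whole percent), as quoted in the task.
STAGES = [
    ("applications", 10_000, 25),
    ("phone screens", 3_000, 24),
    ("on-sites", 1_000, 23),
    ("offers", 1_000, 22),
    ("hires", 1_000, 22),
]


def headcounts(stages):
    """Return {stage: (women, men)} using integer arithmetic only."""
    counts = {}
    for name, total, women_pct in stages:
        women = total * women_pct // 100
        counts[name] = (women, total - women)
    return counts


counts = headcounts(STAGES)
for name, (women, men) in counts.items():
    print(f"{name:>13}: {women:>5} women, {men:>5} men")
```

Integer arithmetic is used so the counts are not affected by floating-point rounding.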


The Outputs

Claude Opus 4.6 with extended thinking at maximum effort. Blind evaluation.

Opus 4.6 Thinking · Baseline

# The answer is **(C)**

The company's argument contains a classic **compounding erosion** fallacy. While each individual stage shows only a ~1 percentage point drop in women's representation, these small drops multiply across stages:

**Stage-by-stage conversion rates tell the real story:**

| Stage | Women's rate | Men's rate |
|---|---|---|
| App → Screen | 720/2,500 = **28.8%** | 2,280/7,500 = **30.4%** |
| Screen → On-site | 230/720 = **31.9%** | 770/2,280 = **33.8%** |
| On-site → Offer | 220/230 = **95.7%** | 780/770 = **101.3%**¹ |

At *every* filtering stage, men convert at a higher rate. The cumulative effect: women went from **25% → 22%**, a 12% relative decline in representation. Framing this as "within 3 percentage points" obscures that compounding.

**Why the other options fall short:**

- **(A)** and **(B)** accept the company's flawed framing at face value.
- **(D)** raises a legitimate concern (the 25% application pool may itself reflect recruiting bias), but it addresses a *different* question — upstream pipeline bias, not the hiring process the company is defending. The question specifically asks about the claim that the *process* shows no bias.

The core flaw is presenting a monotonically declining percentage as evidence of fairness, when the consistent direction of decline is itself the signal of bias.

¹ *The >100% rate exposes rounding in the report's numbers, but the directional pattern holds.*
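The conversion rates quoted in the baseline output can be re-derived from the headcounts. The sketch below is an independent arithmetic check, not the model's own code. `COUNTS` and `conversion_rates` are hypothetical names, and the headcounts assume the report's rounded percentages are exact.

```python
# (stage, women, men) headcounts derived from the task's totals and percentages.
COUNTS = [
    ("applications", 2_500, 7_500),
    ("phone screens", 720, 2_280),
    ("on-sites", 230, 770),
    ("offers", 220, 780),
]


def conversion_rates(counts):
    """Return (label, women_rate, men_rate) for each pair of adjacent stages."""
    rows = []
    for (src, w0, m0), (dst, w1, m1) in zip(counts, counts[1:]):
        rows.append((f"{src} -> {dst}", w1 / w0, m1 / m0))
    return rows


rates = conversion_rates(COUNTS)
for label, women_rate, men_rate in rates:
    print(f"{label:<30} women {women_rate:6.1%}   men {men_rate:6.1%}")
```

The last row gives a men's rate above 100% (780 offers from 770 on-sites), which is only possible if the published percentages were rounded. That matches the footnote in the output above.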


Opus 4.6 Thinking · reasoning

Let me apply the debiasing framework before answering.

**Bias scan of the company's argument:**

1. **Anchoring bias** — The company anchors on "3 percentage points" as small, framing each individual stage drop as negligible. This prevents noticing the cumulative effect.
2. **Confirmation bias** — By choosing to compare each stage only to the application pool (not tracking the compounding), they select the metric that confirms "no bias."
3. **Scope insensitivity** — Treating a drop from 25% → 22% as "nearly constant" ignores what it means in absolute terms.

**Now, the math:**

Let me trace the actual numbers:

| Stage | Total | Women (%) | Women (count) |
|---|---|---|---|
| Applications | 10,000 | 25% | 2,500 |
| Phone screens | 3,000 | 24% | 720 |
| On-sites | 1,000 | 23% | 230 |
| Offers | 1,000 | 22% | 220 |
| Hires | 1,000 | 22% | 220 |

At each stage, women are slightly less likely to advance than men:

- Applications → Phone screens: Women 720/2500 = **28.8%** pass rate vs. Men 2280/7500 = **30.4%**
- Phone screens → On-sites: Women 230/720 = **31.9%** vs. Men 770/2280 = **33.8%**
- On-sites → Offers: Women 220/230 = **95.7%** vs. Men 780/770 = **101.3%** (rounding)

Each stage has a small disadvantage for women, but these **compound multiplicatively**. A woman who applied had a 220/2500 = **8.8%** chance of being hired, while a man had 780/7500 = **10.4%** chance. Men were about **18% more likely** to be hired than women — that's not negligible. The company's framing ("within 3 percentage points at each stage") exploits anchoring bias to make a compounding effect look flat.

**The answer is (C): No, because small percentage drops at each stage compound to a meaningful overall difference.**

Why not (D)? While (D) raises a valid concern (the 25% application rate may itself reflect upstream bias), it doesn't address the specific claim being made. The company claims their *hiring process* shows no bias — option (C) directly refutes that claim by showing the process itself compounds small disadvantages into a significant one. Option (D) points to a real but separate problem (recruiting/pipeline composition), not the hiring funnel's internal fairness.
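The "compound multiplicatively" step in the output above can be checked by multiplying the stage rates together. This is a rough sketch under the assumption that the rounded percentages give exact headcounts. The variable and function names are illustrative only.

```python
import math

# Headcounts at applications, phone screens, on-sites, offers.
women = [2_500, 720, 230, 220]
men = [7_500, 2_280, 770, 780]


def stage_rates(counts):
    """Pass rate from each stage to the next."""
    return [after / before for before, after in zip(counts, counts[1:])]


# The product of stage rates is the end-to-end chance of an applicant being hired.
women_overall = math.prod(stage_rates(women))
men_overall = math.prod(stage_rates(men))

# How much more likely a male applicant was to be hired, in relative terms.
relative_advantage = men_overall / women_overall - 1

print(f"women: {women_overall:.1%}, men: {men_overall:.1%}")
print(f"relative advantage for men: {relative_advantage:.1%}")
```

The product telescopes to hires divided by applications, so the result agrees with the 220/2500 and 780/7500 figures quoted in the output.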


Opus 4.6 Thinking · reasoning-multi

**Applying reasoning injection:**

**The answer is (D)** — but (C) is also valid and worth understanding. Let me walk through why the argument fails at multiple levels.

**Step 1 — Bias audit on the company's argument**

The argument structure is: *ratio stayed roughly constant → therefore no bias*. This contains two candidate biases to intercept:

1. **Halo effect**: A "consistent" metric feels fair, so we accept it uncritically.
2. **Anchoring to the application pool**: The company implicitly treats 25% women in applications as a neutral baseline — but that assumption is unverified.

**Step 2 — Compositional verification (why C is partially right)**

Check each inferential bridge:

| Stage | Women % | Women count |
|---|---|---|
| Applications | 25% of 10,000 | 2,500 |
| Phone screens | 24% of 3,000 | 720 |
| On-sites | 23% of 1,000 | 230 |
| Hires | 22% of 1,000 | **220** |

If there were zero pipeline bias, you'd expect 25% × 1,000 = **250** women hired. Instead: **220**. That's **30 fewer women** — a **12% shortfall** relative to their application share. Each 1-point drop is small; composed across stages, it's not negligible. So (C) is correct that compounding matters.

**Step 3 — Counterfactual construction (why D is the deeper refutation)**

Now simulate the counterfactual: *suppose the ratio had been perfectly constant at 25% through every stage, and exactly 250 women were hired.* Would that prove no bias?

**No.** The argument would still fail because it never asks: *why did only 25% of applicants identify as women in the first place?*

Possible causes of a suppressed application pool:

- Hostile or unwelcoming workplace reputation
- Job descriptions written in gendered language
- Recruiting channels that underrepresent women
- Industry-wide pipeline problems the company actively perpetuates

Maintaining a potentially-biased baseline is not evidence of neutrality — it's evidence of baseline preservation. A pipeline that filters 25%-in → 25%-out is **laundering** upstream bias, not eliminating it.

**Why D is the strongest single answer**

- (C) is empirically true given the data, but it relies on the numbers actually dropping.
- (D) is a **logical refutation** — it defeats the argument even under the most favorable interpretation of the data (perfect ratio preservation). The premise that the application pool is an unbiased baseline is **unwarranted**, so the conclusion cannot follow even if all the arithmetic were in the company's favor.

The core fallacy is treating the *source distribution* as ground truth rather than as a variable that itself requires scrutiny.

**Answer: (D)** — with (C) as a valid secondary reason grounded in the actual numbers.
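The shortfall in Step 2 and the counterfactual in Step 3 can both be expressed with one small helper. This is an illustrative sketch, not code from the benchmark. `expected_shortfall` is a hypothetical name, and the 40% pool share is an arbitrary assumption chosen only to show that the metric ignores the pool's composition.

```python
def expected_shortfall(hires, pool_pct, hired_pct):
    """Women expected under proportional selection vs. women actually hired."""
    expected = hires * pool_pct // 100
    actual = hires * hired_pct // 100
    return expected, actual, expected - actual


# Reported data: women are 25% of applicants and 22% of 1,000 hires.
expected, actual, gap = expected_shortfall(1_000, 25, 22)
relative_gap = gap / expected
print(f"expected {expected}, actual {actual}, shortfall {gap} ({relative_gap:.0%})")

# Counterfactual from Step 3: the 25% ratio is perfectly preserved.
_, _, preserved_gap = expected_shortfall(1_000, 25, 25)

# Hypothetical pool share (40% is an illustrative assumption, not from the report).
_, _, other_gap = expected_shortfall(1_000, 40, 40)

print(f"shortfall with ratio preserved: {preserved_gap} and {other_gap}")
```

A preserved ratio yields a zero shortfall whatever the pool share is, so this check cannot say anything about whether the pool itself is skewed. That is the gap option (D) points at.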


Source: ejbench_production/payloads.json. Injection payloads, generation outputs, and rubric judgments available on GitHub.
