From 6 Domains to 12: Where Reasoning Breaks Next

The current six reasoning domains cover how agents fail when processing causes, sequences, structures, hypotheticals, categories, and their own outputs. They don't cover how agents fail when modeling other minds, reasoning under physical constraints, maintaining coherence across long discourse, detecting unstated intent, transferring knowledge across domains, or forecasting the future.

These aren't edge cases. They're entire failure classes that production agents encounter daily.


What the Current 6 Cover

| Domain | The question it answers | The failure it blocks |
| --- | --- | --- |
| Causal | Why did X happen? | Treating correlation as causation |
| Temporal | When? In what order? | Confabulating timelines, optimism bias |
| Spatial | How do parts connect? | Boundary violations, broken topology |
| Simulation | What if we change X? | Ignoring second-order consequences |
| Abstraction | What do these have in common? | Category errors, metaphor as mechanism |
| Metacognition | Is my reasoning consistent? | Hallucination spirals, reasoning drift |

The current library spans 311 abilities across these six domains. They handle analytical and technical reasoning well: our benchmarks show a +20.8pp composite lift on external tasks and +10.1pp on complex professional tasks.

But agents don't only do analytical reasoning.


Where They Fail Next

1. Analogical Reasoning

Your agent fails when: it needs to transfer knowledge from one domain to another. It matches by surface similarity ("both are networks, both involve flow") instead of mapping the relational structure that actually transfers. The analogy sounds plausible but carries no inferential weight.

What it blocks: Surface analogy traps. Analogy over-extension. Accepting shallow correspondences without verifying deeper structural alignment.

Example: An agent asked to apply lessons from the Apollo program to a software migration. Without analogical reasoning, it matches surface features (both are "complex projects"). With it, the agent maps relational structure (phased testing, rollback protocols, staged independence from legacy systems) and identifies where the mapping breaks (Apollo had no rollback; software does).
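The distinction can be sketched in code: represent each domain as a set of relations and match by predicate rather than by surface vocabulary. This is a minimal illustration, not the API's internal representation, and all names are hypothetical.

```python
# Hypothetical sketch: structure mapping, not surface matching.
# Relations are (predicate, source, target) triples; all names are illustrative.

apollo = {
    ("phased", "testing", "launch"),
    ("staged", "independence", "legacy_systems"),
    ("irreversible", "commit", "mission"),   # Apollo had no rollback
}

migration = {
    ("phased", "testing", "cutover"),
    ("staged", "independence", "legacy_systems"),
    ("reversible", "commit", "deployment"),  # software can roll back
}

def map_structure(src, dst):
    """Match relations by predicate; report which ones transfer and where the analogy breaks."""
    src_preds = {r[0] for r in src}
    dst_preds = {r[0] for r in dst}
    transfers = src_preds & dst_preds   # relations present on both sides
    breaks = src_preds ^ dst_preds      # relations with no counterpart
    return transfers, breaks

transfers, breaks = map_structure(apollo, migration)
```

A surface matcher would accept the analogy wholesale; the structural version surfaces exactly the unmatched relations ("irreversible" vs. "reversible") where the transferred lesson stops applying.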

2. Theory of Mind

Your agent fails when: it needs to model what other people believe, know, or want, especially when those states differ from ground truth. Asked "what does the client think about the delay?", the agent answers with what it knows, not what the client's information environment would support.

What it blocks: Egocentric projection. Treating a team as one mind with one belief state. Confusing "they're frustrated" with "they believe the system is broken."

Example: An agent analyzing stakeholder responses to a product change. Without theory of mind, it models all stakeholders as having the same information. With it, the agent tracks separate belief states (the engineering team knows about the technical debt, the sales team doesn't, the VP has heard a filtered version) and predicts each stakeholder's response from their own evidence, not the agent's omniscient view.
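The core move is representational: keep one belief map per stakeholder and condition predictions only on that map. A minimal sketch under assumed, illustrative data:

```python
# Hypothetical sketch: per-stakeholder belief states instead of
# projecting the agent's full knowledge onto everyone. Data is illustrative.

ground_truth = {"tech_debt": True, "delay_weeks": 6}

beliefs = {
    "engineering": {"tech_debt": True, "delay_weeks": 6},   # full picture
    "sales":       {"delay_weeks": 2},                      # stale estimate
    "vp":          {"tech_debt": False, "delay_weeks": 4},  # filtered version
}

def predict_response(stakeholder):
    """Predict from the stakeholder's own evidence, not from ground truth."""
    view = beliefs[stakeholder]
    if view.get("delay_weeks", 0) >= ground_truth["delay_weeks"]:
        return "prepared"     # their model already accounts for the delay
    return "surprised"        # ground truth exceeds what they believe

responses = {s: predict_response(s) for s in beliefs}
```

The egocentric version would call `predict_response` on `ground_truth` for everyone and predict "prepared" across the board; conditioning on each belief map yields different responses per stakeholder.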

3. Physical Reasoning

Your agent fails when: it needs to reason under physical constraints: gravity, geography, propagation dynamics. It generates plans that violate physics, or models spread uniformly when geography, corridors, and barriers determine the actual pattern.

What it blocks: Physics violations. Spatial dynamics blindness. Assuming things spread equally in all directions when corridors and barriers shape the actual flow.

Example: An agent modeling supply chain disruption after a port closure. Without physical reasoning, it treats the disruption as uniform across all routes. With it, the agent maps geographic corridors (which alternative ports have rail connections, which routes cross mountain ranges, where container ships can physically dock) and identifies chokepoints where intervention has maximum effect.
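In code terms, the fix is to encode the physical constraints explicitly so infeasible reroutes are pruned before any cost or schedule optimization runs. A toy sketch with invented port data:

```python
# Hypothetical sketch: prune physically infeasible reroutes first.
# Port names, draft depths, and the ship draft are all invented.

ports = {
    # name: (has_rail_link, max_draft_m)
    "port_a": (True, 16.0),
    "port_b": (False, 15.0),  # no rail link: containers strand at the dock
    "port_c": (True, 11.0),   # too shallow for large container ships
}

SHIP_DRAFT_M = 14.5

def feasible_reroutes(ports, ship_draft):
    """Keep only ports the ship can physically dock at AND move cargo onward from."""
    return [
        name for name, (rail, draft) in ports.items()
        if rail and draft >= ship_draft
    ]

options = feasible_reroutes(ports, SHIP_DRAFT_M)
```

A uniform-disruption model would weigh all three ports equally; the constraint filter shows that only one is physically usable, which is exactly the chokepoint worth acting on.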

4. Narrative Reasoning

Your agent fails when: it needs to maintain coherence across long output. Each paragraph reads well in isolation, but the overall document contradicts itself, drops threads, or drifts from its thesis. The agent optimizes for local fluency, not global structure.

What it blocks: Thread dropping. Theme drift. Structural amnesia: losing track of commitments made earlier in the text.

Example: An agent generating a 20-page analysis. Without narrative reasoning, section 14 subtly contradicts the thesis established in section 2, and an argument introduced in section 5 is never resolved. With it, the agent tracks its own discourse structure: which threads are open, which claims need evidence, whether the conclusion follows from the established argument.
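One way to picture this is a discourse ledger: a running record of open threads and claims that can be audited when generation ends. This is an illustrative sketch, not the API's mechanism; the class and thread names are invented.

```python
# Hypothetical sketch: a discourse ledger for long-form generation.

class DiscourseLedger:
    def __init__(self):
        self.open_threads = set()
        self.claims = {}  # claim text -> section where it was committed to

    def open_thread(self, name):
        self.open_threads.add(name)

    def resolve_thread(self, name):
        self.open_threads.discard(name)

    def claim(self, statement, section):
        self.claims[statement] = section

    def unresolved(self):
        """Threads still open when the document ends: structural amnesia, made visible."""
        return sorted(self.open_threads)

ledger = DiscourseLedger()
ledger.claim("migration reduces cost", section=2)       # thesis, section 2
ledger.open_thread("vendor lock-in argument")           # introduced in section 5
ledger.open_thread("rollback plan")
ledger.resolve_thread("rollback plan")                  # addressed later
```

At the end of generation, `ledger.unresolved()` surfaces the section-5 argument that was never closed, which is precisely the failure that local, paragraph-by-paragraph fluency can't catch.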

5. Latent Intent

Your agent fails when: it does exactly what was asked, missing the actual need. "Make me a dashboard" gets a dashboard, but the latent intent was "my VP wants weekly metrics and I spend 4 hours compiling them manually," which changes every design decision.

What it blocks: Literal compliance: executing the stated request without questioning whether it serves the actual goal. Intent projection: assuming the requester wants what the agent would want.

Example: An agent asked to "refactor the authentication module." Without latent intent detection, it restructures the code. With it, the agent probes: why now? Is the goal cleaner code, faster onboarding for new developers, or preparing for a security audit? Each answer changes which refactoring choices matter.
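The probing step can be made explicit as a structure: enumerate candidate latent goals and show that the same stated request yields a different plan under each. All goal and priority strings here are invented for illustration.

```python
# Hypothetical sketch: one stated request, several candidate latent goals,
# each implying a different plan. All strings are illustrative.

stated_request = "refactor the authentication module"

candidate_goals = {
    "cleaner code":        {"priority": "reduce duplication"},
    "faster onboarding":   {"priority": "document and simplify entry points"},
    "security audit prep": {"priority": "isolate credential handling"},
}

def plan_for(goal):
    """The literal request is constant; the plan depends on the latent goal."""
    return candidate_goals[goal]["priority"]

plans = {goal: plan_for(goal) for goal in candidate_goals}
```

Literal compliance picks one plan arbitrarily; making the goal space explicit shows there is a decision to be made before any refactoring starts.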

6. Predictive Reasoning

Your agent fails when: it needs to forecast future states with calibrated confidence. It produces the most narratively satisfying prediction instead of the most probable one. Base rates are ignored because they're boring. Confidence intervals are omitted. A single timeline is presented as inevitable.

What it blocks: Prediction theater (narrative replacing evidence). Single-timeline bias. False precision. Base rate neglect.

Example: An agent asked to forecast customer churn after a pricing change. Without predictive reasoning, it generates one confident scenario ("churn will increase 15% in Q2"). With it, the agent branches: scenario A (elastic demand, +20% churn, 40% probability), scenario B (inelastic with delayed effect, +8% in Q3, 35% probability), scenario C (competitor absorption, +30% concentrated in enterprise tier, 25% probability). It anchors on base rates from similar pricing changes and flags where evidence is insufficient for any prediction.
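The three scenarios above combine into a probability-weighted estimate rather than a single confident timeline. A small sketch using the numbers from the example (the base rate is an assumed figure for illustration):

```python
# Sketch: the churn scenarios from the example above, combined into a
# probability-weighted forecast. The base rate figure is assumed.

scenarios = [
    # (label, probability, churn_increase_pct)
    ("elastic demand",        0.40, 20.0),
    ("inelastic, delayed",    0.35,  8.0),
    ("competitor absorption", 0.25, 30.0),
]

BASE_RATE_PCT = 10.0  # assumed historical churn lift for similar pricing changes

# Sanity check: the scenario probabilities must sum to 1.
assert abs(sum(p for _, p, _ in scenarios) - 1.0) < 1e-9

# Expected churn increase: 0.40*20 + 0.35*8 + 0.25*30 = 18.3
expected = sum(p * churn for _, p, churn in scenarios)

# Flag forecasts that diverge sharply from the base rate for review,
# instead of presenting the weighted number with false precision.
needs_review = abs(expected - BASE_RATE_PCT) > 5.0
```

The single-timeline version reports "+15% in Q2" and stops; the branched version carries its uncertainty forward and flags that the weighted estimate sits well above the historical base rate.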


What Stays the Same

The expansion changes what the API can reason about, not how you use it.

  • Same endpoint. POST /logicv1/ does not change.
  • Same modes. Ki (single-ability) and Haki (multi-ability) work identically.
  • Same injection protocol. The scaffold format, the delimiters, the suppress/amplify structure: all unchanged.
  • Automatic routing. New domains are retrieved the same way as existing ones. Your query determines which domain activates. No configuration, no domain selection.
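As an illustration of the unchanged integration, here is a minimal client sketch. Only the /logicv1/ path and the Ki/Haki mode names come from this post; the payload fields, auth header, base URL, and response shape are assumptions, not documented API.

```python
# Hypothetical client sketch. Endpoint path and mode names are from the post;
# everything else (payload fields, auth scheme, response shape) is assumed.

import json
import urllib.request

def build_payload(query: str, mode: str = "Ki") -> dict:
    """Routing is automatic, so only the query and the mode are sent."""
    if mode not in ("Ki", "Haki"):
        raise ValueError("mode must be 'Ki' or 'Haki'")
    return {"query": query, "mode": mode}

def query_logic_api(base_url: str, api_key: str, query: str, mode: str = "Ki"):
    """POST the query to /logicv1/ and return the parsed JSON response."""
    data = json.dumps(build_payload(query, mode)).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/logicv1/",
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is assumed
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note what is absent: no domain parameter. The query alone determines which domain's scaffold comes back, which is why new domains require no client changes.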

Existing users won't change a line of code. The API gets smarter; the integration stays the same.


Why These Six

Each new domain was derived from observed production failures: cases where agents broke in ways the current six couldn't address. They're not aspirational categories. They're failure classes we've documented, analyzed, and can now target with the same structured suppression approach that works on the original six.

The litmus test is unchanged. Every new ability must pass the same triple gate:

  1. Must be a cognitive operation, not domain knowledge
  2. Must be LLM-executable, requiring no external tools
  3. Must work across subjects, not tied to one industry or use case

No Timelines

We share direction, not deadlines. Domains ship when the abilities pass the litmus test, survive internal benchmarking, and demonstrate measurable improvement on production-representative tasks. The original 311 abilities took months of iteration, purification, and blind evaluation. The next set will meet the same standard.

What we can say: the research is active, the domain definitions are stable, and the failure modes are real. When a domain is ready, it appears in the API automatically. Your agent starts getting scaffolds it couldn't get before. No migration. No upgrade path. Just better reasoning.


For the current six dimensions and how they work: Concepts. For the theoretical framework: The Method. For benchmark evidence on the current abilities: Benchmarks.

Every insight above is implemented as a reasoning primitive in the Logic API.