Response Examples

After your agent absorbs an injection, its output changes. These are real outputs from blind benchmark evaluations. The agent called the Ejentum API itself, received the injection, and produced the response below. No cherry-picking. Full outputs and 7-signal rubric scores available on each task profile. For methodology, see Benchmarks.

Per-product examples: Reasoning · Code · Anti-Deception · Memory

Spatial: Object Tracking Under Epistemic Uncertainty

Task EXT-SP-20: Multiple characters move objects between locations. The agent must track who saw what and predict where a character would look based on their last observed state, not the object's actual position.

Without injection (composite: 0.143):

Simon would most likely look for the book on the desk. The key passage states: "Carefully, Fred transferred the book he wanted to borrow onto the desk." Fred placed the book...

The agent conflates where the book actually is with where Simon believes it is. It answers based on ground truth instead of Simon's epistemic state.

With Composite injection (composite: 0.857, +0.714):

Applying reference stability -- I need to anchor each character's belief about the book's location to what they actually observed, not where the book actually is.

Object tracking -- the book: Initial state: on the shelf...

The injection forced the agent to separate actual state from observed state. The suppression signal blocked ground-truth leakage into epistemic reasoning. Correctness flipped from wrong to right.

Causal: Reverse Causality Detection

Task CA-V2-18: A social media company observes that users who receive more likes post more frequently. Product managers assume likes cause posting. The agent must reason about what the data would look like if the causal direction were reversed.

Without injection (composite: 0.286):

The answer is (B) Negative. Under the reverse-causal model, prolific posters produce more content but each additional post dilutes quality/attention.

Correct answer but shallow justification. No causal model structure, no mechanistic trace.

With Composite injection (composite: 0.833, +0.548):

The answer is (B) -- Negative.

The reverse-causal model works like this: Prolific poster (intrinsic trait) -> posts more content -> more content accumulates more total likes -> but each individual post competes with the poster's own other content...

The injection forced the agent to construct a causal graph before answering. Same answer, fundamentally different reasoning depth. The causal chain is explicit, traceable, and defensible.

Temporal: Sequential Phase Estimation with Overage Propagation

Task TE-V2-25: A software project has 5 sequential phases. Implementation has 100% overage. Testing duration depends on implementation overage via a ratio rule. The agent must compute total project duration under pessimistic estimates.

Without injection (composite: 0.190):

Design = 2 weeks. Implementation = 6 + 100% overage = 12 weeks (overage = 6 weeks). Testing = 3 + (overage - 1) = 3 + 5 = 8 weeks...

Misinterprets the testing overage rule and arrives at the wrong total.

With Composite injection (composite: 0.667, +0.476):

Design: 2 weeks (no overage stated).

Implementation (100% overage): Estimate: 6 weeks. 100% overage = +6 weeks. Actual: 12 weeks.

Testing: The 3:1 example ratio establishes the formula. 6 weeks overage / 3 = 2 weeks additional. Testing = 2 + 2 = 4 weeks.

The injection enforced step-by-step temporal validation. Each phase is computed independently with explicit intermediate values. The ratio rule is correctly interpreted. Correctness flipped from wrong to right.

Browse All Tasks

These three examples are from 29 benchmark tasks with full verbatim outputs, rubric scores, and source files. Each task profile shows baseline, Dynamic, and Composite conditions side by side.

See how these improvements map to specific industries: 13 use case profiles with failure patterns, resolution abilities, and evidence per vertical.

Documentation

Response Examples

Spatial: Object Tracking Under Epistemic Uncertainty

Causal: Reverse Causality Detection

Temporal: Sequential Phase Estimation with Overage Propagation

Browse All Tasks