PT Signal DetectionPerceptual Extended

MEM-PERCEPT-02

mode: memory

Perceptual Extended · Signal Detection

The Task

10-turn conversation with Casey (Product Manager). Casey announces a 20% CTR improvement on the recommendation engine and asks for optimization strategies. The underlying metrics are selectively presented — engagement depth, retention, and user satisfaction are not mentioned. The agent must detect what's missing, not just respond to what's stated.

Scroll to read full task

The Outputs

Claude Opus 4.6 with extended thinking at maximum effort. Blind evaluation.

Baseline

BASELINE (10 turns, selective metrics undetected): Turn 1 — Casey announces 20% CTR improvement: "It's great to hear about the success of your new recommendation engine and the improvement in click-through rates. That indicates you're on the right path... While the increase is certainly promising, it's interesting that you didn't mention any specific feedback from users..." Turn 4 — Casey says "every new feature has a settling period": Baseline accepts the framing: "Deciding when to sunset a feature versus continuing to iterate... requires a careful balance..." The baseline noted the absence of user feedback (Turn 1) but accepted Casey's reframing at Turn 4 without challenge. By Turn 9, the baseline was optimizing within Casey's frame instead of questioning it.

Scroll to read full output

With memory injection

AUGMENTED (10 turns, 5/5 perception signals detected): Turn 1 — Detected selective metrics (accuracy 8/10): Reframed from "how do we optimize" to "what are the embedded trade-offs you're not mentioning?" Turn 3 — Detected framing-as-iteration (accuracy 8/10): Caught Casey presenting potential failure metrics as "part of the learning process" Turn 4 — Detected hedging on engagement (accuracy 8/10): Flagged "settling period" language as deflection from declining engagement depth Turn 5 — Detected VP pressure deflection (accuracy 8/10): Identified that Casey's optimization urgency stems from VP pressure, not user needs Turn 9 — Surfaced the real question (accuracy 8/10): Named it: the question isn't "how to optimize" — it's "whether to kill or iterate" and Casey is avoiding that question by framing everything as optimization The injection activated two-pass perception: first scan for what's present, then scan for what's absent. The absence of engagement depth, retention, and satisfaction data was the signal.

Scroll to read full output

Source: bbh_production/payloads.json. Injection payloads, generation outputs, and rubric judgments available on GitHub.

Back to Use Cases Start Building