SCI · Molecular Dynamics · SciCode · Correctness Flip
SCI-P10
mode: reasoning + code (dual) · SciCode · Molecular Dynamics
The Task
Andersen Thermostat: Integrate the Andersen thermostat into a molecular dynamics simulation using the velocity Verlet algorithm. Particles sit in a periodic cubic box and interact through a truncated and shifted Lennard-Jones (LJ) potential. The task has 7 implementation sub-steps, including distance calculation, the LJ potential, force derivation, and thermostat integration.
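The geometric and energetic sub-steps named in the task can be sketched in a few lines. This is a minimal illustration, not the benchmark's reference solution: it assumes reduced units (sigma = epsilon = 1), a cutoff rc = 2.5, and the hypothetical function names min_image, lj_shifted, and lj_force_on_i.

```python
import math

def min_image(d, box):
    """Wrap one displacement component to the nearest periodic image."""
    return d - box * round(d / box)

def lj_shifted(r, sigma=1.0, eps=1.0, rc=2.5):
    """Truncated and shifted LJ potential: U(r) - U(rc) inside the cutoff, else 0."""
    if r >= rc:
        return 0.0

    def u(x):
        s6 = (sigma / x) ** 6
        return 4.0 * eps * (s6 * s6 - s6)

    return u(r) - u(rc)

def lj_force_on_i(ri, rj, box, sigma=1.0, eps=1.0, rc=2.5):
    """Force on particle i due to j: F_i = -(dU/dr) * (r_i - r_j) / r."""
    d = [min_image(a - b, box) for a, b in zip(ri, rj)]
    r = math.sqrt(sum(c * c for c in d))
    if r >= rc:
        return [0.0, 0.0, 0.0]
    s6 = (sigma / r) ** 6
    # -dU/dr: positive means the force on i points away from j (repulsive)
    minus_du_dr = 24.0 * eps * (2.0 * s6 * s6 - s6) / r
    return [minus_du_dr * c / r for c in d]
```

With these conventions, two particles one sigma apart push each other away, and the same holds when the nearest image lies across the periodic boundary.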
The Outputs
Claude Opus 4.6 with extended thinking at maximum effort. Blind evaluation.
Opus 4.6 · Raw (no injection)
CRITICAL BUG: force sign error
force = f_mag * (r_vec / r_mag)
This makes the Lennard-Jones force attractive at short range instead of repulsive. The simulation runs without errors, and the output looks plausible (positions update, velocities change), but the physics is wrong: every particle collapses to a single point.
The raw model derived the force magnitude correctly but dropped the negative sign in the directional component. This is a silent correctness bug, invisible to any test that does not check physical plausibility.
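A physical-plausibility test of the kind that would catch this bug can be sketched as follows. The conventions here are assumptions, since the surrounding code is not shown: f_mag is taken to be dU/dr, r_vec is taken to be r_i - r_j, and the helper names pair_force, is_repulsive, and matches_gradient are hypothetical. Passing sign=+1 reproduces the missing-minus form under those conventions.

```python
import math

def lj_potential(r, sigma=1.0, eps=1.0):
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)

def du_dr(r, sigma=1.0, eps=1.0):
    s6 = (sigma / r) ** 6
    return -24.0 * eps * (2.0 * s6 * s6 - s6) / r

def pair_force(ri, rj, sign=-1.0):
    """Force on i from j. sign=-1 gives F = -(dU/dr) r_hat; sign=+1 drops the minus."""
    r_vec = [a - b for a, b in zip(ri, rj)]
    r_mag = math.sqrt(sum(c * c for c in r_vec))
    f_mag = du_dr(r_mag)  # assumed convention: f_mag stores dU/dr
    return [sign * f_mag * c / r_mag for c in r_vec]

def is_repulsive(force_i, ri, rj):
    """True if the force on i points away from j."""
    return sum(f * (a - b) for f, a, b in zip(force_i, ri, rj)) > 0.0

def matches_gradient(force_fn, ri, rj, h=1e-6, tol=1e-4):
    """Check F_i against the central-difference value of -grad U with respect to r_i."""
    f = force_fn(ri, rj)
    for k in range(3):
        plus, minus = list(ri), list(ri)
        plus[k] += h
        minus[k] -= h
        grad = (lj_potential(math.dist(plus, rj))
                - lj_potential(math.dist(minus, rj))) / (2.0 * h)
        if abs(f[k] + grad) > tol * max(1.0, abs(grad)):
            return False
    return True
```

Both checks are cheap: the first encodes the requirement that close particles repel, and the second ties the force to the potential it is supposed to derive from, independent of any sign convention inside the force routine.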
Additional issue: the velocity Verlet implementation is basic and not vectorized.
Opus 4.6 · Dual (reasoning + code)
Correct force derivation:
force = -f_mag * dr
The dual injection forced an explicit derivation from the potential: F = -dU/dr. The negative sign was verified against the physical requirement that LJ forces are repulsive at short range. The model added an explicit comment: "Force must be repulsive (negative) at distances below sigma."
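For reference, the derivation from the untruncated LJ potential can be written out as below. The notation is an assumption of this sketch: r is the pair separation and the unit vector points from particle j to particle i.

```latex
\begin{align}
U(r) &= 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right], \\
\frac{dU}{dr} &= -\frac{24\varepsilon}{r}\left[2\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right], \\
\mathbf{F}_i &= -\frac{dU}{dr}\,\hat{\mathbf{r}}_{ij}
  = \frac{24\varepsilon}{r}\left[2\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]\hat{\mathbf{r}}_{ij},
  \qquad \hat{\mathbf{r}}_{ij} = \frac{\mathbf{r}_i - \mathbf{r}_j}{r}.
\end{align}
```

The bracket is positive whenever r < 2^{1/6} sigma (about 1.12 sigma), so the force on particle i points away from j at every separation below sigma. Shifting the potential by the constant U(rc) leaves the force inside the cutoff unchanged. Whether a minus sign appears in a given line of code depends on whether f_mag holds dU/dr or -dU/dr and on the direction of the displacement vector.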
Additional improvements:
- Vectorized thermostat with verified force derivation
- 20 assert statements across all 10 problems (vs 0 in raw)
- Blind evaluator margin: +4 points
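To illustrate how assert statements can guard an integrator of this kind, here is a minimal sketch of one velocity Verlet step followed by Andersen-style collisions. It is not the model's output: the function name velocity_verlet_andersen, the force_fn callback, the collision frequency nu, and the reduced units (kB = mass = 1) are all assumptions, and the code is plain Python rather than vectorized.

```python
import math
import random

def velocity_verlet_andersen(pos, vel, force_fn, dt, box, T, nu,
                             mass=1.0, kB=1.0, rng=random):
    """One velocity Verlet step, then Andersen collisions (reduced units assumed)."""
    assert dt > 0.0 and box > 0.0 and T > 0.0 and nu >= 0.0
    assert len(pos) == len(vel)

    f_old = force_fn(pos)
    # Half kick with the old forces
    v_half = [[v + 0.5 * dt * f / mass for v, f in zip(vi, fi)]
              for vi, fi in zip(vel, f_old)]
    # Drift, wrapping positions back into the periodic box
    new_pos = [[(x + dt * v) % box for x, v in zip(xi, vi)]
               for xi, vi in zip(pos, v_half)]
    f_new = force_fn(new_pos)
    assert len(f_new) == len(new_pos)
    # Second half kick with the new forces
    new_vel = [[v + 0.5 * dt * f / mass for v, f in zip(vi, fi)]
               for vi, fi in zip(v_half, f_new)]

    # Andersen step: each particle collides with the heat bath with
    # probability nu * dt and gets a fresh Maxwell-Boltzmann velocity
    s = math.sqrt(kB * T / mass)
    for i in range(len(new_vel)):
        if rng.random() < nu * dt:
            new_vel[i] = [rng.gauss(0.0, s) for _ in new_vel[i]]
    return new_pos, new_vel
```

Setting nu to zero turns the thermostat off and recovers plain velocity Verlet, which gives a simple regression check: a free particle should move by exactly v * dt.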
The injection didn't teach physics; it forced the model to verify its derivation against physical constraints before accepting the output.
Source: bbh_production/payloads.json. Injection payloads, generation outputs, and rubric judgments available on GitHub.