Example regressions.
Caught before production.

credit_analyst.py
Problem: After a prompt edit, the regulatory compliance check was silently dropped from the execution path.
Caught: Spooled detected the missing tool (regulatory_check) when tool_jaccard similarity dropped to 0.88, and flagged the run as a VARIANT before merge.
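
As a rough illustration of the tool-overlap check in the credit_analyst.py case, the sketch below computes Jaccard similarity between the tool sets of a baseline run and a candidate run. How Spooled actually computes tool_jaccard is not described here, so the function is an assumption, and every tool name except regulatory_check is hypothetical. With eight baseline tools and one dropped, the score is 7/8 = 0.875, which is in line with the reported 0.88.

```python
def tool_jaccard(baseline_tools, candidate_tools):
    """Jaccard similarity of two tool sets: |A & B| / |A | B|."""
    baseline, candidate = set(baseline_tools), set(candidate_tools)
    union = baseline | candidate
    if not union:
        return 1.0  # two empty runs are treated as identical
    return len(baseline & candidate) / len(union)


# Hypothetical baseline: eight tools, including the compliance check.
baseline = [
    "fetch_financials", "compute_ratios", "pull_credit_report",
    "assess_collateral", "score_risk", "regulatory_check",
    "draft_memo", "notify_reviewer",
]
# Candidate run after the prompt edit: regulatory_check is no longer called.
candidate = [t for t in baseline if t != "regulatory_check"]

score = tool_jaccard(baseline, candidate)
missing = sorted(set(baseline) - set(candidate))
print(score, missing)
```

The set difference is what names the dropped tool; the similarity score alone only says that something changed.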
new_side_effects

code_reviewer.py
Problem: A model downgrade from gpt-4o to gpt-4o-mini caused the security scan step to be skipped entirely.
Caught: The run_security_scan tool was missing from the execution fingerprint. A policy rule with on_variant under fail_if blocked the PR.
new_side_effects

compliant_rag_pipeline.py
Problem: Compliance guardrails (PII check + hallucination detection) were dropped during a refactor.
Caught: Policy enforced required_tools: [check_pii, detect_hallucination]. Missing tools triggered an automatic merge block.
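
The required_tools rule in the compliant_rag_pipeline.py case amounts to a membership check, sketched below under stated assumptions. The function name, the observed tool names, and the merge_blocked flag are hypothetical; only required_tools, check_pii, and detect_hallucination come from the example above, and this is not Spooled's policy engine.

```python
def missing_required_tools(required_tools, observed_tools):
    """Return the required tools that never appear in the observed run."""
    observed = set(observed_tools)
    return [tool for tool in required_tools if tool not in observed]


required = ["check_pii", "detect_hallucination"]
# Hypothetical run after the refactor: both guardrails are gone.
observed = ["retrieve_documents", "generate_answer"]

missing = missing_required_tools(required, observed)
merge_blocked = bool(missing)  # any missing required tool blocks the merge
print(missing, merge_blocked)
```

Unlike a similarity threshold, a rule of this kind fails on the absence of specific tools regardless of how similar the rest of the run looks.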
required_tools (policy)

data_pipeline_monitor.py
Problem: A cross-reference tool started being called twice, subtly changing the execution shape over two weeks.
Caught: Sequence similarity (LCS) drifted from 1.00 to 0.91 over 10 runs. Spooled's sparkline trend caught the gradual regression.
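
To show how one duplicated call can move an LCS-based score, here is a minimal sketch of sequence similarity as the longest common subsequence length divided by the longer sequence's length. That normalization is an assumption (the source does not say how Spooled normalizes), and the step names are hypothetical. Under it, a 10-step baseline with one extra call at the end scores 10/11, roughly 0.91.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """LCS length normalized by the longer sequence (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return lcs_length(a, b) / longest


# Hypothetical 10-step baseline ending in the cross-reference call.
baseline = [f"step_{i}" for i in range(9)] + ["cross_reference"]
# Drifted run: the cross-reference tool is called a second time.
drifted = baseline + ["cross_reference"]

similarity = sequence_similarity(baseline, drifted)
print(round(similarity, 2))
```

Because set-based overlap ignores repeats, a duplicated call like this one would leave a Jaccard score at 1.0; an order-aware measure is what exposes it.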
tool_usage_changes

alert_triage_agent.py
Problem: Across three versions (v1→v2→v3), the correlate_incidents step was removed in a 'cleanup' PR.
Caught: The v3 fingerprint was classified as a VARIANT against the v2 baseline. The missing correlation step was flagged and the change was blocked.
new_behavior_pattern

orchestrator.py
Problem: A child agent spawned by the orchestrator changed its behavior, but the parent agent's tests all passed.
Caught: Spooled correlates traces via session_id. The child agent's fingerprint change was isolated and flagged independently.
new_behavior_pattern

deal_agent.py
Problem: The model provider silently upgraded gpt-4o-mini. The agent stopped calling sanctions_screening on international deals, dropping a critical compliance step with no code change.
Caught: Production fingerprints showed tool_jaccard dropping from 1.0 to 0.67, with 3 tools missing from the execution graph. Policy blocked the next deploy within minutes of the model update.
new_side_effects

support_bot.py
Problem: After upgrading from a legacy model to a reasoning model, prompt injection resistance dropped from 94% to 71%. No tests failed: the model was more capable overall, just less safe.
Caught: The execution fingerprint changed shape: the agent began following injected instructions that it had previously refused. Spooled flagged the new tool call patterns as a VARIANT before the change reached production.
new_behavior_pattern