Testing
How to set up behavioral testing for your agents with Spooled.
Snapshot workflow
Spooled uses a snapshot-based workflow similar to snapshot testing in frontend frameworks:
- Run your agent with deterministic test inputs
- Generate a baseline from successful runs
- Compare on every future run — any structural change is flagged
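The three steps above can be sketched in plain Python. This is only an illustration of the snapshot idea, not Spooled's implementation: the `check_against_baseline` helper, the JSON layout, and the tool names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_against_baseline(baseline_path: Path, tool_sequence: list[str]) -> str:
    """Create the baseline on the first run; compare against it afterwards."""
    if not baseline_path.exists():
        # First successful run: store the observed structure as the baseline.
        baseline_path.write_text(json.dumps({"tools": tool_sequence}))
        return "baseline created"
    baseline = json.loads(baseline_path.read_text())
    # Later runs: flag any difference in the recorded tool sequence.
    return "match" if baseline["tools"] == tool_sequence else "structural change"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_agent.json"  # hypothetical baseline file
    results = [
        check_against_baseline(path, ["lookup_policy", "answer"]),
        check_against_baseline(path, ["lookup_policy", "answer"]),
        check_against_baseline(path, ["web_search", "answer"]),
    ]
```

Here the first call writes the baseline, the second matches it, and the third is flagged because the agent called a different tool.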
Use deterministic inputs
For stable baselines, use fixed test data rather than random inputs. This ensures the same execution path each time.
tests/agents/test_my_agent.py
import spooled

@spooled.trace(agent_id="my_agent")
def test_agent():
    # Use fixed, deterministic inputs
    result = run_agent(query="What is the return policy?")
    assert result is not None
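If test inputs have to be sampled rather than hard-coded, seeding a local random generator is one way to keep them fixed across runs. The sketch below does not use Spooled at all; the query list and the `sample_queries` helper are hypothetical.

```python
import random

# Hypothetical pool of test queries; illustrative only.
FIXED_QUERIES = [
    "What is the return policy?",
    "How do I reset my password?",
    "Where is my order?",
]


def sample_queries(seed: int, k: int) -> list[str]:
    """Pick k queries reproducibly using a locally seeded RNG."""
    rng = random.Random(seed)  # local RNG; global random state is untouched
    return rng.sample(FIXED_QUERIES, k)


# The same seed yields the same selection on every call.
first = sample_queries(seed=42, k=2)
second = sample_queries(seed=42, k=2)
assert first == second
```

Using a local `random.Random` instance rather than `random.seed` avoids interference from other tests that also draw random numbers.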
Generating baselines
Run your test suite at least 3 times to build stable statistical bounds:
# Run tests 3+ times
python -m pytest tests/agents/ && \
python -m pytest tests/agents/ && \
python -m pytest tests/agents/

# Generate baseline with quality gate
spooled ci update-baseline \
  --from .spooled/traces/ \
  --out baselines/ \
  --min-runs 3
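To see why several runs are needed, it may help to look at how percentile bounds could be derived from a handful of samples. This is a rough sketch using Python's standard library, not how Spooled computes its bounds; the `percentile_bounds` and `within_bounds` helpers and the latency values are made up for illustration.

```python
import statistics


def percentile_bounds(samples: list[float]) -> dict[str, float]:
    """Summarize samples as p5, p50, and p95."""
    # quantiles(n=100) returns 99 cut points; cuts[i - 1] is the i-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p5": cuts[4], "p50": cuts[49], "p95": cuts[94]}


def within_bounds(value: float, bounds: dict[str, float]) -> bool:
    """Check whether a new measurement falls inside the p5..p95 range."""
    return bounds["p5"] <= value <= bounds["p95"]


# Hypothetical latencies in milliseconds from three runs of the same test.
latencies_ms = [812.0, 790.0, 845.0]
bounds = percentile_bounds(latencies_ms)
```

With only three samples the range between p5 and p95 is narrow, so more runs generally give bounds that tolerate normal variation better.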
Handling non-determinism
AI agents are inherently non-deterministic. Spooled handles this through:
- Structural fingerprinting: hashes the tool sequence, not the content. Two runs with different outputs but the same tool sequence produce the same fingerprint.
- Statistical bounds: baselines store latency/token distributions (p5, p50, p95) from the rolling window, not exact values.
- Intent clustering: agents with multiple execution paths (e.g., different intents) get separate baselines per fingerprint hash.
- Structural mode: for ReAct-style agents, use SPOOLED_FINGERPRINT_MODE=structural to ignore loop count variations.
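The fingerprinting and structural-mode ideas can be illustrated with a small sketch. It is not Spooled's actual algorithm: the `fingerprint` and `collapse_loops` functions, the choice of SHA-256, and the tool names are assumptions, and the loop handling only collapses consecutive repeats of a single tool.

```python
import hashlib


def collapse_loops(tools: list[str]) -> list[str]:
    """Collapse consecutive repeats of the same tool into one entry."""
    collapsed: list[str] = []
    for tool in tools:
        if not collapsed or collapsed[-1] != tool:
            collapsed.append(tool)
    return collapsed


def fingerprint(tools: list[str], structural: bool = False) -> str:
    """Hash the tool sequence only; tool inputs and outputs are ignored."""
    sequence = collapse_loops(tools) if structural else tools
    joined = "\x1f".join(sequence)  # unit separator keeps tool names distinct
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Two hypothetical runs that differ only in how often the search loop ran.
run_a = ["search", "search", "summarize"]
run_b = ["search", "search", "search", "summarize"]

same_by_default = fingerprint(run_a) == fingerprint(run_b)
same_structurally = fingerprint(run_a, structural=True) == fingerprint(run_b, structural=True)
```

In this sketch the two runs get different fingerprints by default but the same one once loop repetitions are collapsed, which is the effect the structural mode is described as having.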
Test suite organization
tests/
└── agents/
    ├── test_customer_support.py
    ├── test_payment_agent.py
    └── test_researcher.py
baselines/
├── customer_support.json
├── payment_agent.json
└── researcher.json
Updating baselines after intentional changes
When you intentionally change agent behavior (new prompt, new tool, model swap):
- Run the updated agent to generate new traces
- Review the diff:
  spooled diff traces old.jsonl new.jsonl
- Accept the variant:
  spooled ci accept-variant --intent ... --fingerprint ... --baseline baselines/agent.json --reason "prompt update"
- Or regenerate the baseline:
  spooled ci update-baseline --from .spooled/traces/ --out baselines/
- Commit the updated baseline to git