Testing

How to set up behavioral testing for your agents with Spooled.

Snapshot workflow

Spooled uses a snapshot-based workflow similar to snapshot testing in frontend frameworks:

  • Run your agent with deterministic test inputs
  • Generate a baseline from successful runs
  • Compare every future run against the baseline; any structural change is flagged
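
As a rough illustration of this workflow (not Spooled's actual implementation), the sketch below builds a baseline from the tool sequences of successful runs and flags any later run whose sequence is not in it. The run dictionaries and the names build_baseline and is_regression are hypothetical and exist only for this example.

```python
def build_baseline(runs):
    """Collect the tool sequences observed in successful runs (hypothetical run format)."""
    return {tuple(run["tools"]) for run in runs if run["ok"]}


def is_regression(baseline, run):
    """Return True when the run's tool sequence was never seen in the baseline."""
    return tuple(run["tools"]) not in baseline


passing_runs = [
    {"tools": ["search_docs", "answer"], "ok": True},
    {"tools": ["search_docs", "answer"], "ok": True},
    {"tools": ["search_docs"], "ok": False},  # failed run, excluded from the baseline
]
baseline = build_baseline(passing_runs)

same_shape = {"tools": ["search_docs", "answer"], "ok": True}
new_shape = {"tools": ["search_docs", "refund_api", "answer"], "ok": True}

print(is_regression(baseline, same_shape))
print(is_regression(baseline, new_shape))
```

Only the order of tool names is compared here, so a run that words its answer differently would still match as long as it calls the same tools in the same order.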

Use deterministic inputs

For stable baselines, use fixed test data rather than random inputs. Fixed inputs make it far more likely that the agent follows the same execution path on every run.

tests/agents/test_my_agent.py
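import spooled

# Placeholder import: point this at your own agent's entry point.
from my_project.agent import run_agent


@spooled.trace(agent_id="my_agent")
def test_agent():
    # Use fixed, deterministic inputs so every run exercises the same path
    result = run_agent(query="What is the return policy?")
    # Minimal smoke check; add behavior-specific assertions as needed
    assert result is not None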
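
If some test inputs do have to be sampled, seeding a private random generator keeps them identical from run to run. The following is a minimal sketch using only the Python standard library; the query pool and the make_queries helper are made up for illustration and are not part of Spooled.

```python
import random

# Hypothetical pool of test queries; replace with your own fixed data.
QUERY_POOL = [
    "What is the return policy?",
    "How do I reset my password?",
    "Where is my order?",
    "Can I change my shipping address?",
    "Do you ship internationally?",
]


def make_queries(seed=1234, count=3):
    """Return the same sample of queries every time for a given seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return rng.sample(QUERY_POOL, count)


first = make_queries()
second = make_queries()
print(first == second)
```

Using a dedicated random.Random instance rather than random.seed() avoids interference from other code that draws from the global generator during the same test session.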

Generating baselines

Run your test suite at least 3 times to build stable statistical bounds:

# Run tests 3+ times
python -m pytest tests/agents/ && \
python -m pytest tests/agents/ && \
python -m pytest tests/agents/

# Generate baseline with quality gate
spooled ci update-baseline \
    --from .spooled/traces/ \
    --out baselines/ \
    --min-runs 3

Handling non-determinism

AI agents are inherently non-deterministic. Spooled handles this through:

  • Structural fingerprinting: hashes the tool sequence, not the content. Two runs with different outputs but the same tool sequence produce the same fingerprint.
  • Statistical bounds: baselines store latency and token distributions (p5, p50, p95) computed over a rolling window of runs, not exact values.
  • Intent clustering: agents with multiple execution paths (e.g., different intents) get a separate baseline for each fingerprint hash.
  • Structural mode: for ReAct-style agents, set SPOOLED_FINGERPRINT_MODE=structural to ignore variations in loop count.
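
To make these ideas concrete, here is a small standard-library sketch. It is not Spooled's code, and the hashing scheme, the 16-character digest, and the way structural mode collapses consecutive repeated tool calls are all assumptions chosen for illustration. The names fingerprint, bounds, and group_by_fingerprint are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import groupby
from statistics import quantiles


def fingerprint(tools, structural=False):
    """Hash the ordered tool names; tool inputs and outputs are never included."""
    if structural:
        # Assumed behavior: collapse consecutive repeats so extra loop
        # iterations of the same tool do not change the hash.
        tools = [name for name, _ in groupby(tools)]
    return hashlib.sha256("\n".join(tools).encode("utf-8")).hexdigest()[:16]


def bounds(samples):
    """Return (p5, p50, p95) for a list of at least two numeric samples."""
    cuts = quantiles(samples, n=20, method="inclusive")  # 19 cut points, 5% apart
    return cuts[0], cuts[9], cuts[18]


def group_by_fingerprint(runs):
    """Group latency samples per fingerprint, one bucket per execution path."""
    buckets = defaultdict(list)
    for run in runs:
        buckets[fingerprint(run["tools"])].append(run["latency_ms"])
    return dict(buckets)


runs = [
    {"tools": ["search", "answer"], "latency_ms": 100},
    {"tools": ["search", "answer"], "latency_ms": 120},
    {"tools": ["refund_api", "answer"], "latency_ms": 300},
]
buckets = group_by_fingerprint(runs)
p5, p50, p95 = bounds([100, 110, 120, 130, 140])
```

In this sketch, a new run would be compared against the bucket that matches its fingerprint, and a latency outside the stored p5 to p95 range would count as a deviation rather than a mismatch on an exact value.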

Test suite organization

tests/
└── agents/
    ├── test_customer_support.py
    ├── test_payment_agent.py
    └── test_researcher.py
baselines/
├── customer_support.json
├── payment_agent.json
└── researcher.json
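
The layout above suggests a naming convention in which tests/agents/test_<agent>.py pairs with baselines/<agent>.json. Assuming that convention holds in your project, a small helper like the following can locate the baseline for a given test file. The function name baseline_path_for is hypothetical and not part of Spooled; it requires Python 3.9 or later for str.removeprefix.

```python
from pathlib import PurePosixPath


def baseline_path_for(test_file, baseline_dir="baselines"):
    """Map tests/agents/test_<agent>.py to <baseline_dir>/<agent>.json."""
    stem = PurePosixPath(test_file).stem   # e.g. "test_customer_support"
    agent = stem.removeprefix("test_")     # e.g. "customer_support"
    return PurePosixPath(baseline_dir) / f"{agent}.json"


path = baseline_path_for("tests/agents/test_customer_support.py")
print(path)
```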

Updating baselines after intentional changes

When you intentionally change agent behavior (new prompt, new tool, model swap):

  • Run the updated agent to generate new traces
  • Review the diff: spooled diff traces old.jsonl new.jsonl
  • Accept the new variant: spooled ci accept-variant --intent ... --fingerprint ... --baseline baselines/agent.json --reason "prompt update"
  • Or, instead of accepting a single variant, regenerate the baseline: spooled ci update-baseline --from .spooled/traces/ --out baselines/
  • Commit the updated baseline to Git