Troubleshooting
Baseline keeps mismatching
Common causes
- Model updates — a provider-side model update can change the tool sequence. Regenerate the baseline.
- Non-deterministic ordering — if tools run concurrently, their order may vary. Use SPOOLED_FINGERPRINT_MODE=structural for order-insensitive hashing.
- Temperature > 0 — high temperature can cause different tool choices. Use deterministic test inputs with temperature=0 for CI.
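To illustrate why order-insensitive hashing helps when tools run concurrently, here is a minimal sketch. It is not Spooled's actual fingerprinting scheme: the function names and the choice of SHA-256 over sorted tool names are assumptions made for illustration only.

```python
import hashlib

def ordered_fingerprint(tool_calls):
    """Hash the tool names in the exact order they were called."""
    return hashlib.sha256("|".join(tool_calls).encode()).hexdigest()

def structural_fingerprint(tool_calls):
    """Hash the sorted tool names, so call order does not matter."""
    return hashlib.sha256("|".join(sorted(tool_calls)).encode()).hexdigest()

run_a = ["search", "fetch_page", "summarize"]
run_b = ["fetch_page", "search", "summarize"]  # same tools, different order

print(ordered_fingerprint(run_a) == ordered_fingerprint(run_b))        # False
print(structural_fingerprint(run_a) == structural_fingerprint(run_b))  # True
```

Sorting keeps duplicates, so a run that calls a tool twice still hashes differently from a run that calls it once.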
Hash chain verification fails
- Manual trace edits — editing a JSONL trace file breaks the chain. Don't modify trace files.
- Partial writes — if the agent crashes mid-run, the chain may be incomplete. Re-run the agent.
- Clock skew — timestamps are used in hash computation. Ensure system clocks are synchronized.
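The sketch below shows, in general terms, how a hash chain detects an edited record. The record layout, the all-zero starting value, and the helper names are hypothetical; Spooled's real trace format and hash inputs may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain

def link_hash(prev_hash, record):
    """Hash a record together with the previous link's hash."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(records):
    """Return (record, hash) pairs; each hash depends on all earlier records."""
    chain, prev = [], GENESIS
    for record in records:
        prev = link_hash(prev, record)
        chain.append((record, prev))
    return chain

def verify_chain(chain):
    """Recompute every hash; an edited record fails the comparison."""
    prev = GENESIS
    for record, stored_hash in chain:
        if link_hash(prev, record) != stored_hash:
            return False
        prev = stored_hash
    return True

chain = build_chain([
    {"ts": 1700000000.0, "tool": "search"},
    {"ts": 1700000001.5, "tool": "summarize"},
])
print(verify_chain(chain))  # True

# Simulate a manual edit of the first record (here, its timestamp).
chain[0] = ({"ts": 1700000009.0, "tool": "search"}, chain[0][1])
print(verify_chain(chain))  # False
```

Because the timestamp is part of the hashed payload in this sketch, changing it alone is enough to invalidate the chain.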
CI is blocking unexpectedly
- Check which policy rule triggered: look at the CI report for violations.
- Check if on_variant: true is set under fail_if: with block_merges: true — this blocks any behavioral change.
- Check signal thresholds — latency/token spikes may trigger on natural variance. Use --retries 2 for 2-pass verification.
- Accept intentional variants:
spooled ci accept-variant --intent ... --fingerprint ... --baseline ...
Low Spooled Score
The score has four components. Check which one is low:
- Structural (40%) — fingerprint doesn't match. Check if the tool sequence changed.
- Signal (25%) — signals detected. Check which signals fired and adjust thresholds.
- Metric (20%) — latency or tokens changed. This may be natural variance.
- Trend (15%) — recent history is inconsistent. Run more times to stabilize.
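One way to reason about a low score is to see how much each component can cost. The sketch below uses the weights listed above, but the 0.0 to 1.0 component scale, the linear combination, and the function names are assumptions rather than the documented formula.

```python
# Weights from the list above. The per-component 0.0-1.0 scale and the
# linear combination are assumptions, not the documented formula.
WEIGHTS = {"structural": 0.40, "signal": 0.25, "metric": 0.20, "trend": 0.15}

def weighted_score(components):
    """Combine per-component scores (each 0.0-1.0) into a 0-100 score."""
    return 100 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def weakest_component(components):
    """Return the component that costs the most points."""
    return max(WEIGHTS, key=lambda name: WEIGHTS[name] * (1 - components[name]))

run = {"structural": 1.0, "signal": 0.5, "metric": 1.0, "trend": 1.0}
print(round(weighted_score(run), 1))  # 87.5
print(weakest_component(run))         # signal
```

Under these assumptions, a structural mismatch is the most expensive failure, since that component carries the largest weight.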
Empty or missing traces
- Missing shutdown() — always call spooled.shutdown() or use the @spooled.trace decorator which handles it automatically.
- Unsupported library — check that your LLM/tool library is in the auto-instrumented list. If not, use manual record_interaction() calls.
- Import order — call spooled.init() before importing/using your LLM client so hooks are installed.
- Sample rate — check SPOOLED_SAMPLE_RATE. A value of 0.0 records nothing.
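The sample-rate behavior can be pictured with a short sketch. Only the SPOOLED_SAMPLE_RATE variable name comes from this page; the helper functions, the default of 1.0, and the clamping are hypothetical and may not match what the SDK does internally.

```python
import os
import random

def read_sample_rate(default=1.0):
    """Read SPOOLED_SAMPLE_RATE from the environment, clamped to [0.0, 1.0]."""
    raw = os.environ.get("SPOOLED_SAMPLE_RATE")
    if raw is None:
        return default  # assumed default: record everything
    return min(1.0, max(0.0, float(raw)))

def should_record(rate, rng=random):
    """Decide whether to record one run; a rate of 0.0 never records."""
    return rng.random() < rate

os.environ["SPOOLED_SAMPLE_RATE"] = "0.0"
rate = read_sample_rate()
recorded = sum(should_record(rate) for _ in range(1000))
print(recorded)  # 0
```

If traces are missing in one environment only, printing the effective rate at startup is a quick way to rule this cause in or out.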
Performance overhead
Spooled adds approximately:
- 1–2 ms per recorded interaction
- 50–100KB memory for the recorder buffer
- Background flush thread (5-second intervals)
The SDK fails open — if recording fails, your agent continues normally.
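The fail-open pattern can be sketched as a decorator that swallows recorder errors so the agent keeps running. This is a generic illustration, not the SDK's implementation; fail_open, record_step, and run_agent are hypothetical names.

```python
import functools
import logging

logger = logging.getLogger("recorder_sketch")

def fail_open(func):
    """Run a recording function, but swallow and log any error it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.debug("recording failed; continuing", exc_info=True)
            return None
    return wrapper

@fail_open
def record_step(step):
    # Hypothetical recorder that always fails, to simulate a broken backend.
    raise IOError("disk full")

def run_agent():
    record_step({"tool": "search"})  # the failure is swallowed here
    return "agent finished"

print(run_agent())  # agent finished
```

The trade-off of failing open is that recording problems are silent by default, which is one reason traces can end up empty without the agent reporting an error.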
Debug mode
spooled --debug ci run --suite tests/ --baseline baselines/
The --debug flag enables verbose logging for diagnosing issues.