Troubleshooting

Baseline keeps mismatching

Common causes

Model updates — a provider-side model update can change the tool sequence. Regenerate the baseline.
Non-deterministic ordering — if tools run concurrently, their order may vary. Use SPOOLED_FINGERPRINT_MODE=structural for order-insensitive hashing.
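Temperature > 0: sampling randomness can change tool choices, and higher temperatures make that more likely. For CI, use deterministic test inputs with temperature=0, which reduces (though it may not fully eliminate) run-to-run variation.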
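Order-insensitive hashing can be illustrated by comparing a fingerprint over the raw tool sequence with one over the sorted sequence. This is a minimal Python sketch of the general idea, not Spooled's implementation. How SPOOLED_FINGERPRINT_MODE=structural actually builds its hash is not described here, and the function names below are hypothetical.

```python
import hashlib


def sequential_fingerprint(tools: list[str]) -> str:
    # Order-sensitive (hypothetical helper): the same calls in a different order hash differently.
    return hashlib.sha256("\n".join(tools).encode("utf-8")).hexdigest()


def order_insensitive_fingerprint(tools: list[str]) -> str:
    # Sort first (hypothetical helper), so only tool names and call counts affect the hash.
    return hashlib.sha256("\n".join(sorted(tools)).encode("utf-8")).hexdigest()


run_a = ["search", "fetch", "fetch", "summarize"]
run_b = ["fetch", "search", "fetch", "summarize"]  # concurrent calls finished in another order

seq_equal = sequential_fingerprint(run_a) == sequential_fingerprint(run_b)
multiset_equal = order_insensitive_fingerprint(run_a) == order_insensitive_fingerprint(run_b)
print(seq_equal, multiset_equal)  # False True
```

Sorting keeps tool names and call counts but discards order, so a hash like this would also miss a genuine reordering of dependent steps.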

Hash chain verification fails

Manual trace edits — editing a JSONL trace file breaks the chain. Don't modify trace files.
Partial writes — if the agent crashes mid-run, the chain may be incomplete. Re-run the agent.
Clock skew — timestamps are used in hash computation. Ensure system clocks are synchronized.
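A hash chain makes edits detectable because each entry's hash covers the previous entry's hash. The sketch below assumes a hypothetical entry layout (record, prev_hash, hash) with SHA-256 over canonical JSON. Spooled's actual trace format and hash inputs may differ.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no extra spaces) so the same record always hashes the same.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(records: list[dict]) -> list[dict]:
    chain, prev = [], GENESIS
    for record in records:
        current = entry_hash(prev, record)
        chain.append({"record": record, "prev_hash": prev, "hash": current})
        prev = current
    return chain


def verify_chain(chain: list[dict]) -> Optional[int]:
    """Return the index of the first invalid entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev or entry_hash(prev, entry["record"]) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None


trace = build_chain([
    {"ts": "2024-01-01T00:00:00Z", "tool": "search"},
    {"ts": "2024-01-01T00:00:01Z", "tool": "fetch"},
    {"ts": "2024-01-01T00:00:02Z", "tool": "summarize"},
])
before = verify_chain(trace)
trace[1]["record"]["tool"] = "browse"  # simulate a manual edit to one line
after = verify_chain(trace)
print(before, after)  # None 1
```

In this sketch, changing a timestamp breaks verification exactly like changing a tool name, since the timestamp is part of the hashed record.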

CI is blocking unexpectedly

Check which policy rule triggered: look at the CI report for violations
Check whether on_variant: true is set under fail_if: alongside block_merges: true. With that combination, any behavioral change blocks the merge.
Check signal thresholds: natural run-to-run variance in latency or token usage can trigger spike signals. Use --retries 2 for 2-pass verification.
Accept intentional variants: spooled ci accept-variant --intent ... --fingerprint ... --baseline ...
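Multi-pass verification helps because a noisy one-off spike rarely repeats. The sketch below is a hypothetical illustration of that idea: it flags a latency spike only if every pass exceeds the baseline by more than a relative tolerance. The passes and tolerance parameters, and the 50% default, are assumptions for illustration, not Spooled's settings or its mapping of --retries.

```python
from typing import Callable


def exceeds(baseline: float, observed: float, tolerance: float) -> bool:
    # True when observed is more than `tolerance` (a fraction) above baseline.
    return observed > baseline * (1 + tolerance)


def confirmed_spike(baseline: float, measure: Callable[[], float],
                    tolerance: float = 0.5, passes: int = 2) -> bool:
    """Report a spike only if every pass exceeds the threshold (hypothetical helper)."""
    for _ in range(passes):
        if not exceeds(baseline, measure(), tolerance):
            return False  # one normal pass is enough to treat the spike as noise
    return True


# Baseline latency 1000 ms with 50% tolerance, so the threshold is 1500 ms.
noise = confirmed_spike(1000.0, iter([1900.0, 1100.0]).__next__)  # one-off spike
real = confirmed_spike(1000.0, iter([1900.0, 1800.0]).__next__)   # spike on both passes
print(noise, real)  # False True
```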

Low Spooled Score

The score has four components. Check which one is low:

Structural (40%) — fingerprint doesn't match. Check if the tool sequence changed.
Signal (25%) — signals detected. Check which signals fired and adjust thresholds.
Metric (20%) — latency or tokens changed. This may be natural variance.
Trend (15%) — recent history is inconsistent. Run more times to stabilize.
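Reading the percentages as weights, one plausible model of the score is a weighted sum. This sketch assumes each component is normalized to [0, 1] and the total is scaled to 0–100; the actual formula is not given here, and the function names are hypothetical.

```python
WEIGHTS = {"structural": 0.40, "signal": 0.25, "metric": 0.20, "trend": 0.15}


def weighted_score(components: dict[str, float]) -> float:
    # Hypothetical: weighted sum scaled to 0-100, assuming each component is in [0, 1].
    return 100 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def costliest_component(components: dict[str, float]) -> str:
    # Points lost per component = 100 * weight * (1 - value); return the largest loss.
    return max(WEIGHTS, key=lambda name: WEIGHTS[name] * (1 - components[name]))


run = {"structural": 1.0, "signal": 0.6, "metric": 0.9, "trend": 0.5}
score = weighted_score(run)
worst = costliest_component(run)
print(round(score, 1), worst)  # 80.5 signal
```

Here trend has the lowest raw value (0.5), yet signal costs more points (10 versus 7.5) because of its larger weight, so under this model it pays to compare weighted losses rather than raw values.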

Empty or missing traces

Missing shutdown(): call spooled.shutdown() before your process exits so buffered records are flushed, or use the @spooled.trace decorator, which handles this automatically.
Unsupported library — check that your LLM/tool library is in the auto-instrumented list. If not, use manual record_interaction() calls.
Import order — call spooled.init() before importing/using your LLM client so hooks are installed.
Sample rate — check SPOOLED_SAMPLE_RATE. A value of 0.0 records nothing.
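Import order matters for any SDK that installs hooks by replacing functions on a client module: code that grabbed a reference to the function before the replacement keeps calling the original. The sketch below demonstrates that Python behavior with a hypothetical stand-in client and also applies a sample-rate gate. It assumes, without confirmation from this document, that Spooled's hooks and sampling work in a similar way.

```python
import random
import types

recorded: list[str] = []


def make_client() -> types.ModuleType:
    """Build a stand-in for an LLM client module (hypothetical, not a real library)."""
    client = types.ModuleType("fake_client")
    client.complete = lambda prompt: f"echo: {prompt}"
    return client


def install_hook(client: types.ModuleType, sample_rate: float) -> None:
    """Swap client.complete for a recording wrapper (illustrative, not Spooled's code)."""
    original = client.complete

    def wrapper(prompt: str) -> str:
        result = original(prompt)
        # random.random() returns a float in [0.0, 1.0), so 0.0 never records
        # and 1.0 always records.
        if random.random() < sample_rate:
            recorded.append(prompt)
        return result

    client.complete = wrapper


# Import-order problem: this name is bound before the hook is installed.
client = make_client()
early_complete = client.complete
install_hook(client, sample_rate=1.0)
early_complete("missed")     # calls the original function directly
client.complete("captured")  # goes through the wrapper

# Sample-rate problem: the hook is installed, but nothing is sampled.
muted = make_client()
install_hook(muted, sample_rate=0.0)
muted.complete("dropped")

print(recorded)  # ['captured']
```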

Performance overhead

Spooled adds approximately:

1–2 ms per recorded interaction
50–100 KB of memory for the recorder buffer
A background flush thread that runs every 5 seconds

The SDK fails open — if recording fails, your agent continues normally.
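The buffer, background flush thread, and fail-open behavior fit together roughly as follows. This is a hypothetical BufferedRecorder written for illustration, not the Spooled SDK; the class name, sink callback, and batching are assumptions. It also shows why a final shutdown matters: the flush thread is a daemon, so records still in the buffer are lost if the process exits without one last flush.

```python
import threading
from typing import Callable


class BufferedRecorder:
    """Illustrative buffer plus background flush (hypothetical; not the Spooled SDK)."""

    def __init__(self, sink: Callable[[list], None], interval: float = 5.0) -> None:
        self._sink = sink
        self._interval = interval
        self._buffer: list = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, item) -> None:
        # Cheap on the hot path: just append under a lock.
        with self._lock:
            self._buffer.append(item)

    def _flush(self) -> None:
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            try:
                self._sink(batch)
            except Exception:
                pass  # fail open: drop the batch rather than crash the agent

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once shutdown() sets the event.
        while not self._stop.wait(self._interval):
            self._flush()

    def shutdown(self) -> None:
        self._stop.set()
        self._thread.join()
        self._flush()  # final flush: without it, up to one interval of records is lost


flushed: list = []
rec = BufferedRecorder(sink=flushed.extend)
rec.record({"tool": "search"})
rec.shutdown()
print(flushed)  # [{'tool': 'search'}]


def broken_sink(batch: list) -> None:
    raise OSError("disk full")


rec2 = BufferedRecorder(sink=broken_sink)
rec2.record({"tool": "fetch"})
rec2.shutdown()  # no exception reaches the caller
```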

Debug mode

spooled --debug ci run --suite tests/ --baseline baselines/

Enables verbose logging for diagnosing issues.