Troubleshooting

Baseline keeps mismatching

Common causes

  • Model updates — a provider-side model update can change the tool sequence. Regenerate the baseline.
  • Non-deterministic ordering — if tools run concurrently, their order may vary. Use SPOOLED_FINGERPRINT_MODE=structural for order-insensitive hashing.
  • Temperature > 0 — any non-zero temperature makes sampling stochastic, so the agent may choose different tools from run to run. Use fixed test inputs with temperature=0 in CI.
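
To see why concurrent tools cause mismatches, compare an order-sensitive hash with an order-insensitive one. This is a minimal sketch, not Spooled's actual fingerprint algorithm: the function names and the choice of SHA-256 over sorted tool names are assumptions made for illustration.

```python
import hashlib
import json


def sequence_fingerprint(tool_calls):
    """Order-sensitive: hash the tool names in the order they ran."""
    payload = json.dumps(list(tool_calls)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def structural_fingerprint(tool_calls):
    """Order-insensitive: hash the sorted tool names, so ordering is ignored."""
    payload = json.dumps(sorted(tool_calls)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


run_a = ["search", "fetch_page", "summarize"]
run_b = ["fetch_page", "search", "summarize"]  # same tools, different order

print(sequence_fingerprint(run_a) == sequence_fingerprint(run_b))      # False
print(structural_fingerprint(run_a) == structural_fingerprint(run_b))  # True
```

Sorting keeps duplicates, so a run that calls a tool twice still differs from a run that calls it once.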

Hash chain verification fails

  • Manual trace edits — editing a JSONL trace file breaks the chain. Don't modify trace files.
  • Partial writes — if the agent crashes mid-run, the chain may be incomplete. Re-run the agent.
  • Clock skew — timestamps are used in hash computation. Ensure system clocks are synchronized.
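
The sketch below shows the general idea of a hash chain and why any change to a recorded entry, including its timestamp, makes verification fail. It assumes each record is hashed together with the previous record's hash. The record fields and helper names are hypothetical and do not reflect Spooled's on-disk format.

```python
import hashlib
import json


def record_hash(prev_hash, record):
    """Hash a record together with the hash of the record before it."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(records):
    """Return one hash per record, each linked to its predecessor."""
    hashes, prev = [], ""
    for record in records:
        prev = record_hash(prev, record)
        hashes.append(prev)
    return hashes


def first_broken_link(records, hashes):
    """Return the index of the first record that fails verification, or None."""
    prev = ""
    for i, (record, expected) in enumerate(zip(records, hashes)):
        prev = record_hash(prev, record)
        if prev != expected:
            return i
    if len(records) != len(hashes):
        return min(len(records), len(hashes))  # chain is truncated
    return None


trace = [
    {"ts": 1.0, "tool": "search"},
    {"ts": 2.0, "tool": "fetch_page"},
    {"ts": 3.0, "tool": "summarize"},
]
chain = build_chain(trace)
print(first_broken_link(trace, chain))  # None: the chain is intact

trace[1]["tool"] = "fetch_pdf"          # simulate a manual edit
print(first_broken_link(trace, chain))  # 1: verification fails at the edited record
```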

CI is blocking unexpectedly

  • Check which policy rule triggered: the CI report lists the violations.
  • Check whether on_variant: true is set under fail_if: with block_merges: true. That combination blocks any behavioral change.
  • Check signal thresholds — latency/token spikes may trigger on natural variance. Use --retries 2 for 2-pass verification.
  • Accept intentional variants: spooled ci accept-variant --intent ... --fingerprint ... --baseline ...
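
The following sketch illustrates why a second pass filters out one-off spikes. It assumes a signal counts only when every pass exceeds the threshold, which may differ from how --retries actually behaves. The function names, the 1.5x ratio, and the latency figures are all hypothetical.

```python
def exceeds_threshold(latency_ms, baseline_ms, max_ratio=1.5):
    """True when a run is slower than the baseline by more than max_ratio."""
    return latency_ms > baseline_ms * max_ratio


def confirmed_spike(latencies_ms, baseline_ms, max_ratio=1.5):
    """Flag a spike only if there is at least one pass and every pass exceeds the threshold."""
    return bool(latencies_ms) and all(
        exceeds_threshold(ms, baseline_ms, max_ratio) for ms in latencies_ms
    )


baseline = 800.0
print(confirmed_spike([1400.0, 820.0], baseline))   # False: the second pass was normal
print(confirmed_spike([1400.0, 1350.0], baseline))  # True: the spike persisted
```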

Low Spooled Score

The score has four components. Check which one is low:

  • Structural (40%) — fingerprint doesn't match. Check if the tool sequence changed.
  • Signal (25%) — signals detected. Check which signals fired and adjust thresholds.
  • Metric (20%) — latency or tokens changed. This may be natural variance.
  • Trend (15%) — recent history is inconsistent. Complete more runs so the history stabilizes.
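
To find the component that costs the most points, it can help to work through the weights by hand. The sketch below uses the four weights listed above, but assumes each component is scored between 0 and 1 and combined as a plain weighted sum. The exact formula and the function names are assumptions, not Spooled's implementation.

```python
# Weights from the component list above.
WEIGHTS = {"structural": 0.40, "signal": 0.25, "metric": 0.20, "trend": 0.15}


def weighted_score(components):
    """Combine per-component scores (each in [0, 1]) into a 0-100 score."""
    return 100 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def lost_points(components):
    """Points each component costs, largest loss first."""
    lost = {name: 100 * WEIGHTS[name] * (1 - components[name]) for name in WEIGHTS}
    return sorted(lost.items(), key=lambda item: item[1], reverse=True)


run = {"structural": 1.0, "signal": 0.5, "metric": 0.9, "trend": 1.0}
print(round(weighted_score(run), 1))  # 85.5
print(lost_points(run)[0][0])         # signal
```

Under these assumptions a structural mismatch alone removes up to 40 points, so it is usually the first component to check.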

Empty or missing traces

  • Missing shutdown() — always call spooled.shutdown(), or use the @spooled.trace decorator, which handles it automatically.
  • Unsupported library — check that your LLM/tool library is in the auto-instrumented list. If not, use manual record_interaction() calls.
  • Import order — call spooled.init() before importing/using your LLM client so hooks are installed.
  • Sample rate — check SPOOLED_SAMPLE_RATE. A value of 0.0 records nothing.
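
A quick way to rule out the sample-rate cause is to inspect the environment before a run. This sketch reads only the SPOOLED_SAMPLE_RATE variable named above. The helper name is hypothetical, and the SDK's default when the variable is unset is not assumed here.

```python
import os


def sample_rate_problem(env=None):
    """Return a description of a problem with SPOOLED_SAMPLE_RATE, or None."""
    env = os.environ if env is None else env
    raw = env.get("SPOOLED_SAMPLE_RATE")
    if raw is None:
        return None  # unset: whatever default the SDK uses applies
    try:
        rate = float(raw)
    except ValueError:
        return f"SPOOLED_SAMPLE_RATE={raw!r} is not a number"
    if rate == 0.0:
        return "SPOOLED_SAMPLE_RATE is 0.0, so nothing is recorded"
    return None


print(sample_rate_problem({"SPOOLED_SAMPLE_RATE": "0.0"}))
print(sample_rate_problem({"SPOOLED_SAMPLE_RATE": "1.0"}))  # None
```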

Performance overhead

Spooled's overhead is approximately:

  • 1–2 ms per recorded interaction
  • 50–100 KB of memory for the recorder buffer
  • One background flush thread, waking at 5-second intervals

The SDK fails open — if recording fails, your agent continues normally.
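
Failing open can be pictured as a wrapper that logs recorder errors instead of raising them. The sketch below illustrates the pattern in general terms and is not the SDK's code. The decorator, the simulated failure, and the logger name are all hypothetical.

```python
import functools
import logging

logger = logging.getLogger("recorder_sketch")


def fail_open(record_fn):
    """Wrap a recording function so its errors are logged, never raised."""
    @functools.wraps(record_fn)
    def wrapper(*args, **kwargs):
        try:
            return record_fn(*args, **kwargs)
        except Exception:
            logger.warning("recording failed; continuing", exc_info=True)
            return None
    return wrapper


@fail_open
def record(interaction):
    raise OSError("disk full")  # simulate a recorder failure


def run_agent_step(prompt):
    record({"prompt": prompt})  # a failure here is logged and swallowed
    return f"answer to {prompt}"


print(run_agent_step("hello"))  # answer to hello
```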

Debug mode

spooled --debug ci run --suite tests/ --baseline baselines/

The --debug flag enables verbose logging for diagnosing issues.