
An Agent Incident Response Playbook You Can Actually Run

When an AI agent misbehaves, the first question should not be "what did the model think?" It should be "how do we stop the impact from spreading?"

Security & Reliability | January 22, 2026 | 9 min read

Agent incidents are uncomfortable because they blur categories. A bad output can be a support problem, a product bug, a policy miss, a data issue, a tool failure, or a security concern. If the team does not have a playbook, the first hour becomes a debate about what kind of incident it is.

The playbook does not need to be complicated. It needs to be usable while people are stressed: define severity, contain the impact, preserve evidence, identify the failure point, ship a verified correction, and write down what will prevent a repeat.

  • Detect: bad output reported by a user.
  • Contain: pause the workflow API and require approval.
  • Investigate: collect the run ID, prompt, and tool output.
  • Correct: patch, evaluate, and roll out safely.

Incident response should move from containment to evidence to correction, not straight to prompt editing.

Define severity by impact

Do not define severity by how embarrassing the output looks in a screenshot. Define it by user impact, business impact, reversibility, and regulatory or security exposure. A silly answer in a sandbox is not the same as an incorrect refund action. A hallucinated citation in a draft is not the same as a policy-breaking message sent to a customer.

  • Low severity: incorrect or low-quality output caught before user impact.
  • Medium severity: user-visible issue with limited impact and a clear correction path.
  • High severity: financial, legal, security, privacy, or safety-sensitive impact.
  • Critical severity: active broad impact or irreversible actions requiring immediate shutdown.
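
The four levels above can be written down as a small decision function so that the on-call person answers a few yes/no questions instead of debating labels. This is a minimal sketch: the `Impact` fields and the order of the checks are illustrative assumptions drawn from the list, not a standard scheme.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Impact:
    """Hypothetical impact checklist filled in by whoever triages the report."""
    user_visible: bool      # did the output or action reach a user?
    sensitive: bool         # financial, legal, security, privacy, or safety impact
    broad_and_active: bool  # still spreading across many users or runs
    irreversible: bool      # the action cannot be undone


def classify(impact: Impact) -> Severity:
    """Map impact to severity, checking the most serious conditions first."""
    if impact.broad_and_active or impact.irreversible:
        return Severity.CRITICAL
    if impact.sensitive:
        return Severity.HIGH
    if impact.user_visible:
        return Severity.MEDIUM
    return Severity.LOW


# A silly answer in a sandbox versus an incorrect refund action.
sandbox_answer = Impact(user_visible=False, sensitive=False,
                        broad_and_active=False, irreversible=False)
wrong_refund = Impact(user_visible=True, sensitive=True,
                      broad_and_active=False, irreversible=False)
print(classify(sandbox_answer).name, classify(wrong_refund).name)
```

Note that nothing in the checklist asks how bad the output looks in a screenshot; only impact fields feed the result.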

Contain before you diagnose

The instinct during an AI incident is to start reading transcripts and editing prompts. Resist it. First reduce blast radius. Pause the workflow API if needed. Disable write tools. Force manual approval. Roll back to the last known-good prompt or workflow version. Remove a bad knowledge source from retrieval. The correction can wait until the impact is contained.

Containment actions should be pre-approved. Nobody should need a committee to pause a risky workflow during a high-severity incident.
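
One way to make containment pre-approved in practice is to keep the allowed steps in a table keyed by severity. The sketch below is a rough illustration: `WorkflowControls` is a hypothetical in-memory stand-in for whatever switches your platform exposes, and the step-to-severity mapping is an assumption rather than a recommendation.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowControls:
    """Hypothetical runtime switches for one agent workflow."""
    paused: bool = False
    write_tools_enabled: bool = True
    require_approval: bool = False
    active_version: str = "v2"
    last_known_good: str = "v1"
    blocked_sources: set = field(default_factory=set)


# Steps agreed on ahead of time, so on-call can run them without sign-off.
# The mapping itself is an illustrative assumption.
PREAPPROVED_STEPS = {
    "low": [],
    "medium": ["require_approval"],
    "high": ["disable_write_tools", "require_approval", "rollback"],
    "critical": ["pause", "disable_write_tools", "require_approval", "rollback"],
}


def contain(controls, severity, bad_source=None):
    """Apply the pre-approved steps for a severity and return what was done."""
    applied = []
    for step in PREAPPROVED_STEPS[severity]:
        if step == "pause":
            controls.paused = True
        elif step == "disable_write_tools":
            controls.write_tools_enabled = False
        elif step == "require_approval":
            controls.require_approval = True
        elif step == "rollback":
            controls.active_version = controls.last_known_good
        applied.append(step)
    if bad_source is not None:
        # Remove a suspect knowledge source from retrieval.
        controls.blocked_sources.add(bad_source)
        applied.append("block_source:" + bad_source)
    return applied


controls = WorkflowControls()
actions = contain(controls, "high", bad_source="kb/outdated-refund-policy")
print(actions)
```

Returning the list of applied steps gives the incident timeline a record of what was changed, which matters later when the containment has to be undone.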

Preserve the run evidence

A useful investigation needs the run ID, user input, agent version, workflow path, prompt version, retrieved sources, tool requests and responses, approval decisions, final output, and user report. Without that evidence, the team may fix the visible symptom while missing the real cause.
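
The evidence list above maps naturally onto a single record per run. The following is a sketch, assuming the fields are captured by your own logging; the field names are illustrative. `None` means "not captured", which is different from an empty list meaning "captured, and nothing was there".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RunEvidence:
    """One record per agent run; None marks evidence that was never captured."""
    run_id: Optional[str] = None
    user_input: Optional[str] = None
    agent_version: Optional[str] = None
    workflow_path: Optional[list] = None
    prompt_version: Optional[str] = None
    retrieved_sources: Optional[list] = None
    tool_calls: Optional[list] = None          # requests and responses
    approval_decisions: Optional[list] = None
    final_output: Optional[str] = None
    user_report: Optional[str] = None


def missing_evidence(evidence: RunEvidence) -> list:
    """List the fields that still need to be collected before investigating."""
    return [f.name for f in fields(evidence) if getattr(evidence, f.name) is None]


evidence = RunEvidence(
    run_id="run-example-001",
    user_input="Can I get a refund for last month?",
    retrieved_sources=[],  # captured: retrieval ran and returned nothing
    final_output="Yes, refunds are available for 12 months.",
)
print(missing_evidence(evidence))
```

Running the check at the start of an investigation shows which gaps need to be filled from logs before anyone proposes a cause.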

For example, a hallucinated answer might be a prompt problem. It might also be a retrieval problem because the correct source was never returned. A wrong account update might be a tool permission problem. A missing escalation might be a routing policy problem. The trace tells the difference.
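
A first-pass triage over the trace might look like the sketch below. It only narrows down where to look first; it does not establish a root cause. The trace keys (`retrieved_sources`, `tool_calls`, `escalated`) and the function's parameters are hypothetical names chosen for illustration.

```python
def suspect_failure_point(trace, expected_source=None,
                          intended_tools=None, should_escalate=False):
    """Return the first place to look, based on what the trace shows.

    The checks mirror the examples in the text and run in a fixed order,
    so the result is a starting hypothesis rather than a conclusion.
    """
    # Retrieval: the correct source was never returned to the model.
    retrieved = trace.get("retrieved_sources", [])
    if expected_source is not None and expected_source not in retrieved:
        return "retrieval"

    # Tool permission: the agent called a tool outside what was intended.
    if intended_tools is not None:
        for call in trace.get("tool_calls", []):
            if call["tool"] not in intended_tools:
                return "tool_permission"

    # Routing policy: the request should have been escalated and was not.
    if should_escalate and not trace.get("escalated", False):
        return "routing_policy"

    # Nothing upstream looks wrong, so the prompt is the remaining suspect.
    return "prompt"


trace = {
    "retrieved_sources": ["kb/faq"],
    "tool_calls": [{"tool": "update_account"}],
    "escalated": False,
}
print(suspect_failure_point(trace, expected_source="kb/refund-policy"))
```

The prompt is deliberately the last suspect: it is checked only after retrieval, tool scope, and routing have been ruled out from the evidence.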

The postmortem should create tests

An agent postmortem that ends with "improved the prompt" is incomplete. The incident should produce a regression case, a clearer owner, and a verification check. If the workflow failed to escalate a legally sensitive request, add an eval case. If a tool accepted incomplete input, add validation. If a reviewer did not see enough context, change the approval view.
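
As a sketch of what "add an eval case" can mean, the example below turns a missed escalation into a regression case with an owner and checks a router against it. Everything here is hypothetical: the incident IDs, the team name, and both toy routers stand in for your real workflow and are not a real keyword policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionCase:
    incident_id: str     # links the test back to the postmortem
    user_input: str
    expected_route: str
    owner: str           # who is accountable for keeping this case green


def failing_cases(cases, route):
    """Run each case through the router and return the incident IDs that fail."""
    return [c.incident_id for c in cases if route(c.user_input) != c.expected_route]


def route_before_fix(text):
    """Toy router that reproduces the incident: it never escalates."""
    return "auto_reply"


def route_after_fix(text):
    """Toy router with a naive keyword check, for illustration only."""
    legal_terms = ("lawsuit", "attorney", "subpoena")
    lowered = text.lower()
    if any(term in lowered for term in legal_terms):
        return "escalate_to_legal"
    return "auto_reply"


CASES = [
    RegressionCase("INC-EXAMPLE-1",
                   "My attorney will be contacting you about this charge.",
                   "escalate_to_legal", "support-platform-team"),
    RegressionCase("INC-EXAMPLE-2",
                   "How do I reset my password?",
                   "auto_reply", "support-platform-team"),
]

print(failing_cases(CASES, route_before_fix))  # the case fails before the fix
print(failing_cases(CASES, route_after_fix))   # and passes afterwards
```

The second case matters as much as the first: it guards against a fix that escalates everything.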

Trumpets keeps incident response grounded in versions, logs, workflow controls, and evaluation loops because the fix should be traceable. The goal is not to prove the model will never make a mistake. The goal is to make each mistake containable, explainable, and less likely to happen again.
