An Agent Incident Response Playbook You Can Actually Run
When an AI agent misbehaves, the first question should not be "what did the model think?" It should be "how do we stop the impact from spreading?"
Agent incidents are uncomfortable because they blur categories. A bad output can be a support problem, a product bug, a policy miss, a data issue, a tool failure, or a security concern. If the team does not have a playbook, the first hour becomes a debate about what kind of incident it is.
The playbook does not need to be complicated. It needs to be usable while people are stressed: define severity, contain the impact, preserve evidence, identify the failure point, ship a verified correction, and write down what will prevent a repeat.
- Detect: a user reports a bad output.
- Contain: pause the workflow API and require approval.
- Investigate: collect the run ID, prompt, and tool output.
- Correct: patch, evaluate, and roll out safely.
Define severity by impact
Do not define severity by how embarrassing the output looks in a screenshot. Define it by user impact, business impact, reversibility, and regulatory or security exposure. A silly answer in a sandbox is not the same as an incorrect refund action. A hallucinated citation in a draft is not the same as a policy-breaking message sent to a customer.
- Low severity: incorrect or low-quality output caught before user impact.
- Medium severity: user-visible issue with limited impact and clear correction path.
- High severity: financial, legal, security, privacy, or safety-sensitive impact.
- Critical severity: ongoing, widespread impact or irreversible actions that require immediate shutdown.
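Writing the tiers down as rules keeps triage consistent no matter who is on call. The sketch below is a minimal illustration. It assumes a hypothetical `IncidentFacts` intake record with four yes/no answers, and the field names and rule order are assumptions rather than a standard. It checks the most severe conditions first, so an incident that is both user-visible and irreversible lands in the higher tier.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class IncidentFacts:
    # Hypothetical intake answers: they rate impact, not how the output looks.
    reached_user: bool        # did the output or action reach a user?
    sensitive_impact: bool    # financial, legal, security, privacy, or safety exposure
    irreversible: bool        # was an action taken that cannot be undone?
    ongoing_and_broad: bool   # is impact still spreading across many users or runs?


def classify(facts: IncidentFacts) -> Severity:
    """Map impact facts to a tier, checking the most severe conditions first."""
    if facts.ongoing_and_broad or facts.irreversible:
        return Severity.CRITICAL
    if facts.sensitive_impact:
        return Severity.HIGH
    if facts.reached_user:
        return Severity.MEDIUM
    return Severity.LOW


# An incorrect refund action that can still be reversed: financially sensitive, not irreversible.
print(classify(IncidentFacts(reached_user=True, sensitive_impact=True,
                             irreversible=False, ongoing_and_broad=False)).name)  # prints HIGH
```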
Contain before you diagnose
The instinct during an AI incident is to start reading transcripts and editing prompts. Resist it. First reduce blast radius. Pause the workflow API if needed. Disable write tools. Force manual approval. Roll back to the last known-good prompt or workflow version. Remove a bad knowledge source from retrieval. The correction can wait until the impact is contained.
Containment actions should be pre-approved. Nobody should need a committee to pause a risky workflow during a high-severity incident.
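One way to make containment pre-approved is to express each action as a switch that the runtime checks on every call, so flipping it needs neither a deploy nor a meeting. This is a minimal sketch under that assumption: `ContainmentControls`, `gate_tool_call`, `usable_sources`, and `resolve_version` are hypothetical names, and in this version the approval switch gates only write tools while read-only calls keep running.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContainmentControls:
    # Hypothetical switches an on-call engineer can flip without a review meeting.
    workflow_paused: bool = False          # reject every tool call in the workflow
    write_tools_disabled: bool = False     # reject any tool that changes external state
    require_approval: bool = False         # route write actions to a human reviewer
    pinned_version: str | None = None      # last known-good prompt/workflow version
    blocked_sources: set[str] = field(default_factory=set)  # knowledge sources pulled from retrieval


def gate_tool_call(controls: ContainmentControls, is_write: bool) -> str:
    """Return "reject", "queue_for_approval", or "allow" for one tool call."""
    if controls.workflow_paused:
        return "reject"
    if is_write and controls.write_tools_disabled:
        return "reject"
    if is_write and controls.require_approval:
        return "queue_for_approval"
    return "allow"


def usable_sources(controls: ContainmentControls, source_ids: list[str]) -> list[str]:
    """Drop blocked knowledge sources before retrieved results reach the prompt."""
    return [s for s in source_ids if s not in controls.blocked_sources]


def resolve_version(controls: ContainmentControls, requested: str) -> str:
    """Serve the pinned known-good version, if one is set, instead of the requested one."""
    return controls.pinned_version or requested
```

Because every switch defaults to off, the normal path is unchanged, and each containment step from the paragraph above maps to exactly one field that can be flipped independently.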
Preserve the run evidence
A useful investigation needs the run ID, user input, agent version, workflow path, prompt version, retrieved sources, tool requests and responses, approval decisions, final output, and user report. Without that evidence, the team may fix the visible symptom while missing the real cause.
For example, a hallucinated answer might be a prompt problem. It might also be a retrieval problem because the correct source was never returned. A wrong account update might be a tool permission problem. A missing escalation might be a routing policy problem. The trace tells the difference.
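The evidence checklist and the examples above can be made concrete as a run record plus a first-pass triage over it. This is a minimal sketch, not any particular tracing tool's schema: `RunEvidence`, the `in_scope` flag on tool calls, the `"escalate"` step name, `missing_evidence`, and `suspect_failure_point` are all hypothetical. It also assumes the investigator supplies two judgments the trace cannot make on its own: which source should have been retrieved, and whether the request should have escalated.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class RunEvidence:
    # Hypothetical record mirroring the checklist above.
    # None means "never captured"; an empty list means "captured, and empty".
    run_id: str
    user_input: str | None = None
    agent_version: str | None = None
    workflow_path: list[str] | None = None          # step names the run passed through
    prompt_version: str | None = None
    retrieved_sources: list[str] | None = None      # source IDs returned by retrieval
    tool_calls: list[dict[str, Any]] | None = None  # e.g. {"tool", "request", "response", "in_scope"}
    approval_decisions: list[dict[str, Any]] | None = None
    final_output: str | None = None
    user_report: str | None = None


def missing_evidence(ev: RunEvidence) -> list[str]:
    """Name every field that was never captured, so gaps surface before triage."""
    return [f.name for f in fields(ev) if getattr(ev, f.name) is None]


def suspect_failure_point(ev: RunEvidence, expected_source: str | None = None,
                          should_escalate: bool = False) -> str:
    """Return a first guess at where the run failed, for a human to confirm."""
    if should_escalate and "escalate" not in (ev.workflow_path or []):
        return "routing policy"
    if any(not call.get("in_scope", True) for call in ev.tool_calls or []):
        return "tool permissions"
    if expected_source is not None:
        if ev.retrieved_sources is None:
            return "unknown (retrieval was not captured)"
        if expected_source not in ev.retrieved_sources:
            return "retrieval"
    return "prompt"
```

Keeping `None` distinct from an empty list is the design choice that matters here: it separates "retrieval returned nothing" from "retrieval was never logged", and the triage declines to blame retrieval when that evidence is missing.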
The postmortem should create tests
An agent postmortem that ends with "improved the prompt" is incomplete. The incident should produce a regression case, a clearer owner, and a verification check. If the workflow failed to escalate a legally sensitive request, add an eval case. If a tool accepted incomplete input, add validation. If a reviewer did not see enough context, change the approval view.
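A postmortem finding lasts longer once it is an executable check that runs before each rollout. The sketch below turns the missed-escalation example into a regression case with a named owner. `RegressionCase`, `AgentResult`, `run_regressions`, the incident ID, and the keyword-based stub agent are all hypothetical; `agent` stands in for however your workflow is invoked during evaluation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegressionCase:
    # Hypothetical eval case distilled from one incident.
    incident_id: str
    user_input: str
    must_escalate: bool
    owner: str  # team accountable if this case starts failing again


@dataclass(frozen=True)
class AgentResult:
    escalated: bool
    output: str


def run_regressions(cases: list[RegressionCase],
                    agent: Callable[[str], AgentResult]) -> list[str]:
    """Run every case and return the incident IDs whose escalation behavior regressed."""
    failures = []
    for case in cases:
        result = agent(case.user_input)
        if result.escalated != case.must_escalate:
            failures.append(case.incident_id)
    return failures


# Usage with a stub standing in for the patched workflow (hypothetical keyword rule).
def patched_agent(text: str) -> AgentResult:
    escalate = "sue" in text.lower()
    return AgentResult(escalated=escalate, output="escalated to legal" if escalate else "answered")


cases = [RegressionCase("INC-001", "I am going to sue you over this refund.",
                        must_escalate=True, owner="support-eng")]
print(run_regressions(cases, patched_agent))  # prints []
```

Treating any non-empty return value as a failed release gate is one way to give the "verification check" a concrete pass/fail meaning; a real evaluation harness would replace the stub.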
Trumpets keeps incident response grounded in versions, logs, workflow controls, and evaluation loops because the fix should be traceable. The goal is not to prove the model will never make a mistake. The goal is to make each mistake containable, explainable, and less likely to happen again.