// for veteran owners · by Josh · May 8, 2026 · 5 min read

After-Action Review for AI Implementations

Every military operation gets debriefed. Most AI implementations don't. The weekly AAR is the most useful artifact you can produce. Here's the template I use.

After-Action Review is the most underrated military framework that applies to civilian work. Every meaningful operation gets one. Veterans run them by reflex.

Most AI implementations skip this entirely. They ship, they run, they break occasionally, and the team never sits down and asks: "What happened this week and what should we do about it?"

The veterans I work with run AARs. The civilians don't. The veteran-led implementations get materially better over time. The civilian implementations stay flat or degrade.

The AAR format

The standard military AAR has four questions:

1. What was supposed to happen?
2. What actually happened?
3. Why was there a difference?
4. What will we do about it?

That's it. Four questions. Done in 30-45 minutes for most operations. The format works because it's structured but not bureaucratic.
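
If you want the answers captured in a structured artifact rather than free-form notes, one record per week is enough. A minimal sketch in Python; the class and field names are mine, not any standard:

```python
from dataclasses import dataclass, field

@dataclass
class AAR:
    """One weekly after-action review: the four standard questions."""
    week: str            # e.g. "2026-W19"
    planned: list[str]   # 1. What was supposed to happen?
    actual: list[str]    # 2. What actually happened?
    causes: list[str]    # 3. Why was there a difference?
    actions: list[str] = field(default_factory=list)  # 4. What will we do about it?
```

The structure matters less than answering all four questions every week.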

Applied to AI implementations

For a weekly AI implementation AAR:

1. What was supposed to happen?
- The agents were supposed to handle X% of traffic without escalation
- Response times were supposed to be under Y seconds
- Cost per task was supposed to be under $Z
- The new feature was supposed to ship by Tuesday

2. What actually happened?
- 73% of traffic handled (vs 80% target)
- Response times: p95 at 4.2 sec (vs 3 sec target)
- Cost per task: $0.18 (vs $0.15 target)
- New feature shipped Wednesday (one day late)
- 3 incidents during the week (all minor)

3. Why was there a difference?
- 7-point gap on auto-handling: most of it traces to a new customer segment with question patterns we hadn't trained the agent on
- Response time slip: the new feature added a context-loading step that's slower than the previous baseline
- Cost variance: a combination of slower-model usage on hard cases (which is correct behavior) and higher-than-expected token use on the new feature
- Ship delay: integration testing surfaced an edge case that took a day to handle properly
- Incidents: one was a Sentry alert that turned out to be a non-issue; two were real edge cases worth fixing

4. What will we do about it?
- Add the new customer segment's question patterns to the training set this week
- Profile the context-loading step; aim to cut it by 30%
- Review token usage on the new feature; tune the prompt
- Document the integration-testing pattern that caught the edge case (it caught a real bug; the lost day was worth it)
- Add monitoring for the two real edge cases so we catch them faster next time

Each bullet is an action. Not a vague aspiration. Someone owns it. There's a deadline.
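
Question 2 can be prepared before anyone walks into the room: diff the targets from question 1 against the week's measurements and bring the gaps as the agenda. A hypothetical sketch; the metric names and numbers mirror the example above, not any real system:

```python
# Targets from question 1 and this week's measurements (question 2).
targets = {"auto_handled_pct": 80.0, "p95_latency_s": 3.0, "cost_per_task_usd": 0.15}
actuals = {"auto_handled_pct": 73.0, "p95_latency_s": 4.2, "cost_per_task_usd": 0.18}

# Direction of "good": higher is better for coverage, lower for latency and cost.
higher_is_better = {"auto_handled_pct": True, "p95_latency_s": False, "cost_per_task_usd": False}

def gap_report(targets: dict, actuals: dict) -> list[str]:
    """Return one line per metric that missed its target."""
    misses = []
    for name, target in targets.items():
        actual = actuals[name]
        missed = actual < target if higher_is_better[name] else actual > target
        if missed:
            misses.append(f"{name}: {actual} vs target {target} -- explain in question 3")
    return misses

for line in gap_report(targets, actuals):
    print(line)
```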

Why this matters

AI implementations drift in ways that aren't visible day to day. Cost creeps up. Accuracy slowly degrades as the input distribution shifts. Latency grows because each new feature adds a little overhead.

Daily monitoring catches these only if someone's actively watching. A weekly AAR forces the question even when nothing's obviously wrong: "Are we still doing what we said we'd do?"

If yes, the AAR is short. 15 minutes. Great, keep going.

If no, the AAR surfaces the drift before it becomes a crisis.
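
One way to force that question mechanically is to compare this week's numbers against a rolling baseline, not just against targets, so slow creep surfaces even while everything is still nominally green. A sketch, assuming you keep a few weeks of history per metric:

```python
from statistics import mean

def drifted(history: list[float], current: float, tolerance: float = 0.10) -> bool:
    """True if the current value moved more than `tolerance` (here 10%)
    away from the average of previous weeks. Works for cost, latency, error rate."""
    baseline = mean(history)
    return abs(current - baseline) / baseline > tolerance

# Cost per task over the previous four weeks, then this week's value.
if drifted([0.14, 0.15, 0.15, 0.16], current=0.18):
    print("cost_per_task drifted -- put it on the AAR agenda")
```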

Who attends

Smallest viable AAR:
- The person responsible for the AI implementation (eng lead or owner)
- A user-side representative (CSM, support, or sales: whoever sees the customer impact)
- A finance representative (if cost is a tracked dimension)

Three people. 45 minutes. Weekly cadence.

Larger orgs add more. Don't add more than necessary. The point is action, not theater.

The AAR isn't blame

Critical clarification: the AAR is about the operation, not the people. "What did we do wrong?" isn't the question. "What happened?" is. We look at decisions, not character.

Most civilian retrospectives I've sat in feel like blame sessions. The room gets defensive. Real issues stay hidden because people don't want to admit them.

Military AARs work because everyone agrees ahead of time: we're looking at the operation. Not at each other. Stuff happens. We dig into why. We fix what's fixable. We move on.

If you're going to run AARs, set this tone explicitly. Otherwise the meeting becomes performative and you'll stop running them within a quarter.

The persistence problem

The hard part of AARs isn't the meeting. It's the follow-through: carrying the actions from one AAR into the next. Each action gets a status: done, in progress, or dropped (with a reason).

Without this persistence, you do the AAR, write the actions, and then never look at them again. Civilians do this constantly. Veterans don't — they have it drilled in that actions live until they're closed.

Use a shared doc. The same doc each week. Carry over open actions. Close them as they get done. Drop them with explicit reasoning if they're no longer relevant.
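
If the shared doc is a structured file, the carry-over can even be scripted. A sketch assuming the actions live in a JSON list with `status` and `reason` fields; the layout is hypothetical:

```python
import json
from datetime import date

def carry_over(path: str) -> list[dict]:
    """Carry forward every action that isn't properly closed. Closed means
    status "done", or "dropped" with an explicit reason recorded."""
    with open(path) as f:
        actions = json.load(f)
    still_open = [
        a for a in actions
        if a["status"] == "in progress"
        or (a["status"] == "dropped" and not a.get("reason"))  # invalid drop: stays open
    ]
    for a in still_open:
        # Record when the action first started carrying over, so stale ones stand out.
        a.setdefault("carried_since", str(date.today()))
    return still_open
```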

This is the discipline that makes AARs compound. Each week's review benefits from the last week's actions. The implementation gets meaningfully better month over month.

A specific result

I introduced weekly AARs at a client with a struggling AI implementation. Before AARs, the team had been stuck at the same accuracy for four months, and the same incidents kept recurring.

Six weeks after starting AARs:
- Accuracy up 11%
- Incident recurrence down 70%
- Cost per task down 22%
- Team morale up (subjective, but visible in how much less often "stuck" came up in standup)

Nothing technical changed in those six weeks. The AARs made visible what had been invisible. Visibility produced action. Action produced results.

What civilians do instead

The civilian default is a quarterly review where someone makes a deck about how the AI is doing. The deck is generally positive because nobody wants to write a negative deck for the board.

This is theater. It doesn't catch drift. It doesn't surface action. It just creates a document.

The veteran default is the weekly AAR. Tighter cadence. Higher utility. Less performative.

What to do if you're a veteran owner

Schedule the AAR right now. Weekly. 45 minutes. Same time, same people, same format.

Keep the four questions as the structure. Don't let it become a status-update meeting. Don't let it become a blame session.

Write the actions down. Review them next week.

Keep doing it. The compound benefit shows up around week 6-8.

The bottom line

AAR is the most useful military framework for civilian AI work. Veterans use it by reflex. Civilians have to be taught.

If you're a veteran considering AI for your business, this is one of your biggest unfair advantages. The discipline of weekly debrief is already in you. Apply it.

The implementation will be visibly better in a quarter than the civilian competitor's. Same technology. Different operational rigor. Different outcome.

tags: aar, veteran, ai, review, operations