// ai postmortems · by Josh · April 20, 2026 · 5 min read

Why Our First Agent Ran Up a $9,200 OpenAI Bill in 4 Hours

An agent went into a loop. Each iteration called the API. By the time we caught it, the bill was $9,200. Here's exactly what happened and the six controls that prevent it.

An agent ran from 1 AM to 5 AM unchecked. By the time my morning alert fired, the OpenAI dashboard showed $9,200 in usage from a single agent instance.

The fix took 15 minutes. The lesson took longer.

What happened

The agent was a research assistant. It would take a question, decompose it into sub-questions, search for answers, synthesize, and return a comprehensive report.

We'd deployed it the night before. Production traffic was 12 reports a day. Each report ran ~$0.40. We expected $5/day in API spend.

At 1 AM a user submitted a question that triggered an edge case. The agent decomposed the question into 7 sub-questions. Each sub-question's answer made the agent uncertain enough to decompose THAT into 5 more sub-questions. The agent kept recursing. Seven sub-questions at depth one, 35 at depth two, 175 at depth three, 875 at depth four: geometric fan-out.

By 2 AM the agent was making about 200 API calls per minute. Each call cost about $0.10. That's $20 a minute, roughly $1,200 an hour, and the call rate was still climbing as the recursion fanned out. The math compounds fast.

The runaway continued until I woke up at 5 AM, saw a billing alert email I'd missed, and killed the process.

Root cause

The agent had no recursion limit. It had no per-task cost ceiling. It had no rate limit. The retry logic on transient errors would re-fire the same expensive call.

When I built it, I trusted that the agent's natural stopping condition (confidence threshold met, answer assembled) would kick in. I didn't anticipate "the answer is never assembled because every sub-question spawns more."
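
Reduced to its shape, the bug looked like this. A sketch, not our production code: the helper names are stand-ins, and the stubbed confidence() bakes in the edge case where no draft ever clears the threshold.

```python
# Sketch of the failure pattern. decompose(), search_and_summarize(), and
# confidence() are hypothetical stand-ins for the real LLM-backed steps.

def decompose(question):               # real version: one LLM call
    return [f"{question} / sub-{i}" for i in range(5)]

def search_and_summarize(sub_q):       # real version: search + LLM call, ~$0.10
    return f"draft answer for {sub_q}"

def confidence(draft):                 # the edge case: never confident enough
    return 0.2

def answer(question):
    results = []
    for sub_q in decompose(question):
        draft = search_and_summarize(sub_q)
        if confidence(draft) < 0.8:
            results.append(answer(sub_q))   # recurse: no depth limit,
        else:                               # no cost ceiling, no rate limit
            results.append(draft)
    return " ".join(results)                # never reached on the bad path

# answer("What's the TAM for X?")  # uncomment to watch it recurse
```

In the sketch, Python's own recursion limit eventually stops it. In production, nothing did until I killed the process.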

What we built instead

Six layers of protection now go into every agent; a code sketch of the first four follows the list. Skipping any of them is malpractice.

1. Per-task budget cap. Each task has a hard dollar cap. If the cap is hit, the agent aborts with a "budget exhausted" status and returns whatever partial result it has.

2. Recursion depth limit. Agents are not allowed to call themselves beyond N levels deep. Default is 3. Exceptions require explicit configuration.

3. Time limit per task. Each task has a hard time cap. If the task runs longer than the cap, it's killed.

4. Rate limit per agent instance. No agent instance can make more than M API calls per minute. Default is 30.

5. Cost-rate alarm. If any agent instance starts costing more than $X/hour, an automatic alarm fires AND the agent is paused pending human review.

6. Billing alert at the account level. Daily and weekly cost alarms from the provider (OpenAI, Anthropic, etc.). These are the last line of defense.
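
For concreteness, here is roughly what layers 1 through 4 can look like as a single guard object. This is a minimal sketch, not our actual runtime: the class name, the defaults, and the one-check()-before-every-API-request interface are all illustrative.

```python
import time


class GuardTripped(Exception):
    """Raised when any protection layer fires."""


class TaskGuard:
    """Layers 1-4 for a single task. Call check() before every API request."""

    def __init__(self, max_cost_usd=2.00, max_depth=3,
                 max_seconds=300, max_calls_per_minute=30):
        self.max_cost_usd = max_cost_usd                  # layer 1: budget cap
        self.max_depth = max_depth                        # layer 2: recursion
        self.deadline = time.monotonic() + max_seconds    # layer 3: time cap
        self.max_calls_per_minute = max_calls_per_minute  # layer 4: rate limit
        self.spent_usd = 0.0
        self._call_times = []  # timestamps of calls in the trailing minute

    def check(self, depth, est_call_cost_usd):
        if self.spent_usd + est_call_cost_usd > self.max_cost_usd:
            raise GuardTripped("budget exhausted")           # layer 1
        if depth > self.max_depth:
            raise GuardTripped("recursion depth exceeded")   # layer 2
        if time.monotonic() > self.deadline:
            raise GuardTripped("time limit exceeded")        # layer 3
        # layer 4: throttle -- wait until a slot frees in the trailing minute
        now = time.monotonic()
        self._call_times = [t for t in self._call_times if now - t < 60]
        if len(self._call_times) >= self.max_calls_per_minute:
            time.sleep(60 - (now - self._call_times[0]))
        self._call_times.append(time.monotonic())
        self.spent_usd += est_call_cost_usd
```

The task loop threads a depth counter through every recursive call and wraps the whole task in try/except GuardTripped, returning whatever partial result it has along with the abort status layer 1 describes.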

All six together cost about 200 lines of code in our agent runtime. They have caught at least 3 potential runaways since.
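
Layer 5 can be a small watcher over the spend counter the guard already tracks. Again a sketch with assumed hooks: pause_agent() and page_human() stand in for whatever pause and paging mechanisms your runtime actually has.

```python
import threading
import time


def watch_cost_rate(guard, max_usd_per_hour, pause_agent, page_human,
                    interval_s=30):
    """Layer 5: pause the agent AND page a human if spend/hour exceeds the cap."""
    started = time.monotonic()

    def loop():
        while True:
            time.sleep(interval_s)
            # floor the window at ~3 minutes so early samples don't false-trip
            hours = max((time.monotonic() - started) / 3600, 0.05)
            if guard.spent_usd / hours > max_usd_per_hour:
                pause_agent()  # stop the spend first
                page_human()   # then wake someone up
                return         # one-shot: stays paused pending human review

    threading.Thread(target=loop, daemon=True).start()
```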

What I tell prospects now

Building agents without these six layers is operating without insurance. The day you don't need them, you'll save 200 lines of code. The day you do need them, they save your business.

For early-stage / prototype agents, layers 1, 3, and 6 are the minimum. Per-task budget cap, time limit, billing alert. Build them in from the first deploy.
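
In code, the prototype minimum is the guard from above with layers 2 and 4 switched off; layer 6 isn't code at all, just the alert settings in the provider's billing dashboard. A hypothetical wiring, where run_research_task() stands in for your real task loop:

```python
# Prototype minimum: budget cap (layer 1) + time limit (layer 3), reusing
# the TaskGuard sketch above. run_research_task() is a stand-in.
def run_research_task(question, guard, depth=0):
    guard.check(depth, est_call_cost_usd=0.10)  # before every API request
    return {"status": "ok", "report": f"(report for {question!r})"}

guard = TaskGuard(max_cost_usd=2.00, max_seconds=300,
                  max_depth=float("inf"), max_calls_per_minute=float("inf"))
try:
    result = run_research_task("some user question", guard)
except GuardTripped as why:
    result = {"status": "aborted", "reason": str(why)}
```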

For production agents touching real money, all six. No exceptions.

The lesson

LLM bills can compound exponentially. Recursion + retries + no limits = a system that can burn through a year's budget in an evening.

Treat your AI agent runtime like financial software. Apply the same paranoia you'd apply to a payment processor. Limits, alarms, kill switches, audit trails.
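
Of those four, the audit trail is the cheapest to start with: one JSON line per API call is enough to reconstruct a runaway after the fact. A sketch, with the field names as assumptions:

```python
import json
import time


def audit(log_path, task_id, model, depth, cost_usd):
    """Append one JSON line per API call: cheap, greppable, append-only."""
    record = {"ts": time.time(), "task": task_id, "model": model,
              "depth": depth, "cost_usd": cost_usd}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```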

The cost of paranoia is small. The cost of one runaway is large.

The thing nobody mentions

The user whose question triggered the runaway was a paying customer. They asked a legitimate question. The agent's failure was not their fault.

When I rebuilt the agent, I made sure the protection layers didn't degrade the experience for legitimate users. Layer 1 (the budget cap) is set high enough that real users never hit it. Layer 4 (the rate limit) is per-instance, so one user's throttled task doesn't slow anyone else down.

Protection layers are not user-experience reductions. Done right, they catch the edge cases without changing what legitimate users experience.

If your safety controls degrade real users, you've built them wrong. They should only fire on edge cases.

What the bill turned into

OpenAI's support team agreed to credit half the bill ($4,600) because the spike was clearly anomalous. We paid the other half.

The $4,600 was tuition for the controls. I have not had another runaway in 14 months. The controls have paid for themselves about 3x in caught events.

If you're building agents and reading this, the $4,600 is yours to NOT pay. Build the controls.

postmortem · ai agent · cost runaway · controls · failure

// ready to ship?

Let's build yours.

Reading is the easy part. We do the work. Tell us what's broken and we'll tell you straight up whether we can help.