A $47,283 LLM invoice from a runaway agent loop. Soft alerts were configured — they fired on day 19. Human response took 36 hours. By the time someone killed the agent, the bill was $47k.
A budget alert that fired a webhook and auto-blocked new requests would have stopped it at $5,000. The entire damage was the 36-hour gap between "alert fired" and "human responded."
This guide gives you three options for LLM budget alerts, ordered from simplest to most powerful, so you can have something working in 10 minutes.
Key Takeaways
- Without budget alerts, the average team discovers a cost spike 15–30 days after it starts — when the invoice arrives
- Webhook alerts trigger automated responses (model downgrade, block) in seconds; email alerts wait for a human
- Provider-native alerts are zero-code but global-only — no per-feature or per-tenant granularity
- A proxy-layer alert with Redis counters fires in real time and supports per-tenant, per-feature, and per-model thresholds
This post is part of our SaaS AI Features Cost Guide.
Why Budget Alerts Fail (And What to Do About It)
Most teams think they have budget alerting because they set a monthly spend limit in OpenAI's dashboard. That's not alerting — that's a hard stop after the fact.
Provider-native limits have three problems:
- The hard limit acts only at 100%; any earlier notice is an email to a human, with no automated action at 70% or 90%
- They apply to your entire account, not per feature or per tenant
- They block all traffic when hit, not just the over-budget feature
A real budget alert system has three tiers:
| Threshold | Action |
|---|---|
| 70% of monthly budget | Send notification (email/Slack/webhook) — informational |
| 90% of monthly budget | Downgrade to cheaper model — automated, no human needed |
| 100% of monthly budget | Block requests or return graceful error — automated |
The 90% tier is the most important: it cuts costs automatically without degrading service to the point of user-visible failure.
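The three tiers can be expressed as one small decision function. This is a minimal sketch, not a reference implementation: the names `BudgetAction` and `action_for_spend` are illustrative, and the thresholds are the 70/90/100 values from the table.

```python
from enum import Enum


class BudgetAction(Enum):
    ALLOW = "allow"          # under 70%: nothing to do
    NOTIFY = "notify"        # 70% and up: informational alert
    DOWNGRADE = "downgrade"  # 90% and up: switch to a cheaper model
    BLOCK = "block"          # 100% and up: reject or return a graceful error


def action_for_spend(spend_usd: float, monthly_budget_usd: float) -> BudgetAction:
    """Map current spend to the tier it falls in."""
    if monthly_budget_usd <= 0:
        raise ValueError("monthly_budget_usd must be positive")
    pct = spend_usd / monthly_budget_usd * 100
    # Check the highest tier first so a sudden jump never skips an action.
    if pct >= 100:
        return BudgetAction.BLOCK
    if pct >= 90:
        return BudgetAction.DOWNGRADE
    if pct >= 70:
        return BudgetAction.NOTIFY
    return BudgetAction.ALLOW
```

Checking from the top tier down matters: a runaway loop can move spend from 60% to 95% between two checks, and the function should still return the downgrade action.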
Option 1: Provider-Native Alerts (Zero Code, 5 Minutes)
Both OpenAI and Anthropic have built-in notification settings. This is your starting point — not your ending point.
OpenAI:
- Go to platform.openai.com/settings/organization/billing
- Under "Usage limits", set a monthly budget cap
- Enable email notifications — OpenAI sends alerts when you reach 75% and 100%
Anthropic:
- Go to console.anthropic.com → Billing → Usage limits
- Set a monthly spend limit
- Enable email notifications at your configured thresholds
Limitations: These are account-wide limits. No per-feature, per-tenant, or per-model granularity. No automated response — just an email. Good as a safety net; not sufficient as your primary alerting layer.
Option 2: Webhook Alerts via a Proxy Layer (30 Minutes, Recommended)
A proxy layer like Tokonomics fires webhook alerts in real time based on configurable thresholds — per tenant, per feature, or globally. The webhook payload includes the threshold that was hit, the current spend, and the tenant or feature that triggered it.
Setup:
- Point your app at the proxy base URL instead of the provider
- Add your spending thresholds in the proxy dashboard
- Configure your webhook endpoint URL
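For the first step, one common approach is to read the base URL from configuration so that switching between proxy and provider needs no code change. The sketch below assumes a hypothetical environment variable named `LLM_PROXY_BASE_URL`; your proxy's documentation will specify the real URL and any required headers.

```python
import os

# OpenAI's public API base URL, used when no proxy is configured.
PROVIDER_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(env=os.environ) -> str:
    """Return the proxy URL if one is configured, else the provider URL."""
    # LLM_PROXY_BASE_URL is a hypothetical variable name for illustration.
    proxy_url = env.get("LLM_PROXY_BASE_URL", "").strip()
    return proxy_url.rstrip("/") if proxy_url else PROVIDER_BASE_URL
```

The returned value is what you would pass to your SDK client's base URL option (the official OpenAI Python client, for example, accepts a `base_url` argument).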
What your webhook receives:
```json
{
  "event": "budget_threshold_hit",
  "threshold_percent": 80,
  "tenant_id": "tenant_abc123",
  "feature_name": "support-bot",
  "current_spend_usd": 40.00,
  "monthly_budget_usd": 50.00,
  "period": "2026-06",
  "timestamp": "2026-06-15T14:23:11Z"
}
```
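Before acting on a payload like this, it is worth validating it, since the handler will change tenant settings based on its contents. The following is a sketch that assumes the payload has exactly the fields shown in the example; `BudgetAlert` and `parse_budget_alert` are illustrative names, and field types are not enforced beyond what the checks need.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetAlert:
    event: str
    threshold_percent: float
    tenant_id: str
    feature_name: str
    current_spend_usd: float
    monthly_budget_usd: float
    period: str
    timestamp: str

    @property
    def actual_percent(self) -> float:
        """Spend as a percentage of budget, recomputed from the raw numbers."""
        return self.current_spend_usd / self.monthly_budget_usd * 100


def parse_budget_alert(raw: str) -> BudgetAlert:
    """Parse a webhook body; raise ValueError on anything unexpected."""
    try:
        # Missing or extra keys make the dataclass constructor raise TypeError.
        alert = BudgetAlert(**json.loads(raw))
    except (json.JSONDecodeError, TypeError) as exc:
        raise ValueError(f"malformed budget alert: {exc}") from exc
    if alert.event != "budget_threshold_hit":
        raise ValueError(f"unexpected event type: {alert.event}")
    if alert.monthly_budget_usd <= 0:
        raise ValueError("monthly_budget_usd must be positive")
    return alert
```

Recomputing `actual_percent` from the spend and budget fields gives the handler a way to cross-check the reported threshold before it downgrades or blocks a tenant.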
What your webhook handler should do:
```python
from flask import Flask, request

app = Flask(__name__)

# db, send_email, and post_to_slack are your application's own helpers.


@app.route('/llm-budget-alert', methods=['POST'])
def handle_budget_alert():
    data = request.get_json()
    threshold = data['threshold_percent']
    tenant_id = data['tenant_id']

    if threshold >= 90:
        # Auto-downgrade this tenant to a cheaper model
        db.execute(
            "UPDATE tenants SET llm_model='gpt-4o-mini' WHERE id=?",
            [tenant_id]
        )
        # Notify the tenant
        send_email(tenant_id, "Your AI usage is approaching the monthly limit")
    elif threshold >= 70:
        # Alert your team
        post_to_slack(f"⚠️ Tenant {tenant_id}: {threshold}% of AI budget used")

    # Acknowledge receipt with an empty 200 response
    return '', 200
```
The 90% tier is the critical one: it auto-responds without human intervention, which is the only reliable way to prevent the alert→response gap.
Option 3: Custom Redis-Backed Alerts (1–2 Hours, Maximum Control)
If you're not using a proxy layer, you can build custom alerting directly with Redis counters. This works for teams with a shared infrastructure layer who want full control.
The pattern:
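```python
import datetime

import redis

r = redis.Redis()


def get_current_month():
    # Billing period used in the key, e.g. "2026-06"
    return datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m")


def track_and_alert(tenant_id, cost_usd, monthly_budget):
    key = f"spend:{tenant_id}:{get_current_month()}"
    new_total = r.incrbyfloat(key, cost_usd)  # atomic increment
    r.expire(key, 2592000)  # 30-day TTL, refreshed on every write
    pct = (new_total / monthly_budget) * 100

    # Check the highest threshold first so a jump from, say, 65% to 95%
    # still triggers the downgrade. already_alerted, mark_alerted,
    # fire_alert, and downgrade_model are helpers you supply.
    if pct >= 100:
        return "BLOCK"
    if pct >= 90:
        if not already_alerted(tenant_id, 90):
            fire_alert(tenant_id, pct, "critical")
            downgrade_model(tenant_id)
            mark_alerted(tenant_id, 90)
    elif pct >= 70 and not already_alerted(tenant_id, 70):
        fire_alert(tenant_id, pct, "warning")
        mark_alerted(tenant_id, 70)
    return "ALLOW"
```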
The mark_alerted call prevents alert storms — you only fire once per threshold crossing per billing period.
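One way to implement that guard is a single "claim" operation that both checks and marks, which avoids two requests racing between the check and the mark. The sketch below is an in-memory stand-in so it runs anywhere; `AlertDeduper` and its key format are assumptions, not part of any library. With redis-py, the equivalent would be `r.set(key, 1, nx=True, ex=ttl)`, which returns a truthy value only for the caller that created the key.

```python
import time


class AlertDeduper:
    """In-memory stand-in for a Redis `SET key 1 NX EX ttl` guard."""

    def __init__(self, ttl_seconds: int = 35 * 24 * 3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # key -> expiry timestamp

    def claim(self, tenant_id: str, threshold: int, period: str) -> bool:
        """Return True only for the first call per tenant, threshold, and period."""
        key = f"alerted:{tenant_id}:{period}:{threshold}"
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # already alerted in this billing period
        self._expires_at[key] = now + self._ttl
        return True
```

Including the billing period in the key means the guard resets on its own each month; the TTL only cleans up old keys.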
Alert Delivery: Email vs Webhook vs Slack
| Channel | Response time | Best for |
|---|---|---|
| Email | 15 min – 2 hours | Informational 70% alerts, weekly summaries |
| Slack/Teams | 2–10 minutes | Team awareness, non-critical alerts |
| Webhook | Seconds | Automated responses (downgrade, block) |
| PagerDuty | 1–5 minutes | On-call escalation for critical cost spikes |
The rule: any alert that requires an automated response must go to a webhook. Email and Slack are for humans. Automated model downgrades and hard blocks must not wait for a human to read their inbox.
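That rule can be encoded as a small routing function so nobody wires an automated action to an inbox by accident. This is a sketch under assumptions: the action names and the choice to also post automated actions to Slack for visibility are illustrative.

```python
# Actions that must happen without a human in the loop.
AUTOMATED_ACTIONS = {"downgrade", "block"}


def channels_for(action: str) -> list[str]:
    """Return delivery channels for an alert, most urgent first."""
    if action in AUTOMATED_ACTIONS:
        # The webhook performs the action; Slack only tells humans it happened.
        return ["webhook", "slack"]
    if action == "notify":
        # Informational alerts can wait for a person to read them.
        return ["slack", "email"]
    raise ValueError(f"unknown action: {action}")
```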
Alert Configuration Checklist
For each AI feature in production:
- [ ] 70% threshold → Slack/email to engineering team
- [ ] 90% threshold → Webhook that auto-downgrades to cheaper model
- [ ] 100% threshold → Hard block with graceful user error
- [ ] Global account alert at 80% → Safety net for unconfigured features
- [ ] Daily cost summary email → CTO/engineering lead
- [ ] Weekly trend alert → If week-over-week cost growth >30%, alert
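
The weekly trend check in the last item is simple arithmetic. A minimal sketch, assuming you already have weekly cost totals from your own usage logs (the function names are illustrative):

```python
def week_over_week_growth(previous_week_usd: float, current_week_usd: float) -> float:
    """Fractional growth between two weeks, e.g. 0.25 means 25%."""
    if previous_week_usd <= 0:
        # No baseline: treat any new spend as unbounded growth.
        return float("inf") if current_week_usd > 0 else 0.0
    return (current_week_usd - previous_week_usd) / previous_week_usd


def should_alert_on_trend(previous_week_usd: float,
                          current_week_usd: float,
                          max_growth: float = 0.30) -> bool:
    """True when growth exceeds the 30% threshold from the checklist."""
    return week_over_week_growth(previous_week_usd, current_week_usd) > max_growth
```

For example, a week that goes from $100 to $140 is 40% growth and would alert; $100 to $120 would not.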
Frequently Asked Questions
What should I do when a budget alert fires?
At 70%: investigate which feature or tenant is driving the spend. No action required yet. At 90%: your webhook should already have auto-downgraded the model. Verify the downgrade took effect. At 100%: investigate root cause, fix the underlying issue (runaway loop, prompt bloat, unexpected traffic spike), then restore service.
How do I set the right budget threshold?
Start with your average monthly spend × 1.3 as your monthly cap. Set the 70% alert at 0.7 × cap, the 90% alert at 0.9 × cap, and the hard block at 1.0 × cap. Review and adjust quarterly as your usage patterns stabilize. For new features with no history, set the cap at 2× your estimated monthly cost to allow headroom.
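As a worked example, a team averaging $1,000 per month would get a $1,300 cap, a notification at $910, and a downgrade at $1,170. The sketch below applies the same multipliers; `budget_thresholds` is an illustrative name, not part of any tool.

```python
def budget_thresholds(avg_monthly_spend_usd=None, estimated_monthly_cost_usd=None):
    """Derive the cap and alert thresholds, in USD, from spend history or an estimate."""
    if avg_monthly_spend_usd is not None:
        cap = avg_monthly_spend_usd * 1.3       # established feature: 30% headroom
    elif estimated_monthly_cost_usd is not None:
        cap = estimated_monthly_cost_usd * 2    # new feature: 2x headroom
    else:
        raise ValueError("provide an average spend or an estimated cost")
    return {
        "cap_usd": round(cap, 2),
        "notify_usd": round(cap * 0.7, 2),      # 70% alert
        "downgrade_usd": round(cap * 0.9, 2),   # 90% auto-downgrade
        "block_usd": round(cap, 2),             # 100% hard block
    }
```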
What's the cheapest fallback model for automatic downgrade?
DeepSeek V4-Flash ($0.14/M input) or GPT-4o-mini ($0.15/M input) for most text tasks. Configure your automatic downgrade to route to whichever is already in your stack. The downgrade should be transparent to users — they may get slightly shorter or less nuanced responses, but the feature keeps working.
Can I configure alerts per tenant rather than globally?
Yes, but it requires per-tenant budget tracking (Redis counters or database rows). Set a per-tenant monthly budget based on their plan tier, and configure threshold checks against that per-tenant budget rather than a global total. See Multi-Tenant LLM Cost Isolation for the implementation pattern.
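A minimal in-memory sketch of that idea follows. The plan names and dollar amounts are hypothetical, and a production version would keep the counters in Redis or a database rather than in process memory.

```python
from collections import defaultdict

# Hypothetical plan tiers and monthly budgets, for illustration only.
PLAN_BUDGETS_USD = {"free": 5.0, "pro": 50.0, "enterprise": 500.0}


class TenantSpendTracker:
    """Tracks spend per tenant and billing period against the plan budget."""

    def __init__(self, plan_by_tenant: dict):
        self._plan_by_tenant = plan_by_tenant
        self._spend = defaultdict(float)  # (tenant_id, period) -> USD

    def record(self, tenant_id: str, period: str, cost_usd: float) -> float:
        """Add a cost and return the percent of this tenant's budget used."""
        budget = PLAN_BUDGETS_USD[self._plan_by_tenant[tenant_id]]
        self._spend[(tenant_id, period)] += cost_usd
        return self._spend[(tenant_id, period)] / budget * 100
```

The same $4 of usage is 80% of a free tenant's budget but only 8% of a pro tenant's, which is why threshold checks need to run against the per-tenant figure rather than a global total.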
The Bottom Line
Budget alerts are insurance. You pay a small implementation cost to prevent a large unexpected bill.
The 10-minute version: turn on provider-native email alerts right now. The 30-minute version: configure a proxy layer with webhook alerts and an auto-downgrade at 90%. The 2-hour version: build custom Redis counters with per-tenant and per-feature thresholds.
Start with the 10-minute version today. The alternative is finding out about cost spikes on invoice day.
Tokonomics gives you all three tiers out of the box: email alerts, webhook integration, per-feature and per-tenant thresholds, and automatic model downgrade at configurable budget percentages — no code required.
About the authors: Written by the engineers behind Tokonomics.