OpenAI Error: insufficient_quota — Quota Exhausted
import openai

try:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.RateLimitError as e:
    # e.status_code == 429
    # e.code == 'insufficient_quota'
    # e.type == 'insufficient_quota'
    # e.message includes 'You exceeded your current quota, please check your plan and billing details.'
    if e.code == 'insufficient_quota':
        alert_billing_team()  # placeholder: notify whoever owns billing
    raise  # don't retry
`insufficient_quota` is OpenAI’s out-of-funds signal, delivered as HTTP 429. Unlike `rate_limit_exceeded` (which means “slow down”), `insufficient_quota` means “you’ve run out of money, retrying won’t help, fix billing.” The two errors share a status code but require completely different handling, and conflating them is one of the most common bugs in production OpenAI integrations.
The fix is rarely in code. It’s in your billing setup: a card on file, auto-recharge enabled, soft/hard limits set high enough for production, and an alerting system that pages you when balance drops. Get those right and you’ll never see insufficient_quota outside a deliberate spend cap. Get them wrong and you’ll have a silent outage every time the prepaid balance ticks to zero.
Why this happens
- Free trial credit exhausted or expired. New OpenAI accounts get a small amount of free credit ($5 historically, less now) that expires after 3 months. Once spent or expired, every request fails `insufficient_quota` until you add a payment method. The dashboard shows the remaining trial balance under Billing → Overview.
- Monthly hard usage limit reached. Even on paid plans, OpenAI enforces a soft and a hard usage cap per month (set in Billing → Limits). Hitting the hard limit blocks all further requests until the next billing cycle starts or you raise the cap. The default for new paid orgs is often $120/mo, which is easy to hit on production traffic.
- Payment method failed and credit isn't auto-topping up. OpenAI's auto-recharge depends on a working card. If a charge fails (expired card, decline), auto-recharge stops and the balance is no longer replenished. The org keeps running on the remaining credit; once that reaches zero, every call fails with `insufficient_quota` until you fix billing.
- Org never added a payment method. Some teams sign up, get the API key working with trial credit, then forget to add a card. Once trial credit runs out, the API stops cold. Common with side projects that ramp up unexpectedly when they get attention.
- Project-level spend cap reached (Projects feature). If your org uses Projects, each project can have its own monthly spend cap. Hitting a project cap returns `insufficient_quota` even though the org has plenty of headroom. Look at Settings → Projects → [your project] → Limits.
How to fix it
Fixes are ordered by likelihood. Start with the first one that matches your context.
1. Add or top up your payment method, then raise the usage limit
Go to platform.openai.com/account/billing/overview. Add a card if missing. If a card is attached, top up your prepaid balance manually (Billing → Add to credit balance) and enable auto-recharge with a sensible threshold. Then go to Limits and raise your monthly hard cap to a level that won't cut off production.
2. Fail loudly on `insufficient_quota` — never silently retry
`insufficient_quota` is a billing problem, not a transient one. Retrying with backoff wastes capacity and floods your logs. Catch the error, page the billing/ops on-call, and stop retrying. Distinguish it explicitly from `rate_limit_exceeded`.
import openai
from openai import RateLimitError

class QuotaExhaustedError(Exception):
    """Raised when OpenAI reports insufficient_quota (billing, not throttling)."""

def safe_call(messages, model="gpt-4o"):
    try:
        return openai.chat.completions.create(model=model, messages=messages)
    except RateLimitError as e:
        if e.code == 'insufficient_quota':
            # Page billing; do not retry.
            # notify_pagerduty is a placeholder for your own paging hook.
            notify_pagerduty('OpenAI quota exhausted', severity='critical')
            raise QuotaExhaustedError() from e
        # Real rate limit: retry with backoff handled elsewhere
        raise
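The snippet above leaves genuine rate limits to a retry policy "handled elsewhere". A minimal sketch of what such a policy might look like, assuming the exception exposes a `code` attribute the way `openai.RateLimitError` does. `call_with_backoff`, `NON_RETRYABLE_CODES`, and the injectable `sleep` parameter are illustrative names, not part of the OpenAI SDK:

```python
import random
import time

# Billing failures: retrying cannot succeed until billing is fixed.
NON_RETRYABLE_CODES = {"insufficient_quota"}


def call_with_backoff(fn, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff and jitter.

    Assumption: raised exceptions carry a `code` attribute. In real code,
    pass retry_on=(openai.RateLimitError,) so only throttling is retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            code = getattr(exc, "code", None)
            # Re-raise immediately on billing errors or when attempts run out.
            if code in NON_RETRYABLE_CODES or attempt == max_attempts:
                raise
            # Delay doubles each attempt: about 1s, 2s, 4s, plus up to 50% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

Usage would be along the lines of `call_with_backoff(lambda: safe_call(messages))`. Passing `sleep` as a parameter keeps the function easy to test without real waiting.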
3. Set up balance and usage alerts before you run out
In Billing → Limits, set a soft usage limit (warning email) at 70% and a hard limit at 100% of your monthly budget. In Billing → Auto-recharge, set "Recharge when balance drops below" at $20-50 so you never hit zero unexpectedly. Check the credit balance on the dashboard daily, and track spend in your own code so an alert fires before depletion.
# OpenAI doesn't yet expose balance via API.
# Workaround: track your spend client-side from response.usage and alert.
from collections import defaultdict

DAILY_BUDGET = 50.0  # USD; example value, set to your own budget
spend_today = defaultdict(float)
PRICE = {"gpt-4o": (2.50/1e6, 10.00/1e6)}  # (input, output) USD per token; verify against current pricing

def track(response, model):
    in_p, out_p = PRICE[model]
    cost = response.usage.prompt_tokens * in_p + response.usage.completion_tokens * out_p
    spend_today[model] += cost
    if sum(spend_today.values()) > DAILY_BUDGET:
        alert("Daily OpenAI budget exceeded")  # alert() is a placeholder for your notifier
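The tracker above never resets, so `spend_today` keeps growing across days. One possible refinement, sketched under the assumption that a per-process, in-memory total is good enough: key the totals by calendar date and return a warning status at a soft threshold before the budget is gone. `SpendTracker`, `soft_ratio`, and the status strings are hypothetical names; the 70% default mirrors the soft-limit suggestion above.

```python
from collections import defaultdict
from datetime import date


class SpendTracker:
    """In-memory daily spend tracker with a soft warning threshold."""

    def __init__(self, daily_budget, soft_ratio=0.7):
        self.daily_budget = daily_budget
        self.soft_ratio = soft_ratio
        self._spend = defaultdict(float)  # date -> USD spent on that day

    def record(self, cost, today=None):
        """Add one request's cost and return 'ok', 'warn', or 'over'."""
        today = today or date.today()
        self._spend[today] += cost
        total = self._spend[today]
        if total > self.daily_budget:
            return "over"
        if total >= self.soft_ratio * self.daily_budget:
            return "warn"
        return "ok"
```

The caller decides what each status means, for example an email on `"warn"` and a page on `"over"`. Totals are lost on restart and are not shared between processes, so a production version would likely need a shared store.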
4. Move to a paid tier if you're still on free trial
Free trial credit isn't designed for production traffic. As soon as the project moves beyond proof-of-concept, add a payment method and start paying for usage. That moves you from the free tier to tier 1 immediately, and to higher tiers once you have paid $50 or more and at least 7 days have passed. Higher tiers also raise rate limits, which reduces `rate_limit_exceeded` errors.
5. Check project-level limits if your org uses Projects
Settings → Projects → [project] → Limits shows per-project spend caps and rate limits. Hitting a project cap returns `insufficient_quota` even with org headroom. Either raise the project cap or move the workload to a project with available budget.
Detection and monitoring in production
Tag `insufficient_quota` errors distinctly from `rate_limit_exceeded` in your monitoring. The former should fire a critical alert (production is down); the latter is informational. A single `insufficient_quota` error means every subsequent call will also fail, so page immediately rather than deduplicating to one alert per hour. Track daily token spend client-side from `response.usage`, since OpenAI doesn't expose live balance via API.
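The routing rule above can be sketched as a small lookup, assuming your alerting layer accepts a severity and a dedupe flag. `AlertRoute` and `route_openai_error` are illustrative names, not part of any monitoring library, and the fallback for unknown codes is an assumption rather than something the guidance above prescribes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRoute:
    severity: str  # "critical" pages on-call; "info" only records a metric
    dedupe: bool   # whether repeated alerts may be collapsed


def route_openai_error(code):
    """Map an OpenAI error code to an alert route."""
    if code == "insufficient_quota":
        # Every later call will fail too: page right away, never collapse.
        return AlertRoute(severity="critical", dedupe=False)
    if code == "rate_limit_exceeded":
        # Transient throttling: record it and let retries absorb it.
        return AlertRoute(severity="info", dedupe=True)
    # Assumption: unknown codes surface as warnings so they get triaged.
    return AlertRoute(severity="warning", dedupe=True)
```

Keeping the mapping in one function makes it harder for a generic "HTTP 429" handler to swallow the billing case.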
Related errors
- OpenAI `rate_limit_exceeded`: Your account has exceeded its per-minute request (RPM) or per-minute token (TPM) limit for the model you're calling. Limits are tier-based and per-model.
- OpenAI `context_length_exceeded`: The total tokens (prompt + max_tokens for completion) exceeds the model's context window. For example, sending 130,000 input tokens to `gpt-4o` (128k window) or asking for 5,000 completion tokens when the prompt is already 125k.
- OpenAI `model_not_found`: You requested a model name that either doesn't exist (typo, deprecated, renamed) or that your organisation doesn't have access to (tier-gated, geography-restricted, deprecated for new orgs).
- Anthropic `rate_limit_error`: You exceeded one of Anthropic's per-minute caps for the model and tier: RPM (requests/min), ITPM (input tokens/min), or OTPM (output tokens/min). Anthropic enforces all three independently and you can hit any one without breaching the others.
- Anthropic `authentication_error`: The `x-api-key` header you sent doesn't match an active Anthropic API key, usually because the env var isn't loaded, the key was rotated or revoked, you're using a Workspace key in the wrong workspace, or a wrong-provider key (Bedrock or Vertex) was sent to the direct Anthropic API.
Frequently asked questions
Why is `insufficient_quota` returned as HTTP 429 when it's a billing problem?
How do I tell `insufficient_quota` from `rate_limit_exceeded` in code?
Will OpenAI grant emergency credit if production is down?
My org has $500 of credit but I still get `insufficient_quota`. What's wrong?
Does OpenAI auto-recharge after a card failure?
If I raise my monthly limit mid-month, do I get charged immediately?
Can I see how much credit I have left from the API?
Is `insufficient_quota` ever transient?
When to escalate to OpenAI support
Open a billing support ticket only if (a) you've added a payment method, the dashboard shows positive credit, and you still get `insufficient_quota`, or (b) auto-recharge has stopped working despite a valid card. For 'I want a higher limit' or 'my trial ran out', the answer is in the Billing UI — support can't shortcut it. For paid enterprise contracts, your account manager can adjust limits faster.
Read more: /guide/handling-rate-limits/