AWS Error: ThrottlingException — API Rate Limit Exceeded
import boto3
from botocore.exceptions import ClientError

ddb = boto3.client('dynamodb')
try:
    ddb.describe_table(TableName='Orders')
except ClientError as e:
    # e.response['Error']['Code'] == 'ThrottlingException'
    # e.response['ResponseMetadata']['HTTPStatusCode'] == 400
    # e.response['Error']['Message'] == 'Rate exceeded'
    if e.response['Error']['Code'] == 'ThrottlingException':
        ...  # back off and retry
ThrottlingException is the AWS-wide signal that you’re calling an API faster than your account is allowed to. Despite returning HTTP 400 (a quirk of AWS history), it functions exactly like a 429 — back off, retry, and consider whether your access pattern is the real problem. The default SDK retry handles transient bursts; sustained throttling means architectural change is needed.
A useful mental model: every AWS API has a token bucket per account per region. Each call consumes a token; tokens refill at the sustained rate. A burst empties the bucket; once empty, every call throttles until tokens refill. Your job is to keep average consumption well under the refill rate and your bursts under the bucket size.
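The token-bucket model above can be sketched in a few lines. This is a minimal simulation to build intuition; the capacity and refill rate are illustrative, not real AWS quotas:

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    """Sketch of the per-account, per-region bucket AWS applies to API calls.
    Numbers are illustrative, not actual service quotas."""
    capacity: float      # burst size (max tokens)
    refill_rate: float   # tokens added per second (sustained rate)
    tokens: float = 0.0
    last: float = 0.0

    def allow(self, now: float) -> bool:
        # refill based on elapsed time, capped at the burst capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0   # each call consumes one token
            return True
        return False             # bucket empty -> ThrottlingException

# a 60-call burst against a bucket of 50, refilling at 10 calls/second:
bucket = TokenBucket(capacity=50, refill_rate=10, tokens=50)
results = [bucket.allow(now=0.0) for _ in range(60)]
# the first 50 calls pass; the rest throttle until tokens refill
```

One second of idle time refills 10 tokens, so a client that spaces its calls out never sees an empty bucket.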
Why this happens
- Bursty calls exceeding the token bucket refill rate. AWS APIs use token-bucket rate limiting. The bucket has a max size (burst) and refills at a steady rate (sustained). A loop that calls `DescribeTable` 200 times in 50ms drains the burst bucket and starts throttling on call 51.
- Account-wide quota, not per-resource. Most AWS service quotas (e.g., DescribeInstances at 100 RPS) apply to the entire account in a region, not per IAM user or per resource. Multiple Lambda functions, EC2 instances, or scripts sharing one account share one quota.
- Wrong API for the workload. Using `DescribeTable` in a hot path instead of caching it; using `GetItem` in a loop instead of `BatchGetItem`; calling `ListObjects` on every request instead of S3 inventory. Switching to a paginated, batched, or cached pattern often eliminates throttling.
- Cross-region or cross-service amplification. A single user request that fans out into N service calls (DDB + S3 + KMS + STS) multiplies your QPS by N. The throttled service is rarely the bottleneck — the upstream caller fan-out is.
- SDK retry storms after a brief outage. When a service hiccups, every SDK client retries simultaneously. Without jitter, the retries align and overwhelm the recovering service, turning a 30-second blip into a 5-minute throttling cascade.
How to fix it
Fixes are ordered by likelihood. Start with the first one that matches your context.
1. Use SDK adaptive retry mode
Modern AWS SDKs ship an `adaptive` retry mode that tracks the rate of throttled responses and slows the client down dynamically. It is opt-in, not the default (botocore defaults to `legacy` or, in newer versions, `standard`); enable it explicitly for any high-throughput client.
import boto3
from botocore.config import Config

config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive',  # token-bucket aware
    },
)
ddb = boto3.client('dynamodb', config=config)
2. Batch where the API supports it
Replace single-item calls with batch calls. `BatchGetItem` reads up to 100 items per call. `BatchWriteItem` writes up to 25. SQS `SendMessageBatch` sends up to 10. CloudWatch `PutMetricData` accepts 1,000 metrics per call. The change usually cuts your QPS by 10-100x.
keys = [{'OrderId': {'S': oid}} for oid in order_ids]

# split into chunks of 100, the BatchGetItem maximum
for chunk in (keys[i:i+100] for i in range(0, len(keys), 100)):
    resp = ddb.batch_get_item(
        RequestItems={'Orders': {'Keys': chunk}}
    )
3. Cache idempotent describe-style calls
`DescribeTable`, `DescribeInstances`, `GetCallerIdentity`, `ListBuckets` rarely change. Cache them in-memory for 5-60 minutes. A long-running Lambda calling `DescribeTable` once per invocation can cut its API quota usage by 99%.
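A minimal in-memory TTL cache for describe-style calls might look like this. The helper name and 5-minute TTL are assumptions for illustration, not an AWS SDK feature:

```python
import time

def ttl_cached(fetch, ttl_seconds=300, clock=time.monotonic):
    """Wrap a zero-argument fetch() so repeated calls within ttl_seconds
    reuse the cached result instead of spending API quota again."""
    state = {'value': None, 'expires': 0.0}
    def cached():
        now = clock()
        if now >= state['expires']:
            state['value'] = fetch()   # only this path hits the AWS API
            state['expires'] = now + ttl_seconds
        return state['value']
    return cached

# usage sketch (hypothetical wiring):
# describe_orders = ttl_cached(lambda: ddb.describe_table(TableName='Orders'))
```

The `clock` parameter exists only to make the cache testable; in production the default monotonic clock is fine.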
4. Request a service quota increase
Service Quotas console → find the quota → request increase. Approval is usually within hours for routine bumps. Some hard limits (e.g., STS GetSessionToken) cannot be raised — for those, the fix is architectural (cache credentials, switch to `AssumeRoleWithWebIdentity`, etc.).
5. Add jitter to your own retries
If you've wrapped the SDK with custom retry logic, make sure you have full jitter (random sleep between 0 and the backoff window) — not "exponential backoff" alone. Without jitter, all your clients retry in lockstep.
import random, time
from botocore.exceptions import ClientError

def with_jitter(call, max_attempts=6):
    for attempt in range(max_attempts):
        try:
            return call()
        except ClientError as e:
            if e.response['Error']['Code'] != 'ThrottlingException':
                raise
            if attempt == max_attempts - 1:
                raise
            backoff = min(20, 2 ** attempt)
            time.sleep(random.uniform(0, backoff))
Detection and monitoring in production
Most AWS services emit CloudWatch metrics for throttled requests (e.g., DynamoDB `ThrottledRequests`, Lambda `Throttles`, API Gateway `4XXError`). Alarm on a sustained throttle rate over 1% of total requests in a 5-minute window. Cross-reference with Service Quotas usage metrics to know which limit you're hitting.
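The alarm condition above (sustained throttle rate over 1% of requests in a window) reduces to a ratio check. A sketch, with illustrative names; in CloudWatch you would express this as metric math over the throttle and request-count metrics rather than client-side code:

```python
def throttle_rate_breached(throttled: int, total: int,
                           threshold: float = 0.01) -> bool:
    """True if throttled requests exceed `threshold` of total requests
    in the evaluation window (e.g. 5 minutes)."""
    if total == 0:
        return False                  # no traffic, nothing to alarm on
    return throttled / total > threshold

# e.g. 120 throttles out of 10,000 requests is 1.2% -> alarm fires
```

Alarming on a rate rather than an absolute count keeps the alarm meaningful as traffic scales.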
Related errors
- AWS `AccessDeniedException`: The IAM principal (user, role, or assumed role) making the request does not have an `Allow` statement for the action and resource being called, or an explicit `Deny` somewhere in the evaluation chain blocks it.
- AWS `NoSuchBucket`: The S3 bucket name in your request does not exist in the region your client is calling. Either the name is misspelled, the bucket was deleted, or your client is pointed at the wrong region for a bucket that exists elsewhere.
- OpenAI `rate_limit_exceeded`: Your account has exceeded its per-minute request (RPM) or per-minute token (TPM) limit for the model you're calling. Limits are tier-based and per-model.
- GitHub `403` rate limit: You exceeded GitHub's primary REST API rate limit (60 requests/hour for unauthenticated calls, 5,000/hour for personal access tokens, 15,000/hour for GitHub App installations). The response is HTTP 403 with `X-RateLimit-Remaining: 0`.
Frequently asked questions
Why does ThrottlingException return HTTP 400 instead of 429?
What's the difference between ThrottlingException and ProvisionedThroughputExceededException?
Does the SDK automatically retry ThrottlingException?
I batched my calls but still get throttled. What now?
Can I use API Gateway in front of AWS APIs to absorb throttling?
Why does my Lambda throttle on STS GetCallerIdentity?
Does ThrottlingException count toward my AWS bill?
How do I see my current AWS API quota usage?
When to escalate to AWS support
Open an AWS Support case if (a) Service Quotas console shows headroom but you're still throttled (likely a hidden hot-partition limit), (b) a region-wide service event is masquerading as throttling, or (c) you need an emergency quota raise mid-incident. Provide the AWS request ID from the failing response — it lets support trace the exact bucket that throttled you.
Read more: /guide/handling-rate-limits/