429 Too Many Requests — Rate Limiting Hits
Symptoms
- API returns HTTP 429 with a `Retry-After` header indicating wait time
- Response body contains rate limit details: `{"error":"rate_limit_exceeded","remaining":0,"reset":1709251200}`
- Requests succeed normally, then suddenly all fail for a rolling time window
- Different API keys or OAuth tokens hit limits independently at different rates
- Logs show bursts of 429s followed by recovery once the window resets
Root causes
- Sending requests in a tight loop without any delay between calls
- Sharing a single API key across multiple workers or processes without coordination
- Retry logic that immediately retries on failure, compounding the rate limit violation
- Not reading or respecting the `Retry-After` or `X-RateLimit-Reset` response headers
- Burst traffic from batch jobs scheduled to run simultaneously (e.g., top of the hour)
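The first cause above (a tight loop with no delay) is easy to fix client-side by pacing calls against a request budget. A minimal sketch, where the `per_second` budget is a hypothetical figure you would take from your provider's documented limit:

```python
import time

def paced_calls(call, n, per_second=5):
    """Invoke `call` n times, spacing requests to stay under per_second."""
    interval = 1.0 / per_second
    results = []
    for _ in range(n):
        start = time.monotonic()
        results.append(call())
        # Sleep off the remainder of this request's time slot, if any
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return results
```

This spaces requests evenly instead of front-loading them, which keeps a batch job under a per-second limit without any server cooperation.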
Diagnosis
1. **Inspect the response headers** to understand the rate limit policy:
```bash
curl -i -X GET https://api.example.com/endpoint \
-H 'Authorization: Bearer YOUR_TOKEN'
# Look for: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
```
2. **Decode the reset timestamp** if it is epoch-based:
```bash
date -d @1709251200 # Linux
date -r 1709251200 # macOS
```
3. **Calculate your actual request rate** by counting logs over a rolling window:
```bash
grep 'POST /api/' access.log | awk '{print $4}' | cut -c1-18 | sort | uniq -c  # per-minute counts (common log format timestamp)
```
4. **Check if multiple processes share the same key** — grep your process list or config:
```bash
grep -r 'API_KEY' .env* config/ | head -20
ps aux | grep worker
```
5. **Simulate the limit** with a quick burst test to confirm the threshold:
```bash
for i in $(seq 1 20); do
curl -s -o /dev/null -w '%{http_code}\n' https://api.example.com/endpoint \
-H 'Authorization: Bearer YOUR_TOKEN'
done
```
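The burst test can also be scripted in Python so the threshold is captured programmatically. A sketch with the HTTP call abstracted as a `get` callable (an assumption for illustration; in practice it would wrap your HTTP client and return the status code):

```python
def find_burst_threshold(get, max_requests=50):
    """Call `get()` repeatedly until the first 429.

    Returns the 1-based index of the first rate-limited request,
    or None if the limit was not reached within max_requests.
    """
    for i in range(1, max_requests + 1):
        if get() == 429:
            return i
    return None
```

The returned index approximates the burst capacity the server allows before throttling kicks in.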
Solutions
**1. Respect the `Retry-After` header with exponential backoff:**
```python
import random
import time

import httpx

def call_with_retry(url: str, headers: dict, max_retries: int = 5) -> httpx.Response:
    for attempt in range(max_retries):
        resp = httpx.get(url, headers=headers)
        if resp.status_code == 429:
            # Assumes Retry-After is delta-seconds; servers may also send an HTTP date
            retry_after = int(resp.headers.get('Retry-After', 2 ** attempt))
            jitter = random.uniform(0, 1)  # jitter avoids synchronized retry storms
            time.sleep(retry_after + jitter)
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError('Max retries exceeded')
```
**2. Cap concurrency in front of outbound calls:**
```python
import asyncio
from asyncio import Semaphore

# Allow at most 10 concurrent requests; this caps concurrency,
# not request rate, but is often enough to stay under the limit
sem = Semaphore(10)

async def safe_fetch(client, url):
    async with sem:
        return await client.get(url)
```
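A semaphore only bounds how many requests are in flight at once; it does not bound requests per second. A minimal token-bucket sketch that enforces an actual rate (the `rate` and `capacity` numbers are placeholders to be tuned to your provider's limit):

```python
import asyncio
import time

class TokenBucket:
    """Token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    async def acquire(self):
        while True:
            # Refill tokens in proportion to elapsed time, capped at capacity
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the next token to become available
            await asyncio.sleep((1 - self.tokens) / self.rate)

async def fetch_limited(bucket, client, url):
    await bucket.acquire()
    return await client.get(url)
```

Calling `await bucket.acquire()` before each request lets short bursts through up to `capacity`, then smooths sustained traffic to `rate` requests per second.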
**3. Distribute load across multiple API keys (if the provider allows):**
```python
import itertools

keys = ['key_a', 'key_b', 'key_c']
key_pool = itertools.cycle(keys)

def next_headers() -> dict:
    # Build headers per request so each call advances the rotation
    return {'Authorization': f'Bearer {next(key_pool)}'}
```
**4. Cache responses so repeated calls do not hit the API:**
```python
from django.core.cache import cache

def get_user_data(user_id: int) -> dict:
    key = f'api:user:{user_id}'
    cached = cache.get(key)
    if cached is not None:  # `is not None` so empty results are also cached
        return cached
    data = call_api(user_id)
    cache.set(key, data, timeout=300)
    return data
```
Prevention
- **Track your usage** against the documented quota before hitting limits; use `X-RateLimit-Remaining` to self-throttle before reaching zero
- **Stagger batch jobs** using randomised start times or a queue (Celery, django-tasks) rather than cron jobs that fire simultaneously
- **Cache aggressively** for read-heavy endpoints — a 60-second cache can reduce outbound calls by 99% for popular resources
- **Set up alerts** when `X-RateLimit-Remaining` drops below 20% so you can investigate before requests start failing
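The self-throttling idea above can be sketched as a small helper that decides how long to pause based on the quota headers. Header names follow the common `X-RateLimit-*` convention used earlier in this document; the 20% threshold is the same arbitrary figure as the alerting bullet:

```python
import time

def maybe_throttle(headers: dict, min_fraction: float = 0.2) -> float:
    """Return seconds to sleep if remaining quota is below min_fraction of the limit."""
    limit = int(headers.get('X-RateLimit-Limit', 0))
    remaining = int(headers.get('X-RateLimit-Remaining', limit))
    reset = float(headers.get('X-RateLimit-Reset', 0))  # assumed epoch seconds
    if limit and remaining / limit < min_fraction:
        # Wait out the rest of the window instead of burning the last tokens
        return max(0.0, reset - time.time())
    return 0.0
```

Calling this after each response (and sleeping for the returned duration) keeps the client from ever reaching zero remaining quota, so well-behaved traffic never sees a 429.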