Rate Limits
API usage quotas, context windows, and system constraints.
To keep the Addis AI platform reliable and stable for all users, we limit how many requests and tokens you can send within fixed time windows, such as per minute and per day.
Tier Quotas
Limits are applied based on your organization's billing plan.
| Feature | Free / Sandbox | Pro / Team | Enterprise |
|---|---|---|---|
| RPM (Requests Per Minute) | 60 | 500 | Custom |
| RPD (Requests Per Day) | 1,000 | Unlimited | Unlimited |
| TPM (Tokens Per Minute) | 40,000 | 250,000 | Custom |
| Concurrency (simultaneous requests) | 3 | 50 | Custom |
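The server enforces these limits, but a client that paces itself receives fewer 429 responses. The sketch below is a hypothetical client-side helper, not part of any Addis AI SDK. `MinuteBudget` keeps a 60-second sliding window of sent requests and their token counts, and it reports how long to wait before the next request fits under both the RPM and TPM limits. It assumes you can estimate each request's token count up front. The server's own accounting remains authoritative.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteBudget:
    """Hypothetical client-side sliding-window budget for RPM and TPM.

    Pass the RPM and TPM values for your plan from the tier table.
    """

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        # Each entry is (timestamp, tokens) for a request sent in the last 60 s.
        self.events: Deque[Tuple[float, int]] = deque()

    def _prune(self, now: float) -> None:
        # Drop requests that have left the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait before a request using `tokens` fits the budget."""
        now = self.clock()
        self._prune(now)
        if tokens > self.tpm:
            raise ValueError("request alone exceeds the TPM limit")
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) < self.rpm and used_tokens + tokens <= self.tpm:
            return 0.0
        # Walk the oldest requests forward until enough of them have expired.
        count = len(self.events)
        for ts, t in self.events:
            count -= 1
            used_tokens -= t
            if count < self.rpm and used_tokens + tokens <= self.tpm:
                return ts + 60.0 - now
        return 0.0  # only reached if rpm <= 0

    def record(self, tokens: int) -> None:
        """Record a request that was just sent."""
        self.events.append((self.clock(), tokens))
```

For example, on the Free / Sandbox tier you might create `MinuteBudget(rpm=60, tpm=40_000)`. Before each request of roughly `n` tokens, call `time.sleep(budget.wait_time(n))`, and after sending it, call `budget.record(n)`. The `clock` parameter exists so tests can substitute a fake time source.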
Hitting Limits?
If you consistently hit these limits, please contact Sales to discuss an Enterprise plan with dedicated throughput.
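The Concurrency row limits how many requests can be in flight at the same time. If your calls are `asyncio` coroutines, one way to stay within that limit is to gate them with an `asyncio.Semaphore`. The function name `run_with_concurrency_cap` is hypothetical, and this sketch is not tied to any particular HTTP client.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_with_concurrency_cap(
    tasks: List[Callable[[], Awaitable[T]]], limit: int
) -> List[T]:
    """Run coroutine factories with at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await factory()

    # gather returns results in the same order as the input list.
    return await asyncio.gather(*(guarded(f) for f in tasks))
```

On the Free / Sandbox tier you would pass `limit=3` and a list of zero-argument callables. Each callable returns the coroutine for one API call, and you can build them with `functools.partial`. Passing factories instead of already-created coroutines ensures that no request starts before it holds a slot.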
Model Constraints
Apart from rate limits, each model has technical constraints on input size and duration. These constraints differ by model category: text generation, audio (TTS and STT), vision, and documents.
Response Headers
Every API response includes HTTP headers that report your current request rate-limit status.
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | The maximum number of requests allowed in the current window. |
| x-ratelimit-remaining-requests | The number of requests remaining in the current window. |
| x-ratelimit-reset-requests | The time (in seconds) until the window resets. |
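You can read these headers proactively and pause before you run out of requests, instead of reacting after a 429 arrives. The names `RateLimitStatus`, `parse_rate_limit_headers`, and `seconds_to_pause` are hypothetical. The sketch assumes that `x-ratelimit-reset-requests` contains a plain number of seconds, as the table describes. Missing or unparseable values become `None` rather than raising an error.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: Optional[int]
    remaining: Optional[int]
    reset_seconds: Optional[float]


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Read the x-ratelimit-*-requests headers, tolerating missing or bad values."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}

    def number(name, cast):
        raw = lowered.get(name)
        if raw is None:
            return None
        try:
            return cast(raw)
        except ValueError:
            return None

    return RateLimitStatus(
        limit=number("x-ratelimit-limit-requests", int),
        remaining=number("x-ratelimit-remaining-requests", int),
        # Assumed to be a bare number of seconds, possibly fractional.
        reset_seconds=number("x-ratelimit-reset-requests", float),
    )


def seconds_to_pause(status: RateLimitStatus) -> float:
    """Wait for the window to reset once no requests remain; otherwise 0."""
    if status.remaining == 0 and status.reset_seconds is not None:
        return max(status.reset_seconds, 0.0)
    return 0.0
```

You can pass `response.headers` from `requests`, or any other mapping of header names to strings, and sleep for `seconds_to_pause(...)` before sending the next request.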
Handling Rate Limits (429)
If you exceed a limit, the API returns a 429 Too Many Requests status. Your application should handle this gracefully with exponential backoff.
Do not retry immediately in a tight loop. Wait, then retry with increasing delays.
```javascript
async function fetchWithBackoff(url, options, retries = 3, delay = 1000) {
  const response = await fetch(url, options);

  // If rate limited and retries remain, wait and try again
  if (response.status === 429 && retries > 0) {
    // Header is in seconds; a missing or non-numeric value falls back to the backoff delay
    const reset = Number(response.headers.get('x-ratelimit-reset-requests'));
    const resetMs = Number.isFinite(reset) ? reset * 1000 : 0;
    const waitTime = Math.max(delay, resetMs);
    console.warn(`Rate limited. Retrying in ${waitTime}ms...`);
    await new Promise(resolve => setTimeout(resolve, waitTime));
    // Retry with double the delay (exponential backoff)
    return fetchWithBackoff(url, options, retries - 1, delay * 2);
  }

  // Success, a non-429 error, or retries exhausted: let the caller inspect it
  return response;
}
```

```python
import time

import requests


def request_with_backoff(url, headers, json_data, retries=3, delay=1):
    for attempt in range(retries + 1):
        response = requests.post(url, headers=headers, json=json_data, timeout=30)
        if response.status_code != 429 or attempt == retries:
            # Success, a non-429 error, or retries exhausted (give up)
            return response

        # Header is in seconds and may be fractional; always wait at least `delay`
        try:
            reset = float(response.headers.get('x-ratelimit-reset-requests', 0))
        except ValueError:
            reset = 0
        wait_time = max(delay, reset)
        print(f"Rate limit hit. Retrying in {wait_time}s...")
        time.sleep(wait_time)
        delay *= 2  # Exponential backoff
```
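Both examples double a fixed delay after each 429. When many clients are throttled at the same moment, identical delays can cause them to retry in lockstep. A common refinement, usually called "full jitter", picks each wait uniformly at random between zero and the exponential bound. The helper below is a hypothetical sketch of that schedule. It is not something the Addis AI API requires, and the 30-second cap is an assumed default.

```python
import random
from typing import List, Optional


def backoff_delays(
    retries: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one delay (in seconds) per retry using full-jitter backoff.

    Retry n waits a random time in [0, min(cap, base * 2**n)], so clients
    that were throttled together spread their retries out.
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(retries)]
```

If the `x-ratelimit-reset-requests` header is present, a reasonable choice is to wait for whichever is larger: the header value or the jittered delay.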