Rate Limits

To ensure the reliability and stability of the Addis AI platform for all users, we enforce limits on the number of requests you can make over specific periods.

Tier Quotas

Limits are applied based on your organization's billing plan.

Feature	Free / Sandbox	Pro / Team	Enterprise
RPM (Requests Per Minute)	60	500	Custom
RPD (Requests Per Day)	1,000	Unlimited	Unlimited
TPM (Tokens Per Minute)	40,000	250,000	Custom
Concurrency	3 Requests	50 Requests	Custom
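
To stay under the RPM quota rather than react to rejections, a client can track its own request timestamps. The sketch below is one possible approach, not part of the Addis AI SDK: the class name `SlidingWindowLimiter` and its methods are hypothetical, and it assumes the limit behaves like a sliding 60-second window, which the table above does not specify.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most max_requests per window_seconds.

    Assumption: the server counts requests over a sliding window.
    """

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of recently sent requests

    def seconds_until_allowed(self):
        """Return 0.0 if a request may be sent now, else the wait in seconds."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Call once for every request actually sent."""
        self._sent.append(self.clock())
```

Before each call, sleep for `seconds_until_allowed()` and then call `record()`. The defaults match the Free / Sandbox RPM value; pass `max_requests=500` for Pro / Team.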

Hitting Limits?

If you consistently hit these limits, please contact Sales to discuss an Enterprise plan with dedicated throughput.

Model Constraints

Apart from rate limits, each model has technical constraints regarding input size and duration.

Text Generation

Context Window: 8,192 Tokens

Max Output: 4,096 Tokens

Audio (TTS & STT)

Max Audio Size: 10 MB

Max Duration: 60 Seconds

Vision

Max Image Size: 10 MB

Formats: JPG, PNG, WEBP

Documents

Max PDF Size: 10 MB

Page Limit: ~20 Pages
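
The size and format constraints above can be checked locally before uploading, which avoids spending a request on a file that would be rejected. The following is a minimal sketch under stated assumptions: `check_upload` is a hypothetical helper, "10 MB" is read as 10,000,000 bytes (the stricter interpretation), and `.jpeg` is treated as equivalent to JPG. It does not check audio duration or PDF page count.

```python
import os

# Assumption: "10 MB" means 10^7 bytes; the docs do not say whether MB or MiB is meant.
MAX_FILE_BYTES = 10 * 1000 * 1000
# Assumption: ".jpeg" is accepted wherever JPG is.
ALLOWED_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def check_upload(filename, size_bytes, kind):
    """Return a list of problems; an empty list means the file passes these checks.

    kind is one of "audio", "image", or "document".
    """
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"{filename}: {size_bytes} bytes exceeds the 10 MB limit")

    ext = os.path.splitext(filename)[1].lower()
    if kind == "image" and ext not in ALLOWED_IMAGE_EXTENSIONS:
        problems.append(f"{filename}: image format {ext or '(none)'} is not JPG, PNG, or WEBP")
    if kind == "document" and ext != ".pdf":
        problems.append(f"{filename}: documents must be PDF")
    return problems
```

For example, `check_upload("scan.gif", 2048, "image")` returns one problem describing the unsupported format, while a 1 KB PNG returns an empty list.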

Response Headers

Every API response includes HTTP headers that report your current request quota: the limit for the window, how many requests remain, and when the window resets.

Header	Description
`x-ratelimit-limit-requests`	The maximum number of requests allowed in the current window.
`x-ratelimit-remaining-requests`	The number of requests remaining in the current window.
`x-ratelimit-reset-requests`	The time (in seconds) until the window resets.
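
The three headers above can be read into a small structure so that a client can slow down before it reaches zero remaining requests. This is a sketch only: `parse_rate_limit_headers` is a hypothetical helper, and it assumes the header values are plain numbers, with anything else treated as missing.

```python
def parse_rate_limit_headers(headers):
    """Extract the documented rate-limit headers from a mapping of response headers.

    Missing or non-numeric values become None (assumption: values are plain numbers).
    """

    def to_number(name):
        raw = headers.get(name)
        try:
            return float(raw)
        except (TypeError, ValueError):
            return None

    return {
        "limit": to_number("x-ratelimit-limit-requests"),
        "remaining": to_number("x-ratelimit-remaining-requests"),
        "reset_seconds": to_number("x-ratelimit-reset-requests"),
    }
```

With the `requests` library, pass `response.headers` directly; its header mapping is case-insensitive, whereas a plain `dict` needs the lowercase names shown here.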

Handling Rate Limits (429)

If you exceed a limit, the API will return a 429 Too Many Requests status. Your application should handle this gracefully using Exponential Backoff.

Do not retry immediately in a tight loop. Wait, then retry with increasing delays.

async function fetchWithBackoff(url, options, retries = 3, delay = 1000) {
  const response = await fetch(url, options);

  // If rate limited, wait and retry
  if (response.status === 429 && retries > 0) {
    // The header value is in seconds; fall back to 0 if it is missing or not numeric
    const resetSeconds = Number(response.headers.get('x-ratelimit-reset-requests')) || 0;
    // Wait for the longer of the backoff delay and the server's reset hint
    const waitTime = Math.max(delay, resetSeconds * 1000);

    console.warn(`Rate limited. Retrying in ${waitTime}ms...`);
    await new Promise(resolve => setTimeout(resolve, waitTime));

    // Retry with double the delay (exponential backoff)
    return fetchWithBackoff(url, options, retries - 1, delay * 2);
  }

  // Success, a non-429 error, or retries exhausted: return the response as-is
  return response;
}

import time
import requests

def request_with_backoff(url, headers, json_data, retries=3, delay=1):
    for i in range(retries + 1):
        response = requests.post(url, headers=headers, json=json_data)
        
        if response.status_code == 429:
            if i == retries:
                return response # Give up
            
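            # Wait for the longer of the backoff delay and the server's reset hint
            try:
                reset = float(response.headers.get('x-ratelimit-reset-requests', 0))
            except ValueError:
                reset = 0  # Header present but not numeric
            wait_time = max(delay, reset)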
            print(f"Rate limit hit. Retrying in {wait_time}s...")
            
            time.sleep(wait_time)
            delay *= 2 # Exponential backoff
            continue
            
        return response
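
Both examples above double the delay on each retry. When many clients are throttled at the same moment, identical delays can make them all retry together, so a common refinement is to randomize each wait ("jitter"). The sketch below computes such a schedule. `backoff_delays` and its parameters are hypothetical, and jitter is a general technique, not something the Addis AI API requires.

```python
import random


def backoff_delays(retries=3, base=1.0, cap=30.0, jitter=True, rng=random.random):
    """Return the wait in seconds before each retry.

    Without jitter the schedule is base, 2*base, 4*base, ... capped at cap.
    With jitter each wait is drawn uniformly from [0, that value).
    """
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay = rng() * delay  # "full jitter": random fraction of the capped delay
        delays.append(delay)
    return delays
```

Calling `backoff_delays(jitter=False)` reproduces the 1, 2, 4 second schedule used by the Python example above. If the reset header is present, taking the larger of the jittered delay and the header value keeps the client from retrying before the window resets.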