Addis AI

Realtime API

Low-latency, bidirectional voice conversations via WebSockets.

The Realtime API enables fluid, natural voice conversations. It uses WebSockets to stream audio bi-directionally, achieving response times under 300ms.

It is powered by አሌፍ-1.2-realtime-audio, allowing for natural interruptions and back-channeling.

Endpoint

WSS wss://api.addisassistant.com/v1/realtime

Industry Use Cases

Our Realtime model is already deployed across key Ethiopian sectors. You can test these specific Agents in our Playground.

Telecom Assistant

Automated customer support for balance checks, package purchasing, and troubleshooting in Amharic.

Ride Hailing

Voice-first driver negotiation and location finding. "Where are you?" "I'm at Bole near the bank."

Farmer's Assistant

Agricultural advisory for weather updates, crop disease diagnosis, and market prices in local dialects.


Audio Format Requirements

Critical: Audio Encoding

The API expects 16-bit Signed Integer PCM (PCM16) audio.

Developer Warning: The standard browser AudioContext returns 32-bit Float data (-1.0 to 1.0) by default. If you send this directly, it will sound like static noise. You must convert Float32 to Int16 before sending it over the WebSocket.

Property       Value
Encoding       PCM 16-bit (Little Endian)
Sample Rate    24,000 Hz (recommended) or 16,000 Hz
Channels       Mono (1 channel)

Integration Guide

Browser Implementation (JavaScript)

This example establishes a connection, captures microphone audio, converts it from Float32 to PCM16, and handles the audio stream.

// Configuration
const WS_URL = "wss://api.addisassistant.com/v1/realtime";
const API_KEY = "sk_YOUR_KEY";

let socket;
let audioContext;
let processor;

async function startRealtime() {
  // 1. Initialize WebSocket
  socket = new WebSocket(WS_URL);
  socket.binaryType = "arraybuffer";

  socket.onopen = () => {
    console.log("Connected to Addis AI Realtime");
    
    // Send initial config
    socket.send(JSON.stringify({
      type: "session.update",
      session: {
        api_key: API_KEY,
        language: "am"
      }
    }));
  };

  socket.onmessage = (event) => {
    // Handle incoming binary audio or JSON events
    if (event.data instanceof ArrayBuffer) {
        playAudioChunk(event.data);
    } else {
        console.log("Event:", JSON.parse(event.data));
    }
  };

  // 2. Initialize Microphone
  audioContext = new (window.AudioContext || window.webkitAudioContext)({
    sampleRate: 24000, 
  });

  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = audioContext.createMediaStreamSource(stream);
  
  // Create a ScriptProcessor to access raw audio data
  processor = audioContext.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (e) => {
    if (socket.readyState === WebSocket.OPEN) {
      const inputData = e.inputBuffer.getChannelData(0); // This is Float32
      
      // CRITICAL: Convert Float32 to Int16 (PCM)
      const pcm16Data = floatTo16BitPCM(inputData);
      
      socket.send(pcm16Data.buffer);
    }
  };

  source.connect(processor);
  processor.connect(audioContext.destination);
}

// --- Helper: Convert Float32 to Int16 ---
function floatTo16BitPCM(float32Array) {
  const int16Array = new Int16Array(float32Array.length);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i]));
    int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return int16Array;
}
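
Python Implementation

The equivalent desktop/server example below uses PyAudio, which captures microphone audio natively as 16-bit PCM (paInt16), so no Float32 conversion step is needed. Note that this minimal example only streams microphone audio to the API; playing back the AI's audio responses would require an additional receive loop (see Protocol & Events below).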
import asyncio
import websockets
import pyaudio
import json

# Audio Config
FORMAT = pyaudio.paInt16 # Native 16-bit
CHANNELS = 1
RATE = 24000
CHUNK = 1024

async def realtime_chat():
    uri = "wss://api.addisassistant.com/v1/realtime"
    
    async with websockets.connect(uri) as websocket:
        # 1. Authenticate
        await websocket.send(json.dumps({
            "type": "session.update",
            "session": { "api_key": "sk_YOUR_KEY", "language": "am" }
        }))

        # 2. Setup Microphone
        p = pyaudio.PyAudio()
        stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, 
                        input=True, frames_per_buffer=CHUNK)

        print("Listening... (Press Ctrl+C to stop)")

        try:
            while True:
                # Read raw PCM16 bytes from mic
                data = stream.read(CHUNK)
                await websocket.send(data)
                
        except KeyboardInterrupt:
            pass
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()

asyncio.run(realtime_chat())

Protocol & Events

A single WebSocket connection carries all traffic in both directions.

Client to Server

The client sends Binary Frames containing raw PCM16 audio chunks.

Server to Client

The server sends Binary Frames (Audio Response) and Text Frames (JSON Events).

AI Audio Response

The server streams the AI's voice as binary PCM16 chunks. Buffer incoming chunks and play them back as they arrive.
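
As a minimal playback sketch for the browser example above: the helper below (the playAudioChunk referenced in that example) converts each incoming PCM16 chunk back to Float32 and schedules it on the Web Audio timeline so chunks play back-to-back. It assumes the audioContext created in startRealtime() is in scope, and that the server streams audio at the same 24,000 Hz sample rate used for input.

let playheadTime = 0; // Where the next chunk should start on the audio timeline

function playAudioChunk(arrayBuffer) {
  // Incoming audio is PCM16 (Little Endian); convert back to Float32 for Web Audio
  const int16 = new Int16Array(arrayBuffer);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 0x8000;
  }

  // Wrap the samples in a mono AudioBuffer (assumes 24,000 Hz server output)
  const buffer = audioContext.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);

  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);

  // Schedule this chunk immediately after the previous one to avoid gaps
  playheadTime = Math.max(playheadTime, audioContext.currentTime);
  source.start(playheadTime);
  playheadTime += buffer.duration;
}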

Turn Complete & Usage

When the AI finishes speaking, the server sends a completion event with billing details.

{
  "serverContent": {
    "turnComplete": true
  },
  "usageMetadata": {
    "totalBilledAudioDurationSeconds": 5.2
  }
}

Warning Messages

Sent for non-critical issues, such as approaching billing thresholds.

{
  "type": "warning",
  "message": "Your wallet balance is low. Please top up to avoid service interruption."
}

Error Messages

Sent if a critical failure occurs.

{
  "error": {
    "message": "AI service error",
    "status": 500,
    "timestamp": "2025-07-15T10:00:00.000Z"
  }
}
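
Putting these event shapes together, here is a sketch of a JSON dispatcher you could call from socket.onmessage in the browser example (for text frames only). The function name handleServerEvent is illustrative; the field names mirror the payloads documented above.

// Dispatch a server text frame based on the documented event payloads
function handleServerEvent(raw) {
  const event = JSON.parse(raw);

  if (event.serverContent && event.serverContent.turnComplete) {
    // Turn finished; usage metadata carries the billed audio duration
    console.log("Turn complete. Billed audio (s):",
      event.usageMetadata && event.usageMetadata.totalBilledAudioDurationSeconds);
  } else if (event.type === "warning") {
    console.warn("Warning:", event.message);
  } else if (event.error) {
    console.error(`Error ${event.error.status}: ${event.error.message}`);
  } else {
    console.log("Unhandled event:", event);
  }
}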

Capabilities Roadmap

We are rapidly expanding the Realtime engine.

VAD (Voice Activity Detection)

The model currently uses server-side VAD to determine when the user has stopped speaking.

Knowledge Base

Coming Soon

Soon you will be able to attach PDF or text documents to the Realtime session. This will allow the voice assistant to answer questions grounded in your uploaded data (Retrieval-Augmented Generation, RAG).
