Realtime API
Low-latency, bidirectional voice conversations via WebSockets.
The Realtime API enables fluid, natural voice conversations. It streams audio bidirectionally over WebSockets, achieving response times under 300 ms.
It is powered by አሌፍ-1.2-realtime-audio, which supports natural interruptions and back-channeling.
Endpoint
wss://api.addisassistant.com/v1/realtime
Industry Use Cases
Our Realtime model is already deployed across key Ethiopian sectors. You can test these specific Agents in our Playground.
Telecom Assistant
Automated customer support for balance checks, package purchasing, and troubleshooting in Amharic.
Ride Hailing
Voice-first driver negotiation and location finding. "Where are you?" "I'm at Bole near the bank."
Farmer's Assistant
Agricultural advisory for weather updates, crop disease diagnosis, and market prices in local dialects.
Audio Format Requirements
Critical: Audio Encoding
The API expects 16-bit signed integer PCM (PCM16) audio.
Developer Warning: The standard browser AudioContext returns 32-bit Float data (-1.0 to 1.0) by default. If you send this directly, it will sound like static noise. You must convert Float32 to Int16 before sending it over the WebSocket; the floatTo16BitPCM helper in the integration guide below shows how.
| Property | Value |
|---|---|
| Encoding | PCM 16-bit (Little Endian) |
| Sample Rate | 24,000 Hz (recommended) or 16,000 Hz |
| Channels | Mono (1 channel) |
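One practical note on the browser side: the sampleRate passed to the AudioContext constructor is a request, and a few older browsers ignore it. The sketch below (an illustrative check, not part of the API) verifies the rate actually in effect before you start streaming.

```javascript
// Request a 24 kHz context, then confirm the browser honored it.
const audioContext = new AudioContext({ sampleRate: 24000 });

if (audioContext.sampleRate !== 24000) {
  // Running at a different rate (often 44,100 or 48,000 Hz); the captured
  // audio would need resampling to 24 kHz or 16 kHz before being sent.
  console.warn(`AudioContext runs at ${audioContext.sampleRate} Hz, expected 24000 Hz`);
}
```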
Integration Guide
Browser Implementation (JavaScript)
This example establishes a connection, captures microphone audio, converts it from Float32 to PCM16, and streams it over the WebSocket while handling the server's audio and event responses.
```javascript
// Configuration
const WS_URL = "wss://api.addisassistant.com/v1/realtime";
const API_KEY = "sk_YOUR_KEY";

let socket;
let audioContext;
let processor;

async function startRealtime() {
  // 1. Initialize WebSocket
  socket = new WebSocket(WS_URL);
  socket.binaryType = "arraybuffer";

  socket.onopen = () => {
    console.log("Connected to Addis AI Realtime");
    // Send initial config
    socket.send(JSON.stringify({
      type: "session.update",
      session: {
        api_key: API_KEY,
        language: "am"
      }
    }));
  };

  socket.onmessage = (event) => {
    // Handle incoming binary audio or JSON events
    if (event.data instanceof ArrayBuffer) {
      playAudioChunk(event.data);
    } else {
      console.log("Event:", JSON.parse(event.data));
    }
  };

  // 2. Initialize microphone
  audioContext = new (window.AudioContext || window.webkitAudioContext)({
    sampleRate: 24000,
  });
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = audioContext.createMediaStreamSource(stream);

  // Create a ScriptProcessor to access raw audio data
  processor = audioContext.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e) => {
    if (socket.readyState === WebSocket.OPEN) {
      const inputData = e.inputBuffer.getChannelData(0); // This is Float32
      // CRITICAL: Convert Float32 to Int16 (PCM)
      const pcm16Data = floatTo16BitPCM(inputData);
      socket.send(pcm16Data.buffer);
    }
  };

  source.connect(processor);
  processor.connect(audioContext.destination);
}

// --- Helper: Convert Float32 to Int16 ---
function floatTo16BitPCM(float32Array) {
  const int16Array = new Int16Array(float32Array.length);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i]));
    int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return int16Array;
}
```
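The snippet above calls a playAudioChunk helper that is not defined there. Below is a minimal sketch, assuming the server streams raw mono PCM16 at the session's 24,000 Hz rate: it reverses the Int16 conversion and schedules each chunk back-to-back on the shared audioContext. There is no jitter buffering, so treat it as a starting point rather than production playback code.

```javascript
// Minimal playback for incoming PCM16 chunks, reusing the audioContext
// created in startRealtime(). Chunks are queued sequentially.
let playbackTime = 0; // when the next chunk should start

function playAudioChunk(arrayBuffer) {
  const int16Array = new Int16Array(arrayBuffer);
  const float32Array = new Float32Array(int16Array.length);
  for (let i = 0; i < int16Array.length; i++) {
    // Inverse of floatTo16BitPCM: scale Int16 back into [-1.0, 1.0]
    float32Array[i] = int16Array[i] / (int16Array[i] < 0 ? 0x8000 : 0x7FFF);
  }

  // Wrap the samples in a mono AudioBuffer at 24,000 Hz
  const buffer = audioContext.createBuffer(1, float32Array.length, 24000);
  buffer.copyToChannel(float32Array, 0);

  const node = audioContext.createBufferSource();
  node.buffer = buffer;
  node.connect(audioContext.destination);

  // Schedule against playbackTime so back-to-back chunks stay gapless
  playbackTime = Math.max(playbackTime, audioContext.currentTime);
  node.start(playbackTime);
  playbackTime += buffer.duration;
}
```

Scheduling each chunk at playbackTime rather than starting it immediately keeps consecutive chunks gapless even when they arrive in bursts.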
Server Implementation (Python)
This example uses PyAudio, which captures 16-bit samples natively, so no Float32 conversion is required. It streams microphone audio to the server; for handling the binary audio responses, see the browser example above.

```python
import asyncio
import websockets
import pyaudio
import json

# Audio config
FORMAT = pyaudio.paInt16  # Native 16-bit PCM
CHANNELS = 1
RATE = 24000
CHUNK = 1024

async def realtime_chat():
    uri = "wss://api.addisassistant.com/v1/realtime"
    async with websockets.connect(uri) as websocket:
        # 1. Authenticate
        await websocket.send(json.dumps({
            "type": "session.update",
            "session": {"api_key": "sk_YOUR_KEY", "language": "am"}
        }))

        # 2. Set up the microphone
        p = pyaudio.PyAudio()
        stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                        input=True, frames_per_buffer=CHUNK)
        print("Listening... (Press Ctrl+C to stop)")
        try:
            while True:
                # Read raw PCM16 bytes from the mic and send as a binary frame
                data = stream.read(CHUNK)
                await websocket.send(data)
        except KeyboardInterrupt:
            pass
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()

asyncio.run(realtime_chat())
```

Protocol & Events
The WebSocket handles bidirectional traffic.
Client to Server
The client sends Binary Frames containing raw PCM16 audio chunks.
Server to Client
The server sends Binary Frames (Audio Response) and Text Frames (JSON Events).
AI Audio Response
The server streams the AI's voice as binary PCM16 chunks. Buffer and play these as they arrive; the playAudioChunk sketch in the browser guide above shows one approach.
Turn Complete & Usage
When the AI finishes speaking, the server sends a completion event with billing details.
```json
{
  "serverContent": {
    "turnComplete": true
  },
  "usageMetadata": {
    "totalBilledAudioDurationSeconds": 5.2
  }
}
```
Warning Messages
Sent if there are non-critical issues, such as billing thresholds.
```json
{
  "type": "warning",
  "message": "Your wallet balance is low. Please top up to avoid service interruption."
}
```
Error Messages
Sent if a critical failure occurs.
```json
{
  "error": {
    "message": "AI service error",
    "status": 500,
    "timestamp": "2025-07-15T10:00:00.000Z"
  }
}
```
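Putting these together, a browser client can route incoming text frames by inspecting the JSON shape. The sketch below is based only on the three payloads shown above; the console calls stand in for real UI handling.

```javascript
// Dispatch JSON text frames from the Realtime socket.
// Field names follow the event examples above.
function handleServerEvent(raw) {
  const event = JSON.parse(raw);

  if (event.serverContent && event.serverContent.turnComplete) {
    // The AI finished speaking; usageMetadata carries billing details.
    const billed = event.usageMetadata
      ? event.usageMetadata.totalBilledAudioDurationSeconds
      : null;
    console.log("Turn complete. Billed audio seconds:", billed);
  } else if (event.type === "warning") {
    // Non-critical issue, e.g. a low wallet balance.
    console.warn("Warning:", event.message);
  } else if (event.error) {
    // Critical failure; consider closing the socket and reconnecting.
    console.error(`Error ${event.error.status}: ${event.error.message}`);
  }
}
```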
Capabilities Roadmap
We are rapidly expanding the Realtime engine.
VAD (Voice Activity Detection)
The model currently uses server-side VAD to determine when the user has stopped speaking.
Knowledge Base
Coming Soon: You will soon be able to attach PDF or text documents to a Realtime session. This will allow the voice assistant to answer questions based specifically on your uploaded data (RAG).