# Speech to Text (Transcription)
Transcribe audio into text for Amharic and Afan Oromo.
Our Speech-to-Text (STT) technology is powered by specialized models trained to accurately recognize Ethiopian languages. Unlike generic transcription services, our models are optimized for the specific phonemes, accents, and dialects of Amharic and Afan Oromo.
## Endpoint

POST `https://api.addisassistant.com/api/v2/stt`
## Usage Guide

This endpoint requires a `multipart/form-data` request. You must send the audio file binary alongside a JSON string containing the configuration.
### Transcribe a File

Upload an audio file to get the text transcription.

> **Note:** `request_data` must be a stringified JSON object.
**cURL**

```bash
curl --location 'https://api.addisassistant.com/api/v2/stt' \
--header 'x-api-key: sk_YOUR_KEY' \
--form 'audio=@"/path/to/voice_note.wav"' \
--form 'request_data="{ \"language_code\": \"am\" }"'
```

**JavaScript**

```javascript
const formData = new FormData();

// 1. Append the file
// Assuming 'fileInput' is an HTML <input type="file">
formData.append("audio", fileInput.files[0]);

// 2. Append the metadata as a stringified JSON object
formData.append("request_data", JSON.stringify({
  language_code: "am"
}));

const response = await fetch("https://api.addisassistant.com/api/v2/stt", {
  method: "POST",
  headers: {
    "x-api-key": "sk_YOUR_KEY"
    // Do NOT set the Content-Type header manually for FormData;
    // the browser sets it automatically with the multipart boundary.
  },
  body: formData
});

const data = await response.json();
console.log("Transcription:", data.data.transcription);
```

**Python**

```python
import requests
import json

url = "https://api.addisassistant.com/api/v2/stt"
headers = {"x-api-key": "sk_YOUR_KEY"}

# 1. Prepare the metadata as a stringified JSON object
payload = {
    "request_data": json.dumps({"language_code": "am"})
}

# 2. Open the file and send the multipart request
files = [
    ("audio", ("voice.wav", open("/path/to/voice.wav", "rb"), "audio/wav"))
]

response = requests.post(url, headers=headers, data=payload, files=files)
print(response.json()["data"]["transcription"])
```

## API Reference
### Form Data Parameters

These fields are sent as multipart form data.

| Prop | Type |
|---|---|
| `audio` | file (binary) |
| `request_data` | string (stringified JSON) |
### Request Data Object

These parameters go inside the `request_data` JSON string.

| Prop | Type |
|---|---|
| `language_code` | string |
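Because `request_data` travels as a plain string form field, it is easy to accidentally send a nested object instead of a stringified one. A minimal sketch of building it correctly (the helper name is illustrative, not part of any SDK; `"am"` is the only language code shown on this page):

```python
import json

def build_request_data(language_code: str) -> str:
    # The API expects this form field as a *string* containing JSON,
    # not as a nested JSON object -- hence json.dumps here.
    return json.dumps({"language_code": language_code})

payload = {"request_data": build_request_data("am")}
print(payload["request_data"])  # {"language_code": "am"}
```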
## Response Schema

```json
{
  "status": "success",
  "data": {
    "transcription": "ሰላም እንኳን ደህና መጣችሁ",
    "usage_metadata": {
      "totalBilledDuration": "15s",
      "requestId": "69b60667-0000-2a1e-b6d3-d4f547fe6724"
    }
  },
  "confidence": 0.982
}
```

## Supported Formats
We support standard audio containers. For the fastest processing, we recommend WAV.
| Format | Content Types (MIME) |
|---|---|
| WAV | audio/wav, audio/x-wav, audio/wave |
| MP3 | audio/mpeg, audio/mp3 |
| M4A | audio/mp4, audio/x-m4a |
| WebM | audio/webm |
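The table above can drive a small client-side check so unsupported uploads fail fast. This is a hypothetical helper (not part of any official SDK), assuming you pick one canonical MIME type per extension:

```python
import os

# One canonical MIME type per supported extension, taken from the
# formats table above. (Illustrative helper, not an official client.)
SUPPORTED_MIME = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".m4a": "audio/mp4",
    ".webm": "audio/webm",
}

def content_type_for(path: str) -> str:
    """Return the MIME type to send for a file, or raise if unsupported."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return SUPPORTED_MIME[ext]
    except KeyError:
        raise ValueError(f"Unsupported audio format: {ext or path}")

print(content_type_for("voice_note.wav"))  # audio/wav
```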
## Best Practices

To ensure high accuracy (WER < 10%), follow these recording guidelines.
### Audio Specs

- **Sample Rate:** 16 kHz or higher is recommended for clarity.
- **Channel:** Mono is preferred. Stereo files are supported but are mixed down to mono before processing.
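One way to verify (or generate) audio that meets these specs is Python's standard `wave` module. This sketch writes a one-second 16 kHz mono test tone and reads the header back to confirm it matches the recommendation:

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # 16 kHz, the recommended minimum

# Write a 1-second 440 Hz tone as 16 kHz mono, 16-bit PCM.
with wave.open("test_tone.wav", "wb") as wf:
    wf.setnchannels(1)   # mono
    wf.setsampwidth(2)   # 16-bit samples
    wf.setframerate(SAMPLE_RATE)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)))
        for n in range(SAMPLE_RATE)  # 1 second of audio
    )
    wf.writeframes(frames)

# Verify the file matches the recommended spec before uploading.
with wave.open("test_tone.wav", "rb") as wf:
    print(wf.getnchannels(), wf.getframerate())  # 1 16000
```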
### Environment

- **Noise:** Background noise significantly degrades accuracy. Record in a quiet environment.
- **Distance:** Keep the speaker 10-30 cm from the microphone for optimal volume levels.
### Constraints

- Maximum audio duration: 60 seconds
- Maximum file size: 10 MB

### Limitations
- **Speakers:** The model is optimized for single-speaker audio. Overlapping voices may result in skipped words.
- **Technical Terms:** Rare technical jargon or heavy code-switching (mixing in English) may have lower accuracy.
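Given the 60-second and 10 MB constraints above, a client-side preflight check can reject bad files before spending an upload. A hypothetical helper for WAV input (the limits mirror this page; the function itself is illustrative, and duration probing is shown for WAV only):

```python
import os
import wave

MAX_DURATION_S = 60                 # maximum clip length per the constraints above
MAX_SIZE_BYTES = 10 * 1024 * 1024   # 10 MB upload limit

def preflight_check(path: str) -> None:
    """Validate a WAV file against the documented limits before uploading.

    Raises ValueError if the file exceeds either limit.
    """
    size = os.path.getsize(path)
    if size > MAX_SIZE_BYTES:
        raise ValueError(f"File is {size} bytes; limit is {MAX_SIZE_BYTES}")
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    if duration > MAX_DURATION_S:
        raise ValueError(f"Clip is {duration:.1f}s; limit is {MAX_DURATION_S}s")
```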