Audio formats
The service automatically resamples audio in any of the formats listed below, so you rarely need to convert files yourself. The only hard limit is 100 MB per batch upload. Streaming clients should send 16 kHz PCM for the lowest latency.
Batch uploads
The /api/v1/transcribe endpoint accepts the following formats:
| Format | Notes | Supported |
|---|---|---|
| WAV | 8–48 kHz PCM, 16/24-bit, mono or stereo | ✓ (recommended) |
| FLAC | 8–48 kHz, any bit depth | ✓ |
| MP3 | 32 kbps and up | ✓ |
| OGG / Opus | Any sample rate Opus supports | ✓ |
| M4A / AAC | 8–48 kHz | ✓ |
| WebM (Opus) | From browser MediaRecorder | ✓ |
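If you want to reject bad uploads before they reach the server, a client-side pre-flight check is cheap. The sketch below is hypothetical and not part of any official SDK. It assumes one file per upload, reads "100 MB" as 100 × 10^6 bytes (the stricter of the two possible readings), and trusts the file extension to identify the format.

```python
# Hypothetical pre-flight check before calling /api/v1/transcribe.
# Assumptions: one file per upload, "100 MB" = 100 * 10**6 bytes (stricter
# reading), and the file extension reliably identifies the format.
from pathlib import PurePath

MAX_UPLOAD_BYTES = 100 * 10**6
# Extensions mirroring the format table; ".opus" and ".aac" are assumed aliases.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac", ".webm"}


def preflight(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_UPLOAD_BYTES}-byte limit")
    return problems
```

Typical use is `preflight(path, os.path.getsize(path))`, uploading only when the returned list is empty.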
Streaming PCM
The WebSocket endpoint expects raw PCM frames after the initial config message. The model was trained on mono, 16 kHz, pcm_s16le audio; anything else is converted on the server, which adds a few milliseconds of latency.
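To avoid the server-side conversion, the client can produce pcm_s16le itself. The sketch below assumes your audio is already at 16 kHz and available as floating-point samples in [-1.0, 1.0], one tuple per sample frame; it downmixes to mono by averaging channels and packs little-endian signed 16-bit integers. It does not resample: if your source is not 16 kHz, you still need a resampler (or can let the server do it). The function name is illustrative.

```python
import struct


def to_pcm_s16le_mono(frames):
    """Convert float sample frames to mono pcm_s16le bytes.

    frames: sequence of tuples, one float per channel, each in [-1.0, 1.0].
    Assumes the audio is already at 16 kHz; no resampling happens here.
    """
    ints = []
    for frame in frames:
        mono = sum(frame) / len(frame)          # downmix by averaging channels
        mono = max(-1.0, min(1.0, mono))        # clip out-of-range values
        ints.append(int(round(mono * 32767)))   # scale to the int16 range
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)
```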
Chunks of 20–40 ms work best: at 16 kHz that is 320–640 samples, or 640–1280 bytes of pcm_s16le. Don't buffer more than one second of audio on the client.
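The chunk arithmetic is sample rate × duration × 2 bytes per sample, so 16,000 × 0.020 × 2 = 640 bytes for a 20 ms chunk. A minimal splitting helper, assuming you already hold a pcm_s16le buffer, might look like this (function names are illustrative):

```python
SAMPLE_RATE = 16_000   # Hz, mono
BYTES_PER_SAMPLE = 2   # pcm_s16le


def chunk_bytes(ms: int) -> int:
    """Number of bytes in a chunk of `ms` milliseconds of 16 kHz pcm_s16le."""
    return SAMPLE_RATE * ms // 1000 * BYTES_PER_SAMPLE


def iter_chunks(pcm: bytes, ms: int = 20):
    """Yield consecutive chunks of `ms` milliseconds; the final chunk may be shorter."""
    size = chunk_bytes(ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Each yielded chunk would then be sent as one binary WebSocket message after the config message. The config message format is not covered on this page, so the WebSocket client code is left out here.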
Quality tips
Background noise hurts accuracy more than low bitrate does. A 64 kbps Opus file recorded in a quiet room outperforms a 320 kbps MP3 from a busy café. If you have any pre-processing budget on the client, a mild noise gate goes further than dynamic-range compression.
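As an illustration of what "mild" can mean, the sketch below is a simplified frame-based RMS noise gate: frames quieter than a threshold are attenuated rather than muted. The threshold (-50 dBFS), the 20 ms frame length, and the attenuation (gain 0.1, about -20 dB) are hypothetical starting values, not recommendations from the service. Production gates usually add attack/release smoothing and hysteresis to avoid audible pumping, which this sketch omits.

```python
import math


def noise_gate(samples, frame_len=320, threshold_dbfs=-50.0, floor_gain=0.1):
    """Attenuate frames whose RMS level falls below threshold_dbfs.

    samples: list of mono int16 sample values.
    frame_len: 320 samples = 20 ms at 16 kHz (hypothetical default).
    floor_gain: gain applied to quiet frames; 0.1 is roughly -20 dB,
    a mild gate rather than a hard mute.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Level relative to int16 full scale; digital silence is -inf dBFS.
        level = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
        gain = 1.0 if level >= threshold_dbfs else floor_gain
        out.extend(int(round(s * gain)) for s in frame)
    return out
```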