Audio formats

The service resamples incoming audio automatically, so most common formats and sample rates work without conversion. The only hard limit is 100 MB per batch upload. For the lowest latency, streaming clients should send 16 kHz PCM.

Batch uploads

/api/v1/transcribe happily accepts:

| Format | Notes | Support |
| --- | --- | --- |
| WAV | 8–48 kHz PCM, 16/24-bit, mono or stereo | ✓ Recommended |
| FLAC | 8–48 kHz, any bit depth | ✓ |
| MP3 | 32 kbps and up | ✓ |
| OGG / Opus | Any sample rate Opus supports | ✓ |
| M4A / AAC | 8–48 kHz | ✓ |
| WebM (Opus) | From browser MediaRecorder | ✓ |
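A client can catch the two most common rejections before uploading. The following is a minimal sketch, not part of the service's SDK: the function name `check_batch_upload`, the extension list, and the reading of "100 MB" as 100,000,000 bytes are all assumptions made for illustration.

```python
import os

# Assumption: "100 MB" is read as decimal megabytes, the stricter interpretation.
MAX_BATCH_BYTES = 100 * 10**6

# Assumption: typical file extensions for the formats in the table above.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac", ".webm"}


def check_batch_upload(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []

    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")

    size = os.path.getsize(path)
    if size > MAX_BATCH_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BATCH_BYTES}-byte limit")

    return problems
```

This only inspects the file name and size. It does not verify that the contents actually match the extension, so the server may still reject a mislabeled file.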

Streaming PCM

The WebSocket endpoint expects raw PCM frames after the initial config message. Mono, 16 kHz, pcm_s16le is what the model was trained on. Anything else gets resampled, which adds a few milliseconds of latency.
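If the audio source is a WAV file, Python's standard `wave` module can confirm that it already matches the native streaming format and strip the header so that only raw frames are sent. This is a sketch; the helper names `is_native_streaming_format` and `read_pcm_frames` are hypothetical, and the config message itself is not shown because its fields are not described here.

```python
import wave


def is_native_streaming_format(wav_path: str) -> bool:
    """True if the WAV file is mono, 16 kHz, 16-bit PCM."""
    with wave.open(wav_path, "rb") as wav:
        return (
            wav.getnchannels() == 1
            and wav.getframerate() == 16_000
            and wav.getsampwidth() == 2  # bytes per sample, i.e. 16-bit
        )


def read_pcm_frames(wav_path: str) -> bytes:
    """Return the raw sample bytes without the WAV header.

    WAV stores 16-bit samples as signed little-endian, which is pcm_s16le.
    """
    with wave.open(wav_path, "rb") as wav:
        return wav.readframes(wav.getnframes())
```

A file that fails the check can still be streamed, at the cost of server-side resampling, or converted on the client first.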

Chunks of 20–40 ms work best. At 16 kHz that is 320–640 samples, or 640–1280 bytes of 16-bit mono PCM. Don't buffer more than a second of audio on the client.
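The chunk arithmetic for 16 kHz mono pcm_s16le is worth spelling out, because each sample occupies two bytes: 20 ms is 320 samples (640 bytes) and 40 ms is 640 samples (1280 bytes). A minimal sketch of the calculation and of slicing a buffer into chunks follows; `chunk_size_bytes` and `iter_chunks` are illustrative names, not part of any client library.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # pcm_s16le, mono


def chunk_size_bytes(chunk_ms: int) -> int:
    """Size in bytes of a chunk lasting chunk_ms milliseconds."""
    samples = SAMPLE_RATE_HZ * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE


def iter_chunks(pcm: bytes, chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

One second of audio in this format is 32,000 bytes, so a client sending 20 ms chunks emits 50 messages per second.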

Quality tips

Background noise hurts accuracy more than bitrate. A 64 kbps Opus file recorded in a quiet room outperforms a 320 kbps MP3 from a busy café. If you have any pre-processing budget on the client, a mild noise gate goes further than dynamic-range compression.
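To make "a mild noise gate" concrete, here is a deliberately simple frame-based gate for 16-bit mono PCM. It is an illustrative sketch, not the service's recommended implementation: the function name, the RMS threshold of 300, the attenuation factor, and the 20 ms frame length are all assumptions that would need tuning against real recordings.

```python
import array
import math
import sys


def noise_gate(pcm: bytes, threshold_rms: float = 300.0,
               attenuation: float = 0.1, frame_samples: int = 320) -> bytes:
    """Attenuate frames whose RMS level falls below threshold_rms.

    pcm must be pcm_s16le with an even number of bytes.
    attenuation should be between 0.0 (mute) and 1.0 (no change).
    """
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; the input is little-endian

    out = array.array("h")
    for start in range(0, len(samples), frame_samples):
        frame = samples[start:start + frame_samples]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold_rms:
            # Quiet frame: scale it down rather than muting it outright.
            frame = array.array("h", (int(s * attenuation) for s in frame))
        out.extend(frame)

    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()
```

This version switches gain abruptly at frame boundaries, which can produce audible clicks. A production gate would normally smooth the transition with attack and release times.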
