Varmir docs

Audio formats

The service automatically resamples incoming audio, so most common formats and sample rates work without conversion. The only hard limit is 100 MB per batch upload. Streaming clients should send 16 kHz PCM for the lowest latency.

Batch uploads

/api/v1/transcribe happily accepts:

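| Format       | Notes                                        | Support       |
|--------------|----------------------------------------------|---------------|
| WAV          | 8–48 kHz PCM, 16/24-bit, mono or stereo      | ✓ Recommended |
| FLAC         | 8–48 kHz, any bit depth                      |               |
| MP3          | 32 kbps and up                               |               |
| OGG / Opus   | Any sample rate Opus supports                |               |
| M4A / AAC    | 8–48 kHz                                     |               |
| WebM (Opus)  | From browser MediaRecorder                   |               |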
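
A client can catch the most common rejections before uploading. The sketch below is a hedged pre-flight check, not part of any Varmir SDK: the function name `check_batch_file`, the extension set, and the reading of "100 MB" as 100,000,000 bytes (the stricter of the two usual interpretations) are all assumptions made for illustration.

```python
import os

# Assumption: "100 MB" is treated as 100 * 1000 * 1000 bytes. The docs do not
# say whether the limit is decimal or binary, so this uses the stricter value.
MAX_BATCH_BYTES = 100 * 1000 * 1000

# Assumption: typical file extensions for the formats in the list above.
SUPPORTED_EXTENSIONS = {
    ".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac", ".webm",
}


def check_batch_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []

    # Compare the lowercased extension against the known set.
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unrecognised extension {ext!r}")

    # Check the on-disk size against the batch upload limit.
    size = os.path.getsize(path)
    if size > MAX_BATCH_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BATCH_BYTES}-byte limit")

    return problems
```

The check only looks at the extension and size, so a mislabelled file can still pass it and be rejected by the service.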

Streaming PCM

The WebSocket endpoint expects raw PCM frames after the initial config message. The model is trained on mono, 16 kHz, pcm_s16le (signed 16-bit little-endian) audio; anything else is resampled, which adds a few milliseconds of latency.
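
To find out whether a WAV source already matches that format before streaming it, the header can be inspected with Python's standard `wave` module. This is a minimal sketch; the helper name `is_native_streaming_format` is hypothetical, and the check assumes the file is a plain PCM WAV that `wave` can open.

```python
import wave


def is_native_streaming_format(path: str) -> bool:
    """True if the WAV file is already mono, 16 kHz, 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1          # mono
            and wav.getframerate() == 16000  # 16 kHz
            and wav.getsampwidth() == 2      # bytes per sample, so 2 means 16-bit
        )
```

If this returns `False`, the audio can still be sent, but the service will resample it.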

Chunks of 20–40 ms work best: at 16 kHz that is 320–640 samples, or 640–1280 bytes of 16-bit mono PCM. Don't buffer more than a second on the client.
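
The chunk arithmetic is easy to get wrong, so here is a small sketch of it. At 16 kHz mono pcm_s16le, a 20 ms chunk is 320 samples, which is 640 bytes; a 40 ms chunk is 640 samples, or 1280 bytes. The names `chunk_size_bytes` and `iter_chunks` are illustrative, not part of any Varmir client library.

```python
from typing import Iterator

SAMPLE_RATE = 16000      # samples per second
BYTES_PER_SAMPLE = 2     # pcm_s16le: 16-bit samples, one channel


def chunk_size_bytes(chunk_ms: int) -> int:
    """Size in bytes of a chunk lasting chunk_ms milliseconds."""
    samples = SAMPLE_RATE * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE


def iter_chunks(pcm: bytes, chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Each yielded chunk would then be sent as one binary frame on the WebSocket, after the config message.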

Quality tips

Background noise hurts accuracy more than bitrate. A 64 kbps Opus file recorded in a quiet room outperforms a 320 kbps MP3 from a busy café. If you have any pre-processing budget on the client, a mild noise gate goes further than dynamic-range compression.
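
As a rough illustration of what a mild noise gate could look like on the client, the sketch below attenuates (rather than silences) 20 ms frames whose RMS level falls under a threshold. It assumes mono 16 kHz pcm_s16le input; the function name `noise_gate` and the default threshold and attenuation values are arbitrary placeholders that would need tuning against real recordings.

```python
import array
import math
import sys


def noise_gate(
    pcm: bytes,
    threshold_rms: float = 300.0,  # assumption: placeholder, tune per microphone
    attenuation: float = 0.25,     # assumption: quiet frames are scaled to 25%
    frame_samples: int = 320,      # 20 ms at 16 kHz
) -> bytes:
    """Scale down frames quieter than threshold_rms; leave louder frames untouched."""
    samples = array.array("h")     # signed 16-bit, native byte order
    samples.frombytes(pcm)         # raises ValueError if len(pcm) is odd
    if sys.byteorder == "big":
        samples.byteswap()         # input is little-endian

    for start in range(0, len(samples), frame_samples):
        frame = samples[start:start + frame_samples]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold_rms:
            for i in range(start, start + len(frame)):
                samples[i] = int(samples[i] * attenuation)

    if sys.byteorder == "big":
        samples.byteswap()         # back to little-endian for output
    return samples.tobytes()
```

Attenuating instead of zeroing keeps some room tone in the gaps, which avoids abrupt cuts at frame boundaries. A production gate would normally add attack and release smoothing, which this sketch omits.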
