Streaming vs batch
The same model serves both modes. Pick batch for a finished file, streaming for live audio whose transcript the user needs to see immediately.
Batch (REST)
A single POST to /api/v1/transcribe. You hand the server the whole clip, the server hands you back the whole transcript when it's done. Latency is roughly 0.05 × audio length plus a fixed ~150 ms.
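To make the latency figure concrete, here is a small worked calculation. It simply applies the formula above; the function name and its parameter names are illustrative and not part of the API.

```python
def estimate_batch_latency_s(audio_seconds: float,
                             per_second_factor: float = 0.05,
                             fixed_overhead_s: float = 0.150) -> float:
    """Rough batch latency: 0.05 x audio length plus a fixed ~150 ms."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return per_second_factor * audio_seconds + fixed_overhead_s


# Print the estimate for a few clip lengths (in seconds).
for seconds in (10, 60, 600):
    print(f"{seconds:>4} s clip -> ~{estimate_batch_latency_s(seconds):.2f} s")
```

By this estimate a ten-minute recording takes roughly 30 seconds to come back, which is fine for a pipeline but far too slow for someone watching live captions.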
Use when: you have a finished recording on disk, you don't need partial results, or your client lives behind a strict firewall (plain HTTPS requests usually pass where WebSocket upgrades are blocked).
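A batch call could be prepared along the following lines. This is only a sketch: the host name, the raw-bytes request body, and the Content-Type value are assumptions for illustration, and authentication is left out because it is not described here. Only the path /api/v1/transcribe comes from this page.

```python
import urllib.request

BASE_URL = "https://speech.example.com"  # hypothetical host, replace with yours


def build_transcribe_request(audio: bytes,
                             content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a single POST carrying the whole clip."""
    return urllib.request.Request(
        BASE_URL + "/api/v1/transcribe",
        data=audio,                               # assumed: clip sent as the raw body
        headers={"Content-Type": content_type},  # assumed header value
        method="POST",
    )


audio = b"RIFF" + bytes(16)  # placeholder bytes, not a real WAV file
request = build_transcribe_request(audio)
print(request.get_method(), request.full_url)

# Sending is one blocking call that returns once the whole transcript is ready:
# with urllib.request.urlopen(request) as response:
#     transcript = response.read()
```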
Streaming (WebSocket)
A persistent connection on /api/v1/stream where you push raw PCM and receive incremental partial and final events. Partial results arrive with under 200 ms of latency.
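The client side of a streaming session has two jobs: cut the PCM into small frames to push, and fold the incoming events into the text on screen. The sketch below shows both without any networking. The audio format (16 kHz, 16-bit, mono), the 100 ms frame size, and the JSON event shape with "type" and "text" fields are all assumptions, as is the idea that each partial replaces the previous one.

```python
import json
from typing import Iterator


def pcm_frames(pcm: bytes, sample_rate: int = 16_000,
               bytes_per_sample: int = 2, frame_ms: int = 100) -> Iterator[bytes]:
    """Split raw mono PCM into fixed-duration frames (the last may be shorter)."""
    frame_bytes = sample_rate * bytes_per_sample * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


class TranscriptView:
    """Folds partial and final events into the text shown to the user."""

    def __init__(self) -> None:
        self.committed = []  # finalized segments, never change again
        self.pending = ""    # latest partial, may still be revised

    def apply(self, raw_event: str) -> None:
        event = json.loads(raw_event)
        if event["type"] == "partial":
            self.pending = event["text"]          # replace, do not append
        elif event["type"] == "final":
            self.committed.append(event["text"])  # lock the segment in
            self.pending = ""

    def text(self) -> str:
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)


view = TranscriptView()
for raw in ('{"type": "partial", "text": "hello wor"}',
            '{"type": "final", "text": "hello world"}',
            '{"type": "partial", "text": "how are"}'):
    view.apply(raw)
print(view.text())  # hello world how are
```

In a real client, each frame from pcm_frames would be sent as a binary WebSocket message and each received text message passed to apply.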
Use when: you need live captions, real-time interpretation, or a voice agent, or the session is too long to fit in a single upload.
Which to pick
As a rule of thumb: if a human is waiting for the transcript in real time, pick streaming. If a pipeline is processing a file, pick batch. Pricing is identical: both modes bill per minute of audio.
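The rule of thumb can be written down as a tiny helper. The function and its parameters are hypothetical; the fallback to batch when WebSockets are unavailable is an assumption drawn from the firewall case in the batch section. Cost does not appear because both modes bill the same.

```python
def pick_mode(human_waiting_live: bool, websockets_allowed: bool = True) -> str:
    """Return "streaming" or "batch" following the rule of thumb."""
    if human_waiting_live and websockets_allowed:
        return "streaming"
    # Pipelines processing files, or clients that cannot hold a WebSocket open.
    return "batch"


print(pick_mode(human_waiting_live=True))   # streaming
print(pick_mode(human_waiting_live=False))  # batch
```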