Authentication uses the Authorization header with the DeepL-Auth-Key authentication scheme. Example: Authorization: DeepL-Auth-Key <api-key>
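A minimal Python sketch of building that header value. The helper name deepl_auth_header is hypothetical; the returned dict can be passed as headers to whichever HTTP client you use for the request.

```python
def deepl_auth_header(api_key: str) -> dict[str, str]:
    """Build the Authorization header using the DeepL-Auth-Key scheme (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    # Scheme name and key are separated by a single space.
    return {"Authorization": f"DeepL-Auth-Key {api_key}"}


headers = deepl_auth_header("your-api-key")
print(headers)  # {'Authorization': 'DeepL-Auth-Key your-api-key'}
```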
The audio format for streaming, which specifies the container, codec, and encoding parameters. See the table below for supported formats. If audio/auto is specified, the server auto-detects the container and codec for all supported combinations except PCM, which requires explicit encoding parameters. All formats must be single-channel (mono) audio.
| Content Type | Container | Codec |
|---|---|---|
| audio/auto | Auto-detect: FLAC / Matroska / MPEG / Ogg / WebM | Auto-detect: AAC / FLAC / MP3 / OPUS |
| audio/flac | FLAC (flac) | FLAC |
| audio/mpeg | MPEG (mp3/m4a) | MP3 |
| audio/ogg | Ogg (ogg/oga) | Auto-detect: FLAC / OPUS |
| audio/webm | WebM (webm) | OPUS |
| audio/x-matroska | Matroska (mkv/mka) | Auto-detect: AAC / FLAC / MP3 / OPUS |
| audio/ogg;codecs=flac | Ogg (ogg/oga) | FLAC |
| audio/ogg;codecs=opus | Ogg (ogg/oga) | OPUS |
| audio/pcm;encoding=alaw;rate=8000 | - | PCM A-Law 8000 Hz (G.711) |
| audio/pcm;encoding=ulaw;rate=8000 | - | PCM µ-Law 8000 Hz (G.711) |
| audio/pcm;encoding=s16le;rate=8000 | - | PCM signed 16-bit little-endian 8000 Hz |
| audio/pcm;encoding=s16le;rate=16000 | - | PCM signed 16-bit little-endian 16000 Hz |
| audio/pcm;encoding=s16le;rate=44100 | - | PCM signed 16-bit little-endian 44100 Hz |
| audio/pcm;encoding=s16le;rate=48000 | - | PCM signed 16-bit little-endian 48000 Hz |
| audio/webm;codecs=opus | WebM (webm) | OPUS |
| audio/x-matroska;codecs=aac | Matroska (mkv/mka) | AAC |
| audio/x-matroska;codecs=flac | Matroska (mkv/mka) | FLAC |
| audio/x-matroska;codecs=mp3 | Matroska (mkv/mka) | MP3 |
| audio/x-matroska;codecs=opus | Matroska (mkv/mka) | OPUS |
We recommend the following bitrates as a good tradeoff between quality and bandwidth:
Allowed values: audio/auto, audio/flac, audio/mpeg, audio/ogg, audio/webm, audio/x-matroska, audio/ogg;codecs=flac, audio/ogg;codecs=opus, audio/pcm;encoding=alaw;rate=8000, audio/pcm;encoding=ulaw;rate=8000, audio/pcm;encoding=s16le;rate=8000, audio/pcm;encoding=s16le;rate=16000, audio/pcm;encoding=s16le;rate=44100, audio/pcm;encoding=s16le;rate=48000, audio/webm;codecs=opus, audio/x-matroska;codecs=aac, audio/x-matroska;codecs=flac, audio/x-matroska;codecs=mp3, audio/x-matroska;codecs=opus
Example: "audio/ogg;codecs=opus"
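The PCM formats have no container, so a client has to size its audio chunks from the content-type parameters alone. The sketch below computes the byte rate for the single-channel PCM variants in the table (2 bytes per sample for s16le, 1 byte for A-law and µ-law). The helper names are hypothetical, and the chunk duration is an illustrative client-side choice, not something this reference prescribes.

```python
# Bytes per sample for each PCM encoding in the table (input must be mono).
BYTES_PER_SAMPLE = {"s16le": 2, "alaw": 1, "ulaw": 1}


def parse_content_type(content_type: str) -> tuple[str, dict[str, str]]:
    """Split 'audio/pcm;encoding=s16le;rate=16000' into a media type and its parameters."""
    media_type, *params = [part.strip() for part in content_type.split(";")]
    parsed: dict[str, str] = {}
    for param in params:
        key, _, value = param.partition("=")
        parsed[key.strip().lower()] = value.strip()
    return media_type.lower(), parsed


def pcm_bytes_per_second(content_type: str) -> int:
    """Byte rate of single-channel PCM audio described by a content type."""
    media_type, params = parse_content_type(content_type)
    if media_type != "audio/pcm":
        raise ValueError(f"not a PCM content type: {content_type}")
    # PCM is not auto-detected, so encoding and rate must both be explicit.
    if "encoding" not in params or "rate" not in params:
        raise ValueError("PCM requires explicit encoding and rate parameters")
    encoding = params["encoding"]
    if encoding not in BYTES_PER_SAMPLE:
        raise ValueError(f"unsupported PCM encoding: {encoding}")
    # Mono: one sample per frame, so bytes/s = rate * bytes per sample.
    return int(params["rate"]) * BYTES_PER_SAMPLE[encoding]


def pcm_chunk_bytes(content_type: str, chunk_ms: int) -> int:
    """Number of bytes in chunk_ms milliseconds of mono PCM audio."""
    return pcm_bytes_per_second(content_type) * chunk_ms // 1000


print(pcm_bytes_per_second("audio/pcm;encoding=s16le;rate=16000"))  # 32000
print(pcm_chunk_bytes("audio/pcm;encoding=ulaw;rate=8000", 100))     # 800
```

For example, 20 ms of audio/pcm;encoding=s16le;rate=48000 works out to 48000 × 2 × 0.02 = 1920 bytes.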
Message encoding format for WebSocket communication. Determines how messages are serialized and transmitted.
Using json, messages are JSON-encoded and sent as TEXT WebSocket frames. All binary fields (such as audio data) are base64-encoded strings.
Using msgpack, messages are MessagePack-encoded and sent as BINARY WebSocket frames. All binary fields (such as audio data) contain raw binary data.
For more details, see Message Encoding.
Allowed values: json, msgpack
Example: "json"
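A minimal sketch of the json encoding rule: binary fields are replaced by base64 strings before the message is serialized into a TEXT frame, and the receiver decodes the fields it knows to be binary. The message shape ({"type": ..., "data": ...}) and all helper names are hypothetical; the real message schemas are described under Message Encoding. With msgpack, the same dict could be packed with the bytes left as-is (for example via the third-party msgpack package) and sent as a BINARY frame.

```python
import base64
import json
from typing import Any


def _b64_binary(value: Any) -> Any:
    """Recursively replace bytes values with base64-encoded ASCII strings."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, dict):
        return {key: _b64_binary(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_b64_binary(item) for item in value]
    return value


def to_json_frame(message: dict[str, Any]) -> str:
    """Serialize a message as JSON text, to be sent in a TEXT WebSocket frame."""
    return json.dumps(_b64_binary(message))


def binary_field_from_json(frame: str, field: str) -> bytes:
    """Parse a TEXT frame and base64-decode one field the schema says is binary."""
    return base64.b64decode(json.loads(frame)[field])


# Hypothetical message shape, for illustration only.
frame = to_json_frame({"type": "audio_chunk", "data": b"\x00\x01\x02"})
print(frame)  # {"type": "audio_chunk", "data": "AAEC"}
```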
The source language of the audio stream. It can be left empty; otherwise, it must be one of the supported Voice API source languages and comply with IETF BCP 47 language tags. Note: Some source transcription languages are provided through external service partners. See the supported languages table for details.
Allowed values: ar, bg, bn, cs, da, de, el, en, es, et, fi, fr, ga, he, hr, hu, id, it, ja, ko, lt, lv, mt, nb, nl, pl, pt, ro, ru, sk, sl, sv, th, tl, tr, uk, vi, zh
Example: "en"
Controls how the source_language value is used.
- auto: Treats the source language as a hint; the server can override it.
- fixed: Treats the source language as mandatory; the server must use this language.
Allowed values: auto, fixed
Example: "fixed"
List of target languages for translation. The stream emits translations for each language. The maximum number of target languages per stream is 5. Language identifiers must comply with IETF BCP 47. See the supported languages table for details.
Maximum items: 5
Allowed values: ar, bg, bn, cs, da, de, el, en, en-GB, en-US, es, et, fi, fr, ga, he, hr, hu, id, it, ja, ko, lt, lv, mt, nb, nl, pl, pt, pt-BR, pt-PT, ro, ru, sk, sl, sv, th, tl, tr, uk, vi, zh, zh-HANS, zh-HANT
Example: ["de", "fr", "es"]
(EAP) List of target languages for which to generate synthesized audio. Languages specified here are automatically added to target_languages if not already present, so you receive both text translation and audio synthesis for these languages. If omitted, only text transcription and translation are provided (no audio synthesis). The maximum number of target media languages per stream is 5. Language identifiers must comply with IETF BCP 47. Note: Some translated audio languages are provided through external service partners. See the supported languages table for details.
Maximum items: 5
Allowed values: ar, bg, cs, da, de, el, en, en-GB, en-US, es, fi, fr, hu, id, it, ja, ko, nb, nl, pl, pt, pt-BR, pt-PT, ro, ru, sk, sv, tr, uk, vi, zh, zh-HANS, zh-HANT
Example: ["de", "en-GB"]
(EAP) The audio format for synthesized target media streaming. It specifies the container, codec, and encoding parameters for the audio returned in target_media_chunk messages. If not specified, it defaults to audio/webm;codecs=opus. It applies only when target_media_languages is specified.
| Content Type | Container | Codec |
|---|---|---|
| audio/flac | FLAC (flac) | FLAC 24000 Hz |
| video/mp2t;codecs=aac | MPEG Transport Stream (audio only) | AAC 70 kbit/s |
| video/mp2t;codecs=opus | MPEG Transport Stream (audio only) | OPUS 32 kbit/s |
| audio/ogg | Ogg (ogg/oga) | OPUS 32 kbit/s |
| audio/ogg;codecs=flac | Ogg (ogg/oga) | FLAC 24000 Hz |
| audio/ogg;codecs=opus | Ogg (ogg/oga) | OPUS 32 kbit/s |
| audio/opus | - | OPUS 32 kbit/s |
| audio/pcm;encoding=alaw;rate=8000 | - | PCM A-Law 8000 Hz (G.711) |
| audio/pcm;encoding=ulaw;rate=8000 | - | PCM µ-Law 8000 Hz (G.711) |
| audio/pcm;encoding=s16le;rate=16000 | - | PCM signed 16-bit little-endian 16000 Hz |
| audio/pcm;encoding=s16le;rate=24000 | - | PCM signed 16-bit little-endian 24000 Hz |
| audio/webm | WebM (webm) | OPUS 32 kbit/s |
| audio/webm;codecs=opus | WebM (webm) | OPUS 32 kbit/s |
| audio/x-matroska;codecs=aac | Matroska (mkv/mka) | AAC 70 kbit/s |
| audio/x-matroska;codecs=flac | Matroska (mkv/mka) | FLAC 24000 Hz |
| audio/x-matroska;codecs=opus | Matroska (mkv/mka) | OPUS 32 kbit/s |
We recommend the following formats as good tradeoffs between quality and bandwidth:
Allowed values: audio/flac, video/mp2t;codecs=aac, video/mp2t;codecs=opus, audio/ogg, audio/ogg;codecs=flac, audio/ogg;codecs=opus, audio/opus, audio/pcm;encoding=alaw;rate=8000, audio/pcm;encoding=ulaw;rate=8000, audio/pcm;encoding=s16le;rate=16000, audio/pcm;encoding=s16le;rate=24000, audio/webm, audio/webm;codecs=opus, audio/x-matroska;codecs=aac, audio/x-matroska;codecs=flac, audio/x-matroska;codecs=opus
Example: "audio/webm;codecs=opus"
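The interplay between target_languages, target_media_languages, and the target media content type can be mirrored on the client to predict which outputs a stream will emit. This is a sketch under assumptions: the function name is hypothetical, and it only models the rules stated above (at most 5 entries per list, media languages appended to target_languages when missing, audio/webm;codecs=opus as the default format, and the format ignored when no media languages are given). The reference does not say whether the 5-language limit also applies after merging, so that case is not checked.

```python
from typing import Optional

MAX_LANGUAGES = 5
DEFAULT_MEDIA_CONTENT_TYPE = "audio/webm;codecs=opus"


def effective_targets(
    target_languages: list[str],
    target_media_languages: Optional[list[str]] = None,
    target_media_content_type: Optional[str] = None,
) -> dict:
    """Hypothetical client-side model of the documented target-language rules."""
    if len(target_languages) > MAX_LANGUAGES:
        raise ValueError("at most 5 target_languages per stream")
    if not target_media_languages:
        # No synthesis: text only, and the media content type does not apply.
        return {"target_languages": list(target_languages)}
    if len(target_media_languages) > MAX_LANGUAGES:
        raise ValueError("at most 5 target_media_languages per stream")
    merged = list(target_languages)
    for lang in target_media_languages:
        if lang not in merged:  # added only if not already present
            merged.append(lang)
    return {
        "target_languages": merged,
        "target_media_languages": list(target_media_languages),
        "target_media_content_type": target_media_content_type or DEFAULT_MEDIA_CONTENT_TYPE,
    }


print(effective_targets(["de", "fr", "es"], ["de", "en-GB"]))
```

Language tags are compared as exact strings here, so en and en-GB count as different entries.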
(EAP) Target audio voice selection for synthesized speech. The default voice is language-dependent.
Allowed values: male, female
Example: "female"
A unique ID assigned to a glossary.
Example: "def3a26b-3e84-45b3-84ae-0c0aaf3525f7"
Sets whether the translated text should lean towards formal or informal language. Possible options are:
- default: use the default formality for the target language
- formal / more: for more formal language
- informal / less: for more informal language
Allowed values: default, formal, more, informal, less
Example: "formal"
Successfully obtained streaming URL and token.
The WebSocket URL to use for establishing the stream connection.
Example: "wss://api.deepl.com/v3/voice/realtime/connect"
A unique ephemeral token for authenticating with the streaming endpoint. Pass it as a query parameter when connecting to the streaming URL. The token is valid only for a short time and can be used only once.
Example: "VGhpcyBpcyBhIGZha2UgdG9rZW4K"
Internal use only. A unique identifier for the requested stream.
Example: "4f911080-cfe2-41d4-8269-0e6ec15a0354"