POST /v3/voice/realtime
curl --request POST \
  --url https://api.deepl.com/v3/voice/realtime \
  --header 'Authorization: DeepL-Auth-Key <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "source_media_content_type": "audio/ogg;codecs=opus",
  "source_language": "en",
  "source_language_mode": "auto",
  "target_languages": [
    "de",
    "fr",
    "es"
  ],
  "message_format": "json"
}
'
{
  "streaming_url": "wss://api.deepl.com/v3/voice/realtime/connect",
  "token": "VGhpcyBpcyBhIGZha2UgdG9rZW4K",
  "session_id": "4f911080-cfe2-41d4-8269-0e6ec15a0354"
}

Authorizations

Authorization
string
header
required

Authenticate with the Authorization header, using the DeepL-Auth-Key authentication scheme. Example: Authorization: DeepL-Auth-Key <api-key>
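
As an illustration of the header format, the following Python sketch prepares the session request without sending it. It uses only the standard library; the helper name `build_session_request` and the placeholder key are assumptions made for this example, not part of the API.

```python
import json
import urllib.request

API_URL = "https://api.deepl.com/v3/voice/realtime"


def build_session_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that requests a stream."""
    headers = {
        # The key is prefixed with the DeepL-Auth-Key scheme name.
        "Authorization": f"DeepL-Auth-Key {api_key}",
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="POST")


request = build_session_request(
    "your-api-key",  # placeholder, replace with a real key
    {
        "source_media_content_type": "audio/ogg;codecs=opus",
        "source_language": "en",
        "target_languages": ["de", "fr", "es"],
    },
)
# Sending it would be: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the headers and body easy to inspect before any network call is made.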

Body

application/json
source_media_content_type
enum<string>
required

The audio format for streaming, which specifies container, codec, and encoding parameters. See the table below for supported formats. If audio/auto is specified, the server auto-detects the container and codec for all supported combinations except PCM, which always requires explicit encoding and rate parameters. All formats must be single-channel audio.

Content Type | Container | Codec
--- | --- | ---
audio/auto | Auto-detect: FLAC / Matroska / MPEG / Ogg / WebM | Auto-detect: AAC / FLAC / MP3 / OPUS
audio/flac | FLAC (flac) | FLAC
audio/mpeg | MPEG (mp3/m4a) | MP3
audio/ogg | Ogg (ogg/oga) | Auto-detect: FLAC / OPUS
audio/webm | WebM (webm) | OPUS
audio/x-matroska | Matroska (mkv/mka) | Auto-detect: AAC / FLAC / MP3 / OPUS
audio/ogg;codecs=flac | Ogg (ogg/oga) | FLAC
audio/ogg;codecs=opus | Ogg (ogg/oga) | OPUS
audio/pcm;encoding=alaw;rate=8000 | - | PCM A-Law 8000 Hz (G.711)
audio/pcm;encoding=ulaw;rate=8000 | - | PCM µ-Law 8000 Hz (G.711)
audio/pcm;encoding=s16le;rate=8000 | - | PCM signed 16-bit little-endian 8000 Hz
audio/pcm;encoding=s16le;rate=16000 | - | PCM signed 16-bit little-endian 16000 Hz
audio/pcm;encoding=s16le;rate=44100 | - | PCM signed 16-bit little-endian 44100 Hz
audio/pcm;encoding=s16le;rate=48000 | - | PCM signed 16-bit little-endian 48000 Hz
audio/webm;codecs=opus | WebM (webm) | OPUS
audio/x-matroska;codecs=aac | Matroska (mkv/mka) | AAC
audio/x-matroska;codecs=flac | Matroska (mkv/mka) | FLAC
audio/x-matroska;codecs=mp3 | Matroska (mkv/mka) | MP3
audio/x-matroska;codecs=opus | Matroska (mkv/mka) | OPUS

We recommend the following bitrates as a good tradeoff between quality and bandwidth:

  • AAC: 96 kbps
  • FLAC: 256 kbps (16000 Hz)
  • MP3: 128 kbps
  • OPUS: 32 kbps (recommendation for low bandwidth scenarios)
  • PCM: 256 kbps (16000 Hz, default recommendation)
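
The PCM figure follows directly from the content type parameters: bits per sample times sample rate, for one channel. The sketch below shows that arithmetic; the helper name `pcm_bitrate_kbps` is an assumption made for illustration, and the 8-bit sample size for A-law and µ-law comes from G.711.

```python
# Bits per sample for the PCM encodings in the supported formats.
# A-law and µ-law (G.711) use 8 bits per sample; s16le uses 16.
BITS_PER_SAMPLE = {"alaw": 8, "ulaw": 8, "s16le": 16}


def pcm_bitrate_kbps(content_type: str) -> float:
    """Return the raw bitrate in kbps of a single-channel PCM content type."""
    media_type, *params = [part.strip() for part in content_type.split(";")]
    if media_type != "audio/pcm":
        raise ValueError(f"not a PCM content type: {content_type!r}")
    fields = dict(param.split("=", 1) for param in params)
    bits = BITS_PER_SAMPLE[fields["encoding"]]
    rate_hz = int(fields["rate"])
    # Single channel, so there is no channel multiplier.
    return bits * rate_hz / 1000


print(pcm_bitrate_kbps("audio/pcm;encoding=s16le;rate=16000"))  # 256.0
```

The same calculation gives 64 kbps for the two G.711 formats, which is why they suit telephony-style links despite their lower audio quality.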
Available options:
audio/auto,
audio/flac,
audio/mpeg,
audio/ogg,
audio/webm,
audio/x-matroska,
audio/ogg;codecs=flac,
audio/ogg;codecs=opus,
audio/pcm;encoding=alaw;rate=8000,
audio/pcm;encoding=ulaw;rate=8000,
audio/pcm;encoding=s16le;rate=8000,
audio/pcm;encoding=s16le;rate=16000,
audio/pcm;encoding=s16le;rate=44100,
audio/pcm;encoding=s16le;rate=48000,
audio/webm;codecs=opus,
audio/x-matroska;codecs=aac,
audio/x-matroska;codecs=flac,
audio/x-matroska;codecs=mp3,
audio/x-matroska;codecs=opus
Example:

"audio/ogg;codecs=opus"

message_format
enum<string>
default:json

Message encoding format for WebSocket communication. Determines how messages are serialized and transmitted.

  • json: Messages are JSON-encoded and sent as TEXT WebSocket frames. All binary fields (such as audio data) are base64-encoded strings.
  • msgpack: Messages are MessagePack-encoded and sent as BINARY WebSocket frames. All binary fields (such as audio data) contain raw binary data.

For more details, see Message Encoding.
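
To make the json case concrete, here is a minimal Python sketch of wrapping one audio chunk for a TEXT frame. The message shape is an assumption: the field names `source_media_chunk` and `data` are hypothetical placeholders, and the actual schema is defined in Message Encoding. Only the base64 step is taken from the description above.

```python
import base64
import json


def encode_chunk_as_json(audio: bytes) -> str:
    """Serialize one audio chunk as a JSON string for a TEXT WebSocket frame."""
    # Hypothetical message shape; check Message Encoding for the real schema.
    message = {
        "source_media_chunk": {
            # With message_format "json", binary fields are base64 strings.
            "data": base64.b64encode(audio).decode("ascii"),
        }
    }
    return json.dumps(message)


frame = encode_chunk_as_json(b"\x00\x01\x02\x03")
```

With msgpack the same field would carry the raw bytes instead, which avoids the roughly one-third size overhead of base64 but requires a MessagePack library on the client.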

Available options:
json,
msgpack
Example:

"json"

source_language
enum<string>

The source language of the audio stream. It may be omitted; if provided, it must be one of the supported Voice API source languages, given as an IETF BCP 47 language tag. Note: Some source transcription languages are provided through external service partners. See the supported languages table for details.

Available options:
ar,
bg,
bn,
cs,
da,
de,
el,
en,
es,
et,
fi,
fr,
ga,
he,
hr,
hu,
id,
it,
ja,
ko,
lt,
lv,
mt,
nb,
nl,
pl,
pt,
ro,
ru,
sk,
sl,
sv,
th,
tl,
tr,
uk,
vi,
zh
Example:

"en"

source_language_mode
enum<string>
default:auto

Controls how the source_language value is used.

  • auto: Treats source language as a hint; server can override
  • fixed: Treats source language as mandatory; server must use this language
Available options:
auto,
fixed
Example:

"fixed"

target_languages
enum<string>[]

List of target languages for translation. The stream will emit translations for each language. The maximum allowed target languages per stream is 5. Language identifiers must comply with IETF BCP 47. See the supported languages table for details.

Maximum array length: 5
Available options:
ar,
bg,
bn,
cs,
da,
de,
el,
en,
en-GB,
en-US,
es,
et,
fi,
fr,
ga,
he,
hr,
hu,
id,
it,
ja,
ko,
lt,
lv,
mt,
nb,
nl,
pl,
pt,
pt-BR,
pt-PT,
ro,
ru,
sk,
sl,
sv,
th,
tl,
tr,
uk,
vi,
zh,
zh-HANS,
zh-HANT
Example:

["de", "fr", "es"]

target_media_languages
enum<string>[]

(EAP) List of target languages for which to generate synthesized audio. Languages specified here will automatically be added to target_languages if not already present, ensuring you receive both text translation and audio synthesis for these languages. If omitted, only text transcription and translation will be provided (no audio synthesis). The maximum allowed target media languages per stream is 5. Language identifiers must comply with IETF BCP 47. Note: Some translated audio languages are provided through external service partners. See the supported languages table for details.

Maximum array length: 5
Available options:
ar,
bg,
cs,
da,
de,
el,
en,
en-GB,
en-US,
es,
fi,
fr,
hu,
id,
it,
ja,
ko,
nb,
nl,
pl,
pt,
pt-BR,
pt-PT,
ro,
ru,
sk,
sv,
tr,
uk,
vi,
zh,
zh-HANS,
zh-HANT
Example:

["de", "en-GB"]

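
The relationship between target_languages and target_media_languages can be sketched on the client side as follows. This is an illustration only: the function name `effective_target_languages` is hypothetical, the resulting order is an assumption, and the server performs the actual merge.

```python
MAX_LANGUAGES = 5  # documented maximum length for each of the two arrays


def effective_target_languages(target_languages, target_media_languages):
    """Return the languages that would receive text translations."""
    for name, langs in (
        ("target_languages", target_languages),
        ("target_media_languages", target_media_languages),
    ):
        if len(langs) > MAX_LANGUAGES:
            raise ValueError(f"{name} allows at most {MAX_LANGUAGES} entries")
    merged = list(target_languages)
    for lang in target_media_languages:
        # Media languages are added only if not already present.
        if lang not in merged:
            merged.append(lang)
    return merged


print(effective_target_languages(["de", "fr"], ["de", "en-GB"]))
# ['de', 'fr', 'en-GB']
```

In this example "en-GB" receives both text and synthesized audio even though it was listed only in target_media_languages, while "fr" receives text only.
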
target_media_content_type
enum<string>
default:audio/webm;codecs=opus

(EAP) The audio format for synthesized target media streaming. Specifies container, codec, and encoding parameters for the audio returned in target_media_chunk messages. If not specified, defaults to audio/webm;codecs=opus. Only applies when target_media_languages is specified.

Content Type | Container | Codec
--- | --- | ---
audio/flac | FLAC (flac) | FLAC 24000 Hz
video/mp2t;codecs=aac | MPEG Transport Stream (Audio only) | AAC 70 kbit/s
video/mp2t;codecs=opus | MPEG Transport Stream (Audio only) | OPUS 32 kbit/s
audio/ogg | Ogg (ogg/oga) | OPUS 32 kbit/s
audio/ogg;codecs=flac | Ogg (ogg/oga) | FLAC 24000 Hz
audio/ogg;codecs=opus | Ogg (ogg/oga) | OPUS 32 kbit/s
audio/opus | - | OPUS 32 kbit/s
audio/pcm;encoding=alaw;rate=8000 | - | PCM A-Law 8000 Hz (G.711)
audio/pcm;encoding=ulaw;rate=8000 | - | PCM µ-Law 8000 Hz (G.711)
audio/pcm;encoding=s16le;rate=16000 | - | PCM signed 16-bit little-endian 16000 Hz
audio/pcm;encoding=s16le;rate=24000 | - | PCM signed 16-bit little-endian 24000 Hz
audio/webm | WebM (webm) | OPUS 32 kbit/s
audio/webm;codecs=opus | WebM (webm) | OPUS 32 kbit/s
audio/x-matroska;codecs=aac | Matroska (mkv/mka) | AAC 70 kbit/s
audio/x-matroska;codecs=flac | Matroska (mkv/mka) | FLAC 24000 Hz
audio/x-matroska;codecs=opus | Matroska (mkv/mka) | OPUS 32 kbit/s

We recommend the following formats as good tradeoffs between quality and bandwidth:

  • OPUS (WebM): 32 kbps, recommended for low bandwidth scenarios (default)
  • PCM 24 kHz: 384 kbps, high quality
Available options:
audio/flac,
video/mp2t;codecs=aac,
video/mp2t;codecs=opus,
audio/ogg,
audio/ogg;codecs=flac,
audio/ogg;codecs=opus,
audio/opus,
audio/pcm;encoding=alaw;rate=8000,
audio/pcm;encoding=ulaw;rate=8000,
audio/pcm;encoding=s16le;rate=16000,
audio/pcm;encoding=s16le;rate=24000,
audio/webm,
audio/webm;codecs=opus,
audio/x-matroska;codecs=aac,
audio/x-matroska;codecs=flac,
audio/x-matroska;codecs=opus
Example:

"audio/webm;codecs=opus"

target_media_voice
enum<string>

(EAP) Target audio voice selection for synthesized speech. The default voice is language dependent.

Available options:
male,
female
Example:

"female"

glossary_id
string

A unique ID assigned to a glossary.

Example:

"def3a26b-3e84-45b3-84ae-0c0aaf3525f7"

formality
enum<string>
default:default

Sets whether the translated text should lean towards formal or informal language. Possible options are:

  • default - use the default formality for the target language
  • formal/more - for a more formal language
  • informal/less - for a more informal language
Available options:
default,
formal,
more,
informal,
less
Example:

"formal"

Response

Successfully obtained streaming URL and token.

streaming_url
string
required

The WebSocket URL to use for establishing the stream connection.

Example:

"wss://api.deepl.com/v3/voice/realtime/connect"

token
string
required

A unique ephemeral token for authenticating with the streaming endpoint. Pass it as a query parameter when connecting to the streaming URL. The token is valid only for a short time and can be used only once.
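
A minimal Python sketch of building the connection URL from the response fields follows. The query parameter name `token` is an assumption, since the text above does not name it, and `connection_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def connection_url(streaming_url: str, token: str) -> str:
    """Append the ephemeral token to the streaming URL as a query parameter."""
    # Assumption: the parameter is named "token"; verify against the
    # streaming endpoint documentation before relying on it.
    return f"{streaming_url}?{urlencode({'token': token})}"


url = connection_url(
    "wss://api.deepl.com/v3/voice/realtime/connect",
    "VGhpcyBpcyBhIGZha2UgdG9rZW4K",
)
print(url)
# wss://api.deepl.com/v3/voice/realtime/connect?token=VGhpcyBpcyBhIGZha2UgdG9rZW4K
```

Because the token is single-use and short-lived, a client would normally request a fresh one from the POST endpoint immediately before each connection attempt rather than caching it.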

Example:

"VGhpcyBpcyBhIGZha2UgdG9rZW4K"

session_id
string

Internal use only. A unique identifier for the requested stream.

Example:

"4f911080-cfe2-41d4-8269-0e6ec15a0354"