Speech-to-Text and Text-to-Speech Services

At the two CVG locations in Europe and the USA, VIER provides a wide variety of Speech-to-Text (STT) and Text-to-Speech (TTS) services, which are ideally sourced from the same region. Exceptions to this regional sourcing are indicated in the User Interface (UI); for example, with the label “US-hosted” if a US-hosted STT service is being offered at the CVG location in Europe.

Additionally, our customers and partners have the option to bring their own subscriptions for many of the integrated STT and TTS services. In this case, it is the responsibility of the customer or partner to obtain the service from the region that best suits their needs.

Overview and Regional Availability

CVG Europe: Speech-to-Text and Text-to-Speech Services

CVG Europe is accessible via https://cognitivevoice.io.

Speech-to-Text Service	Provider	Region	Option to Bring your own STT	Remark
EML (German only)	VIER	🇪🇺 EU	no
Deepgram STT	Deepgram	🇺🇸 US	yes	Not available for production loads. Staging environment only.
ElevenLabs	ElevenLabs	🇪🇺 EU	yes
Google STT	Google (GCP)	🇪🇺 EU	yes
IBM Watson STT	IBM	🇪🇺 EU	yes
Microsoft STT	Microsoft Azure	🇪🇺 EU	yes
Whisper	Microsoft Azure	🇪🇺 EU	yes
VIER STT	VIER	🇪🇺 EU	no

Text-to-Speech Service	Provider	Region	Option to Bring your own TTS
Amazon Polly	Amazon (AWS)	🇪🇺 EU	yes
ElevenLabs	ElevenLabs	🇪🇺 EU	yes
Google TTS	Google (GCP)	🇪🇺 EU	yes
IBM Watson TTS	IBM	🇪🇺 EU	yes
Microsoft TTS	Microsoft Azure	🇪🇺 EU	yes
Nuance	VIER	🇪🇺 EU	no
OpenAI TTS	Microsoft Azure	🇪🇺 EU	yes
VIER TTS (German only; other languages on demand)	VIER	🇪🇺 EU	no

CVG USA: Speech-to-Text and Text-to-Speech Services

CVG USA is accessible via https://us.cognitivevoice.io.

Speech-to-Text Service	Provider	Region	Option to Bring your own STT
Deepgram	Deepgram	🇺🇸 US	yes
Google STT	Google (GCP)	🇺🇸 US	yes
Microsoft STT	Microsoft Azure	🇺🇸 US	yes
IBM Watson STT	IBM	🇺🇸 US	yes
Whisper	OpenAI	🇺🇸 US	yes

Text-to-Speech Service	Provider	Region	Option to Bring your own TTS
Amazon Polly	Amazon (AWS)	🇺🇸 US	yes
ElevenLabs	ElevenLabs	🇺🇸 US	yes
Google TTS	Google (GCP)	🇺🇸 US	yes
IBM Watson TTS	IBM	🇺🇸 US	yes
Microsoft TTS	Microsoft Azure	🇺🇸 US	yes
OpenAI TTS	OpenAI	🇺🇸 US	yes
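
The tables above can also be read as a small data structure. The following is only a sketch of how the regional-sourcing rule could be modelled: the `Service` class, the `ui_label` and `byo_services` helpers, and the exact label format are illustrative assumptions, not part of CVG. The rows are copied from the CVG Europe Speech-to-Text table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    """One table row (hypothetical model, not a CVG type)."""
    name: str
    provider: str
    region: str           # "EU" or "US", as listed in the Region column
    bring_your_own: bool  # "Option to Bring your own" column


# A few rows copied from the CVG Europe Speech-to-Text table.
EUROPE_STT = [
    Service("EML", "VIER", "EU", False),
    Service("Deepgram STT", "Deepgram", "US", True),
    Service("Google STT", "Google (GCP)", "EU", True),
    Service("VIER STT", "VIER", "EU", False),
]


def ui_label(service: Service, cvg_location: str) -> str:
    """Return the service name, marked if it is hosted outside the CVG location."""
    if service.region != cvg_location:
        # Assumed label format, modelled on the "US-hosted" label mentioned above.
        return f"{service.name} ({service.region}-hosted)"
    return service.name


def byo_services(services: List[Service]) -> List[str]:
    """Names of the services that allow bringing your own subscription."""
    return [s.name for s in services if s.bring_your_own]


if __name__ == "__main__":
    for s in EUROPE_STT:
        print(ui_label(s, "EU"))
```

In this sketch, only Deepgram STT would carry a "US-hosted" marker at the CVG location in Europe, matching its Region entry in the table.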

Speech-to-Text Services

For most projects, we recommend the following STT engines:

Microsoft
Google
IBM

These providers support a wide range of languages and offer advanced features such as punctuation and profanity filtering. Additionally, Whisper by OpenAI is a newer option that delivers great performance. VIER STT and EML (German only) are hosted in our secure VIER environment, ensuring optimal results. Deepgram is a great choice for our US customers. Other STT engines are available upon special request.

Select one or more of these Speech-to-Text engines for your voice application in the CVG console for your CVG project or via API request.

Deepgram

Deepgram’s Speech-to-Text service is recognized for its low latency and high accuracy, which are crucial for providing an exceptional voicebot experience. Deepgram provides accurate transcriptions in many languages.

Deepgram STT services are currently hosted in the US by Deepgram. This is advantageous for our US customers and enables our European customers to evaluate Deepgram STT. If you wish to use Deepgram STT in your production projects and require Deepgram STT to be hosted in the EU, please contact us at support@vier.ai.

EML

We host EML Speech-to-Text services in our secure VIER environment, making it an excellent alternative for those preferring not to transfer users’ speech to a US hyperscaler, even if their service is provided within the EU.

EML STT is available for German only and is not available for our American customers.

Google

Google Speech-to-Text employs advanced machine learning algorithms to transcribe speech accurately in real-time.

Google Speech-to-Text is available for all languages supported by Google.

Google Speech-to-Text is hosted on the Google Cloud Platform (GCP). For our European customers, the default Google Speech-to-Text service uses EU regional API endpoints.

Our default Google Speech-to-Text and Google Streaming Speech-to-Text services use the following regional endpoints:

“europe-west1” for European customers
“us-east1” for American customers

Since the majority of interactions take place via phone calls, we use the Google Speech-to-Text (STT) telephony model if it is available for the selected language. This specialized model has been carefully trained on audio data from phone calls so that it can deliver significantly better transcription results for similar audio data.

Google Default Speech-to-Text

With the default Google Speech-to-Text (shown as “Google (Default)” in the CVG console), CVG itself detects the beginning and end of an utterance in the audio and sends only that utterance to Google as an audio snippet for transcription. This gives us more flexibility, e.g. when defining the end of an utterance.

Google Streaming Speech-to-Text

With Google Streaming Speech Recognition, CVG streams the caller’s real-time audio to Google throughout the duration of the call. This is also referred to as “endless streaming transcription.” Speech-to-Text results are provided by Google as the audio is processed.
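
To illustrate the difference between the two modes, here is a toy sketch of utterance detection as it might happen before audio snippets are sent for transcription. It is not CVG's actual algorithm: the energy threshold, the frame representation, and the `segment_utterances` function are assumptions made for illustration only. The point is that the amount of trailing silence that ends an utterance is a tunable parameter in this mode.

```python
from typing import List, Tuple


def segment_utterances(
    frame_energies: List[float],
    threshold: float = 0.1,
    end_silence_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs (end exclusive), one per utterance.

    An utterance starts at the first frame whose energy exceeds `threshold`
    and ends once `end_silence_frames` consecutive quiet frames follow it.
    """
    utterances = []
    start = None      # index of the first speech frame of the current utterance
    last_speech = -1  # index of the most recent speech frame
    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= end_silence_frames:
            # Enough trailing silence: close the utterance at the last speech frame.
            utterances.append((start, last_speech + 1))
            start = None
    if start is not None:
        # Audio ended while an utterance was still open.
        utterances.append((start, last_speech + 1))
    return utterances


if __name__ == "__main__":
    energies = [0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7, 0]
    print(segment_utterances(energies, end_silence_frames=3))
    print(segment_utterances(energies, end_silence_frames=1))
```

With a longer silence window, the short pause inside the first burst of speech is bridged and it is treated as one utterance; with a shorter window, the same audio is split into more snippets. In streaming mode no such segmentation happens on the CVG side, because the audio is forwarded continuously.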

IBM

IBM offers a Speech-to-Text service as part of their Watson platform. This service uses machine learning algorithms to transcribe spoken words into written text in real-time. The service can also be customized to recognize specific vocabulary and language models.

IBM Speech-to-Text can be used for all available languages supported by IBM.

CVG supports the configuration of a customized IBM Speech-to-Text endpoint. This enables our customers to use customized speech models for improved recognition performance.

Our default IBM Speech-to-Text services are hosted in a European region of the Watson platform for our European customers.

Microsoft

Microsoft Speech-to-Text utilizes deep neural networks to transcribe spoken words into text with high accuracy. It supports multiple languages and excels in noisy environments. The service can also be customized to recognize specific vocabulary and language models.

Microsoft Speech-to-Text is available for all languages supported by Microsoft.

CVG supports the configuration of a customized Microsoft Speech-to-Text endpoint. This allows our customers to leverage customized speech models for improved recognition performance. By providing hints, the transcription accuracy of domain-specific words (e.g., product names) or phrases can be boosted.

Our default Microsoft Speech-to-Text services are hosted in the Azure Cloud regions:

“westeurope” for European customers
“eastus” for American customers

For our European customers, we additionally provide Azure Speech-to-Text in the following Azure Cloud regions:

“germanywestcentral”
“northeurope”
“swedencentral”

OpenAI Whisper

OpenAI Whisper provides Speech-to-Text for many languages. For our American customers, we offer Whisper hosted by OpenAI in the US; for our European customers, Whisper is provided via Microsoft Azure in the EU.

VIER Speech-to-Text

In order to meet the specific security requirements of some of our customers while providing a high-quality speech recognition service with low latency, we operate our own Speech-to-Text service in our data center located within the EU.

We are excited to announce the public beta release of VIER Speech-to-Text, now available for your use. We look forward to receiving your feedback to help us improve this service further.

Text-to-Speech Voices

CVG supports several hundred standard and neural Text-to-Speech (TTS) voices. The improved speech quality of neural voices comes from a newer machine learning approach that converts text into lifelike speech.

Select one of these voices in your voice application by selecting it in the CVG console for your CVG project or via API.

The voice of your choice is used in a call when your application uses the /call/say (spec) or /call/prompt (spec) endpoints.
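
As a rough illustration of calling the /call/say endpoint, the sketch below builds an HTTP request without sending it. Only the base URLs and the /call/say path come from this page. The "/v1" prefix, the Bearer authentication, and the JSON field names "dialogId" and "text" are assumptions made for this example; consult the linked spec for the actual contract.

```python
import json
import urllib.request

# Base URLs as stated on this page.
BASE_URLS = {
    "EU": "https://cognitivevoice.io",
    "US": "https://us.cognitivevoice.io",
}


def build_say_request(location, dialog_id, text, token, api_prefix="/v1"):
    """Build (but do not send) a POST request for the /call/say endpoint.

    ASSUMPTIONS: the "/v1" prefix, Bearer auth, and the JSON field names
    "dialogId" and "text" are illustrative and must be checked against the spec.
    """
    url = BASE_URLS[location] + api_prefix + "/call/say"
    body = json.dumps({"dialogId": dialog_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    req = build_say_request("EU", "example-dialog-id", "Hello!", "example-token")
    print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) is left out on purpose, so the sketch can be inspected without a live call.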

If you plan to use SSML in your application, keep in mind that SSML support can vary widely between vendors and sometimes even between voices. Make sure you check the SSML documentation specific to the vendors you choose, especially if you plan to use a different vendor as a fallback.
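
One defensive option when a fallback vendor may not understand your markup is to degrade the SSML to plain text before sending it. The sketch below shows this idea using only the Python standard library; the `ssml_to_plain_text` helper is a hypothetical name for this example, not a CVG function.

```python
import xml.etree.ElementTree as ET


def ssml_to_plain_text(ssml: str) -> str:
    """Strip all SSML markup and return the text with normalized whitespace."""
    root = ET.fromstring(ssml)
    # itertext() yields the text of every element (and the text between
    # child elements) in document order.
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    ssml = (
        '<speak>Hello <break time="300ms"/> and '
        '<emphasis level="strong">welcome</emphasis>!</speak>'
    )
    print(ssml_to_plain_text(ssml))
```

Note that this drops everything the markup expressed, such as pauses, emphasis, or pronunciation hints, so it is a last resort rather than a substitute for checking the vendor-specific SSML documentation.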

Amazon

All voices made available by Amazon in the Amazon cloud (AWS) can be used in CVG. This includes standard voices as well as neural voices. When selecting an Amazon Polly voice, the system will automatically select the neural version, if available, to ensure the best possible audio quality for your applications.

Find a list of Amazon TTS voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

Amazon voices are hosted in the AWS regions:

“westeurope” for European customers
“eastus” for American customers

SSML Support

ElevenLabs

All voices made available by ElevenLabs can be used in CVG.

Select from the pre-made ElevenLabs Voices provided by us within CVG by default.
For additional options, we can add community-featured voices on request.
If you have an existing ElevenLabs subscription, you can bring your own subscription and integrate your custom voices into CVG.

Find a list of ElevenLabs TTS voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

ElevenLabs voices are hosted in the ElevenLabs regions:

“europe” for European customers
“us” for American customers

Google

All voices made available by Google in the Google Cloud platform can be used in CVG. This range covers everything from standard options to cutting-edge generative AI models:

Standard & WaveNet: Reliable neural voices suitable for general-purpose applications.
Neural2: Enhanced, high-quality voices based on the latest architecture for improved naturalness.
Studio: Professional-grade, high-fidelity voices optimized for long-form content like narration and news reading.
Chirp 3 (HD): Highly realistic, conversational voices designed for low-latency streaming and lifelike interactions. The popular “Journey” voices have been rebranded under the Chirp HD family.
Gemini TTS (available from May 2026): The newest generative AI models (such as gemini-2.5-flash-tts) that offer granular, steerable control over style, accent, and emotional expression through natural language prompts.

Find a list of Google TTS voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

SSML Support

IBM

All voices made available by IBM in the IBM cloud can be used in CVG. This includes standard voices as well as v3 voices (neural voices).

Find a list of IBM voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

SSML Support

OpenAI

All voices made available by OpenAI in the OpenAI cloud in the US can be used in CVG.

Find a list of OpenAI voices here.

OpenAI provides OpenAI TTS voices in the USA only. To use OpenAI TTS voices in the EU, use the OpenAI TTS voices hosted by Microsoft.

Microsoft

All voices made available by Microsoft in the Microsoft Cloud (Azure) can be used in CVG. This selection ranges from standard neural voices to high-expressivity models developed in partnership with OpenAI:

Neural Voices: The industry-standard high-quality voices. These use deep neural networks to deliver human-like intonation and clear articulation.
Multilingual Neural: Versatile voices that can speak multiple languages while maintaining the same unique persona and accent consistency across borders.
OpenAI Voices: Direct integration of OpenAI’s state-of-the-art TTS models (including Alloy, Echo, Fable, Onyx, Nova, and Shimmer), optimized for high-quality, emotive, and conversational use cases.
Personal Voice: Advanced technology that allows for the creation of a synthetic voice based on a short speech sample, providing a highly personalized experience.
Custom Neural Voice (CNV): Professional, brand-specific voices created through custom training to give your application a unique vocal identity.

Find a list of Microsoft voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

SSML Support

Nuance

The following neural voices from Nuance are available in CVG:

US-English (en-US): Zoe
German (de-DE): Petra

We host Nuance in our secure cloud, i.e. a Germany-based data center.

Please ask us if you need another voice from Nuance.

SSML Support

VIER

To meet the security requirements of some of our customers, we are offering our own German TTS (Text-to-Speech) voice from our EU-based location. We are excited to hear your feedback, which will help us continuously improve this locally hosted TTS service.

Supported Languages

We support 61 languages and dialects:

Arabic (🇪🇬 Egypt)
Arabic (🇸🇦 Saudi Arabia)
Bengali (🇮🇳 India)
Bulgarian (🇧🇬 Bulgaria)
Catalan (🇪🇸 Spain)
Chinese, Cantonese (Traditional, 🇭🇰 Hong Kong)
Chinese, Mandarin (Simplified, 🇨🇳 China)
Chinese, Mandarin (Simplified, 🇭🇰 Hong Kong)
Chinese, Mandarin (Traditional, 🇹🇼 Taiwan)
Croatian (🇭🇷 Croatia)
Czech (🇨🇿 Czech Republic)
Danish (🇩🇰 Denmark)
Dutch (🇳🇱 Netherlands)
English (🇦🇺 Australia)
English (🇮🇳 India)
English (🇬🇧 United Kingdom)
English (🇺🇸 United States)
Estonian (🇪🇪 Estonia)
Filipino (🇵🇭 Philippines)
Finnish (🇫🇮 Finland)
French (🇨🇦 Canada)
French (🇫🇷 France)
French (🇨🇭 Switzerland)
German (🇦🇹 Austria)
German (🇩🇪 Germany)
German (🇨🇭 Switzerland)
Greek (🇬🇷 Greece)
Gujarati (🇮🇳 India)
Hebrew (🇮🇱 Israel)
Hindi (🇮🇳 India)
Hungarian (🇭🇺 Hungary)
Icelandic (🇮🇸 Iceland)
Indonesian (🇮🇩 Indonesia)
Italian (🇮🇹 Italy)
Japanese (🇯🇵 Japan)
Kannada (🇮🇳 India)
Korean (🇰🇷 South Korea)
Latvian (🇱🇻 Latvia)
Lithuanian (🇱🇹 Lithuania)
Malay (🇲🇾 Malaysia)
Malayalam (🇮🇳 India)
Norwegian Bokmål (🇳🇴 Norway)
Norwegian Nynorsk (🇳🇴 Norway)
Polish (🇵🇱 Poland)
Portuguese (🇧🇷 Brazil)
Portuguese (🇵🇹 Portugal)
Romanian (🇷🇴 Romania)
Russian (🇷🇺 Russia)
Serbian (🇷🇸 Serbia)
Slovak (🇸🇰 Slovakia)
Slovenian (🇸🇮 Slovenia)
Spanish (🇲🇽 Mexico)
Spanish (🇪🇸 Spain)
Spanish (🇺🇸 United States)
Swedish (🇸🇪 Sweden)
Tamil (🇮🇳 India)
Telugu (🇮🇳 India)
Thai (🇹🇭 Thailand)
Turkish (🇹🇷 Turkey)
Ukrainian (🇺🇦 Ukraine)
Vietnamese (🇻🇳 Vietnam)

If your desired language is not listed, please contact us, and we’ll explore the possibility of adding it.