Support for Third-Party ASR, TTS, and Voice Biometrics

Automatic Speech Recognition (ASR)

We support the following third-party service providers for ASR services:

| ASR | On-Prem / Cloud | Languages | Regions | Word Error Rate (WER) | Comments |
| --- | --- | --- | --- | --- | --- |
| Google | Cloud | Supported Languages | Locations v2 - Docs; Regions | 4–9% | 1. Good for short utterances such as ‘yes’ and ‘no’. 2. Good for numeric and alphanumeric inputs (for example, IDs and SSNs). 3. Supports class tokens, so the output format can be controlled to some extent. 4. Hints and Hint-hosts are supported. 5. Extensive language support. |
| Deepgram | Cloud & On-Prem | Supported Languages | Supports all regions across the globe | 3.44% | 1. Hints are supported. 2. Custom models are available via the Deepgram technical team. 3. Smart formatting for inputs such as numbers and dates. |
| Azure | Cloud & On-Prem | Supported Languages | Regions | 5–10% | 1. Preferred ASR provider. It should be the default for new accounts. 2. Low WER, lots of flexibility with custom models. 3. Hints are supported. 4. Extensive language support. 5. Custom language models can be created and deployed on a DIY basis through the Azure portal. |
| Nvidia Riva (Nvidia) | On-Prem | ASR Overview | – | 6.67% | |
| Amivoice ASR (Advanced Media Inc) | Cloud | Supported Languages | Processing and storage primarily based in Japan | N/A | |
| Amazon Transcribe | Cloud | Supported Languages | Regions | 2.60% | |
| gnani.ai | Cloud & On-Prem | Supported Languages | Deployable in customer-specified region (private cloud or on-premises) | 2% | |
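
The table reports a Word Error Rate for each provider but does not say how the figures were measured. As a reference point, the sketch below computes WER under its standard definition: substitutions, deletions, and insertions divided by the number of words in the reference transcript. The function name `word_error_rate` and the normalization used here (lowercasing, splitting on whitespace) are assumptions for illustration and may differ from the method behind the numbers above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level edit distance, keeping only the previous row of the table.
    # prev[j] is the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of eight reference words.
wer = word_error_rate(
    "yes i would like to pay my bill",
    "yes i would like to pay the bill",
)
print(f"{wer:.2%}")  # prints 12.50%
```

WER depends heavily on the test set, audio quality, and text normalization, so figures from different sources are only comparable when measured the same way.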

Text-to-Speech (TTS)

We support the following third-party service providers for TTS services:

| TTS | On-Prem / Cloud | Languages | Regions | Comments |
| --- | --- | --- | --- | --- |
| Google | Cloud | Supported Voices | Operates within Google Cloud’s global infrastructure | |
| Azure | Cloud & On-Prem | Supported Languages | Regions | 1. Extensive language support. 2. An extensive number of voices. 3. Custom voice preparation can be done through the portal. 4. SSML support (limited to MS Azure supported tags). |
| OpenAI TTS | Cloud | Supported Languages | – | 1. Human-like voices. 2. Limited number of voices. |
| Eleven Labs | Cloud | Docs | – | 1. Human-like voices. 2. Speed, temperature, and stability can be controlled through call control parameters. 3. Voice cloning is possible with 30-second to 1-minute voice samples. |
| AWS | Cloud | Supported Languages | Regions | |
| gnani.ai | Cloud & On-Prem | API Service | – | |
| Deepgram | Cloud & On-Prem | Supported Languages | – | 1. Limited number of languages. 2. Human-like voices. |
| Nvidia Riva TTS | On-Prem | – | – | |
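
For the Azure SSML support noted in the table, a minimal sketch of an SSML payload may help. It builds a `speak` / `voice` / `prosody` document as a string and checks that it is well-formed XML. The helper name `build_ssml`, the voice `en-US-JennyNeural`, and the default `rate` value are assumptions chosen for illustration. How SSML is passed to the provider on this platform is not described here, so only the payload itself is shown.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str,
               voice: str = "en-US-JennyNeural",  # assumed example voice
               lang: str = "en-US",
               rate: str = "medium") -> str:
    """Build a small SSML document (hypothetical helper, not a platform API)."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        # escape() protects characters such as & and < in the spoken text.
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


ssml = build_ssml("Your balance is 42 dollars & your payment is due soon.")
ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
print(ssml)
```

Since support is limited to the tags Azure accepts, any tag beyond these should be checked against the Azure SSML documentation before use.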

Voice Biometrics

We support the following third-party service providers for voice biometrics:

| Voice Biometric Vendor | Voice Biometric Engine | On-Prem / Cloud | Comments |
| --- | --- | --- | --- |
| ID R&D | ID Voice | – | – |