Automatic Speech Recognition (ASR)
We support the following third-party service providers for ASR:

| ASR | On-Prem / Cloud | Languages | Regions | Word Error Rate (WER) | Comments |
|---|---|---|---|---|---|
| Google | Cloud | Supported Languages | 1. Locations v2 - Docs<br>2. Regions | 4–9% | 1. Good for short utterances such as ‘yes’ and ‘no’.<br>2. Good for numeric and alphanumeric inputs (for example, IDs and SSNs).<br>3. Supports class tokens, which give some control over the output format.<br>4. Hints and Hint-hosts are supported.<br>5. Extensive language support. |
| Deepgram | Cloud & On-Prem | Supported Languages | Supports all regions across the globe | 3.44% | 1. Hints are supported.<br>2. Custom models are available via the Deepgram technical team.<br>3. Smart formatting for inputs such as numbers and dates. |
| Azure | Cloud & On-Prem | Supported Languages | Regions | 5–10% | 1. Preferred ASR provider. It should be the default for new accounts.<br>2. Low WER and a lot of flexibility with custom models.<br>3. Hints are supported.<br>4. Extensive language support.<br>5. Custom language models can be created and deployed self-service (DIY) through the Azure portal. |
| Nvidia Riva (Nvidia) | On-Prem | ASR Overview | - | 6.67% | |
| Amivoice ASR (Advanced Media Inc) | Cloud | Supported Languages | Processing and storage primarily based in Japan | N/A | |
| Amazon Transcribe | Cloud | Supported Languages | Regions | 2.60% | |
| gnani.ai | Cloud & On-Prem | Supported Languages | Deployable in customer-specified region (private cloud or on-premises) | 2% | |

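
For readers comparing the Word Error Rate (WER) column, the sketch below shows how WER is conventionally computed: the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. This is an illustration only. The function name `word_error_rate`, the lowercase and whitespace tokenization, and the sample sentences are assumptions made for this example; the figures in the table may have been measured on different data and with different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Hypothetical helper for illustration: text is lowercased and split on
    whitespace, which is a simplifying assumption about normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    wer = word_error_rate("my id is one two three", "my id is one too three")
    print(f"{wer:.2%}")  # prints 16.67%
```

Because WER is normalized by the reference length, a single misrecognized word in a short utterance such as "yes" yields a WER of 100%, so results depend heavily on the kind of utterances in the test set.
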
Text to Speech (TTS)
We support the following third-party service providers for TTS:

| TTS | On-Prem / Cloud | Languages | Regions | Comments |
|---|---|---|---|---|
| Google | Cloud | Supported Voices | Operates within Google Cloud’s global infrastructure | |
| Azure | Cloud & On-Prem | Supported Languages | Regions | 1. Extensive language support.<br>2. A large number of voices.<br>3. Custom voices can be prepared through the portal.<br>4. SSML support (limited to the tags that Azure supports). |
| OpenAI TTS | Cloud | Supported Languages | – | 1. Human-like voices.<br>2. Limited number of voices. |
| Eleven Labs | Cloud | Docs | – | 1. Human-like voices.<br>2. Speed, temperature, and stability can be controlled through call control parameters.<br>3. Voice cloning is possible with voice samples of 30 seconds to 1 minute. |
| AWS | Cloud | Supported Languages | Regions | |
| gnani.ai | Cloud & On-Prem | API Service | – | |
| Deepgram | Cloud & On-Prem | Supported Languages | – | 1. Limited number of languages.<br>2. Human-like voices. |
| Nvidia Riva TTS | On-Prem | – | – | |

Voice Biometrics
We support the following third-party service providers for voice biometrics:

| Voice Biometric Vendor | Voice Biometric Engine | On-Prem / Cloud | Comments |
|---|---|---|---|
| ID R&D | ID Voice | – | – |