SSML (Speech Synthesis Markup Language)
Using SSML allows developers to control aspects of speech synthesis such as pronunciation, volume, pitch, and rate of speech. Here’s a guide on how to use SSML effectively.
Understanding SSML Basics
- SSML is an XML-based markup language used to control text-to-speech synthesis.
- It provides tags to control various aspects of speech synthesis, including pronunciation, prosody, volume, and more.
- SSML is supported by many speech synthesis systems, including Amazon Polly, Google Text-to-Speech, and others.
Basic SSML Tags
- <speak>: The root element of an SSML document. It marks the start and end of the speech content.
- <break>: Inserts a pause into the speech synthesis. You can specify the duration of the pause using the time attribute.
- <emphasis>: Emphasizes a portion of the text. You can specify the level of emphasis using the level attribute.
- <prosody>: Modifies aspects of speech such as pitch, rate, and volume. Attributes include pitch, rate, and volume.
- <phoneme>: Specifies the pronunciation of a word using phonetic alphabet symbols.
- <say-as>: Indicates how a particular piece of text should be pronounced, such as numbers, dates, or currency.
- <audio>: Embeds audio files into the speech output.
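To make the tags above concrete, here is a small sketch in Python that holds an illustrative SSML document using each of them and checks that it is well-formed XML. The sample text, the attribute values, the audio URL, and the helper name `tags_used` are all assumptions chosen for illustration. Which attribute values a given TTS engine accepts (for example the `interpret-as` values of <say-as>, or relative pitch changes) varies by engine, so check your engine's documentation.

```python
import xml.etree.ElementTree as ET

# Illustrative SSML that uses each tag once. The wording, attribute values,
# and the audio URL are placeholders, not values required by any TTS engine.
SSML_SAMPLE = """<speak>
  Welcome to the demo.
  <break time="500ms"/>
  This part is <emphasis level="strong">important</emphasis>.
  <prosody rate="slow" pitch="+2st" volume="loud">Spoken slowly and slightly higher.</prosody>
  You say <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>.
  Your total is <say-as interpret-as="cardinal">42</say-as>.
  <audio src="https://example.com/chime.mp3">chime</audio>
</speak>"""


def tags_used(ssml: str) -> list[str]:
    """Parse the SSML (raises ParseError if malformed) and list its tags in document order."""
    root = ET.fromstring(ssml)
    return [element.tag for element in root.iter()]


print(tags_used(SSML_SAMPLE))
```

Parsing the markup before sending it to a TTS engine is a cheap way to catch unclosed tags or unescaped characters early. It only checks XML well-formedness, not whether the engine supports every tag or attribute.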
Using SSML in Code
When using SSML in your code, wrap the SSML markup within <speak> tags.
Example:
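A minimal sketch in Python of wrapping plain text in <speak> tags. The helper name `wrap_in_speak` is hypothetical and not part of any SDK. It also escapes characters such as `&` and `<`, which would otherwise make the SSML invalid XML.

```python
from xml.sax.saxutils import escape


def wrap_in_speak(text: str) -> str:
    """Escape XML special characters and wrap the text in <speak> tags."""
    return f"<speak>{escape(text)}</speak>"


print(wrap_in_speak("Terms & conditions apply."))
# prints: <speak>Terms &amp; conditions apply.</speak>
```

This helper is meant for plain text only. If the input already contains SSML tags such as <break>, escaping would turn them into literal text, so build the markup first and wrap it without escaping.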
Voice Tag Support
Support For Voice Elements in SSML
The speak tag can be written in one of two ways:
- Option 1: Empty speak tag without attributes: <speak>
  - In this case, Voice Gateway constructs the voice and language elements on its own, based on the values supplied in Call control params.
- Option 2: Customized speak tag with attributes: <speak version="1.0" xml:lang="en-US" xmlns="W3C Speech Synthesis namespace">
  - In this case, Voice Gateway sends the SSML to the TTS engine without any modifications. Use option 2 if you want the voice element to work.
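As a rough sketch of the customized speak tag, the Python below builds an SSML string with explicit attributes and a nested voice element, then parses it to confirm it is well-formed. The function name `build_custom_speak` and the voice name `example-voice` are hypothetical; use a voice name that your TTS engine actually provides. The namespace URI shown is the one defined by the W3C SSML 1.0 specification.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace URI defined by the W3C SSML 1.0 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_custom_speak(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Build a speak tag with explicit attributes and a nested voice element."""
    return (
        f'<speak version="1.0" xml:lang={quoteattr(lang)} xmlns="{SSML_NS}">'
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


# "example-voice" is a placeholder, not a real voice name.
ssml = build_custom_speak("Hello, how can I help you?", "example-voice")
root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```

Because the speak tag carries its own attributes here, the voice and language come from the markup itself rather than from values filled in elsewhere.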