Monday, May 27, 2024

Amazon Polly’s Generative Engine Offers 3 Evocative Voices

Amazon Polly

Use authentic-sounding, high-quality human voices in a variety of languages.

AWS is pleased to announce today the general availability of Amazon Polly’s generative engine, which launches with three voices: Ruth and Matthew in American English, and Amy in British English. The new generative engine was trained on a range of voices, languages, and styles, using both private and publicly available data. It renders context-dependent prosody, pauses, spelling, dialectal characteristics, foreign-word pronunciation, and more with high precision.

Speech output can be customised and controlled with lexicons and Speech Synthesis Markup Language (SSML) tags.

Speech can be saved and shared in common file types such as OGG and MP3.
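In the Amazon Polly SynthesizeSpeech API, the file type is selected with the OutputFormat parameter, and OGG audio is requested as ogg_vorbis rather than ogg. The minimal Python sketch below maps a file extension to an OutputFormat value. The helper name output_format_for and the extension table are hypothetical, and the table covers only mp3, ogg_vorbis, and pcm rather than every value the API accepts.

```python
import os

# Hypothetical extension table. "mp3", "ogg_vorbis", and "pcm" are
# OutputFormat values accepted by Polly's SynthesizeSpeech API; the
# table is illustrative, not exhaustive.
EXTENSION_TO_OUTPUT_FORMAT = {
    ".mp3": "mp3",
    ".ogg": "ogg_vorbis",
    ".pcm": "pcm",
}


def output_format_for(path: str) -> str:
    """Return the Polly OutputFormat value that matches a file's extension."""
    extension = os.path.splitext(path)[1].lower()
    try:
        return EXTENSION_TO_OUTPUT_FORMAT[extension]
    except KeyError:
        raise ValueError(f"unsupported audio extension: {extension!r}") from None
```

For example, output_format_for("story.ogg") returns "ogg_vorbis", which can then be passed as OutputFormat in a synthesize_speech request.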

Deliver lifelike voices and conversational user experiences with consistently fast response times.

How it works

Amazon Polly uses deep learning to synthesise natural-sounding human speech, so you can turn articles and other text into speech. It offers hundreds of realistic voices in a wide range of languages, making it easy to build speech-enabled applications.

Amazon Polly: how it works
Image credit to AWS

Use cases

Produce audio in numerous languages

Applications with a worldwide audience, such as websites, videos, and RSS feeds, can benefit from adding speech.

Engage customers with a natural-sounding voice

You can save Amazon Polly speech output and play it back to prompt callers in interactive or automated voice response systems.

Modify volume, pitch, speaking rate, and speaking style

Amazon Polly supports SSML, a W3C-standard, XML-based markup language for speech synthesis applications, including common SSML tags for phrasing, emphasis, and intonation.
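As a minimal sketch, the Python function below wraps plain text in a <speak> document and, when asked, applies a <prosody> element for rate, pitch, or volume. It escapes the text and attribute values so characters such as & and < do not break the XML. The function name prosody_ssml is hypothetical, and support for individual SSML tags and attribute values differs between Polly engines, so check the Amazon Polly SSML documentation before relying on a given attribute.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate=None, pitch=None, volume=None):
    """Build an SSML <speak> document, adding <prosody> only if an attribute is set.

    Values such as "slow", "loud", or "+10%" are passed through unchanged;
    whether a particular engine accepts them is an assumption to verify.
    """
    attributes = {"rate": rate, "pitch": pitch, "volume": volume}
    # quoteattr() escapes the value and wraps it in quotes for use as an XML attribute.
    attribute_text = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in attributes.items()
        if value is not None
    )
    body = escape(text)  # escape &, <, > in the spoken text
    if attribute_text:
        body = f"<prosody{attribute_text}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

The resulting string would be sent as Text together with TextType="ssml" in a synthesize_speech request, so Polly interprets the markup instead of reading it aloud.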

Amazon Polly is a machine learning (ML) service that uses text-to-speech (TTS) technology to read text aloud. With Amazon Polly, you can deploy speech-enabled applications across multiple countries using dozens of languages and high-quality, realistic human voices.

Amazon Polly offers a variety of voice options, including neural, long-form, and generative voices, which deliver major gains in speech quality and produce highly expressive, emotionally adept, human-like voices. With SSML tags, you can adjust the speech rate, pitch, or volume, and you can save speech output in common formats such as MP3 or OGG. You can also quickly build lifelike voices and conversational user experiences with consistently fast response times.

What is the new generative engine?

Amazon Polly currently supports four voice engines: generative, long-form, neural, and standard.

Amazon Polly voices

Standard TTS voices, introduced in 2016, use traditional concatenative synthesis. This technique strings together the phonemes of recorded speech to produce very natural-sounding synthesised speech. However, the inherent variations in speech and the techniques used to segment the waveforms limit the speech quality.

Neural text-to-speech (NTTS) voices, introduced in 2019, rely on a neural network that processes sequences of phonemes into spectrograms, which a neural vocoder then converts into a continuous audio signal. NTTS voices sound much more lifelike than standard voices.

Long-form voices, released in 2023, are designed to hold listeners’ interest in longer content such as news articles, training materials, or marketing videos. They are built with state-of-the-art deep learning TTS technology.

In February 2024, Amazon scientists unveiled Big Adaptive Streamable TTS with Emergent abilities (BASE), a new research TTS model. Building on this technology, the Polly generative engine can produce synthetic voices that sound human-like. You can use these voices as a knowledgeable virtual assistant, marketer, or customer service representative.
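In the Polly API, these engines are identified by the Engine values standard, neural, long-form, and generative, and each voice returned by DescribeVoices lists the engines it can use in its SupportedEngines field. As a minimal sketch, the Python function below picks the first engine from a preference order that a given voice supports. The name pick_engine and the preference order (newest engine first) are assumptions for illustration, not AWS guidance.

```python
# Hypothetical preference order; the strings are Polly Engine identifiers.
ENGINE_PREFERENCE = ("generative", "long-form", "neural", "standard")


def pick_engine(voice, preference=ENGINE_PREFERENCE):
    """Return the first engine in `preference` that `voice` supports.

    `voice` is one entry of the "Voices" list returned by DescribeVoices;
    its "SupportedEngines" field lists the engines that voice can use.
    """
    supported = set(voice.get("SupportedEngines", []))
    for engine in preference:
        if engine in supported:
            return engine
    raise ValueError(f"voice {voice.get('Id')!r} supports none of {preference}")
```

Putting the preference order in a parameter keeps the choice explicit: an application that prefers a particular style for long narration can pass its own order instead of the default.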

The new generative voices are as follows:

Ruth: Female, English (US)
Matthew: Male, English (US)
Amy: Female, English (British)

You can choose among these voices based on your use case and application. To learn more about the generative engine, see the Generative voices section of the AWS documentation.

Get started with generative voices

You can access the new voices through the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the AWS SDKs.

Amazon Polly text to speech

To get started, open the Amazon Polly console in the US East (N. Virginia) Region and choose Text-to-Speech in the left navigation pane. If you select the Ruth or Matthew voice for English (US), or the Amy voice for English (UK), you can choose the Generative engine. Enter your text, then listen to the generated speech or download it.

Amazon Polly Text-to-Speech
Image credit to AWS

You can list the voices that support the new generative engine by running the AWS CLI describe-voices command with the engine parameter set to generative.
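The same listing can be sketched in Python with the AWS SDK for Python (Boto3), whose Polly client provides describe_voices with Engine, LanguageCode, and NextToken parameters. The helper name generative_voices is hypothetical. It takes an already-created client, so it can be exercised with a stand-in object and no AWS credentials, and it follows NextToken in case the results are paginated.

```python
def generative_voices(polly_client, language_code=None):
    """Return (Id, LanguageName, Gender) tuples for voices supporting the generative engine."""
    request = {"Engine": "generative"}
    if language_code is not None:
        request["LanguageCode"] = language_code  # e.g. "en-US" or "en-GB"
    voices = []
    while True:
        response = polly_client.describe_voices(**request)
        for voice in response["Voices"]:
            voices.append((voice["Id"], voice["LanguageName"], voice["Gender"]))
        # DescribeVoices returns NextToken when more results are available.
        next_token = response.get("NextToken")
        if not next_token:
            break
        request["NextToken"] = next_token
    return voices
```

With real credentials, a call might look like generative_voices(boto3.client("polly", region_name="us-east-1"), language_code="en-US"), since the generative voices are offered in US East (N. Virginia).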

Next, run the synthesize-speech CLI command with the generative engine parameter and a supported voice ID to synthesise sample text into an audio file (hello.mp3).
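An equivalent Boto3 sketch calls synthesize_speech with Engine="generative", OutputFormat="mp3", and a supported voice ID, then writes the returned AudioStream to hello.mp3. The function name synthesize_to_file and its defaults (voice Ruth, file hello.mp3) are illustrative assumptions. The client is passed in so the function can be tested with a stand-in object.

```python
def synthesize_to_file(polly_client, text, path="hello.mp3", voice_id="Ruth"):
    """Synthesise `text` with Polly's generative engine and save the MP3 audio to `path`."""
    response = polly_client.synthesize_speech(
        Engine="generative",
        OutputFormat="mp3",
        Text=text,
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read() returns the audio bytes.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return path
```

With a real client, synthesize_to_file(boto3.client("polly", region_name="us-east-1"), "Hello from Amazon Polly!") would leave hello.mp3 in the current directory.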

For more code examples that use the AWS SDKs, see Code and Application Examples in the AWS documentation. It includes code examples in Java and Python, web application examples in Java or Python, and iOS and Android app examples.

Amazon Polly pricing

The new generative voices of Amazon Polly are currently available in the US East (N. Virginia) Region. You pay only for what you use, based on the number of text characters you convert to speech. See the Amazon Polly Pricing page for additional information.
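Because billing is per character, a rough estimate is the number of characters multiplied by the per-character rate. The sketch below assumes a price quoted per one million characters and counts characters with Python's len(). Both the rate (a placeholder parameter) and that counting rule are assumptions, so check the Amazon Polly Pricing page for the actual generative-engine rate and for how billed characters are counted.

```python
def estimate_cost(texts, price_per_million_chars):
    """Return (total_characters, estimated_cost) for a batch of texts.

    price_per_million_chars is a placeholder supplied by the caller; take the
    real rate from the Amazon Polly Pricing page. len() counts Unicode code
    points, which only approximates Polly's billed characters.
    """
    total_chars = sum(len(text) for text in texts)
    return total_chars, total_chars * price_per_million_chars / 1_000_000
```

For instance, with a hypothetical rate of 30.0 per million characters, the 11 characters in ["hello", "world!"] would come to 0.00033.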

Drakshi has been writing articles about artificial intelligence for govindhtech since June 2023. She holds a postgraduate degree in business administration and is an artificial intelligence enthusiast.

