
Azure AI milestone: New Neural Text-to-Speech models more closely mirror natural speech



Neural Text-to-Speech, along with recent milestones in computer vision and question answering, is part of a larger Azure AI mission to provide relevant, meaningful AI solutions and services that work better for people because they better capture how people learn and work, with improved vision, knowledge understanding, and speech capabilities. At the center of these efforts is XYZ-code, a joint representation of three cognitive attributes: monolingual text (X), audio or visual sensory signals (Y), and multilingual (Z). For more information about these efforts, read the XYZ-code blog post.


Neural Text-to-Speech (Neural TTS), a powerful speech synthesis capability of Azure Cognitive Services, enables developers to convert text to lifelike speech. It is used in voice assistant scenarios, content read-aloud capabilities, accessibility tools, and more. Neural TTS has now reached a significant milestone in Azure with a new generation of the Neural TTS model, called Uni-TTSv4, whose quality shows no significant difference from sentence-level natural speech recordings.


Microsoft debuted the original technology three years ago, with close to human-parity quality. This resulted in TTS audio that was more fluid, natural sounding, and better articulated. Since then, Neural TTS has been incorporated into Microsoft flagship products such as Edge Read Aloud, Immersive Reader, and Word Read Aloud. It’s also been adopted by many customers such as AT&T, Duolingo, Progressive, and more. Users can choose from multiple pre-set voices or record and upload their own sample to create custom voices instead. Over 110 languages are supported, including a wide array of language variants, also known as locales.

The latest version of the model, Uni-TTSv4, is now shipping to production for a first set of eight voices (shown in the table below). We will continue to roll out the new model architecture to the remaining 110-plus languages and Custom Neural Voice in the coming milestone. Our users will automatically get significantly better-quality TTS through the Azure TTS API, Microsoft Office, and the Edge browser.
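
For developers curious what a request to a TTS service typically looks like, here is a minimal sketch that wraps plain text in an SSML (Speech Synthesis Markup Language) document and selects a voice by name. The helper `build_ssml` is hypothetical, and the voice name `en-US-JennyNeural` and locale `en-US` are placeholder assumptions rather than values taken from this post; the sketch only builds the request body and does not call any Azure endpoint.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML 1.0 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document that selects one voice.

    The default voice and locale are illustrative placeholders (assumptions),
    not values confirmed by the announcement.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        # escape() protects characters such as & and < inside the spoken text.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Neural TTS converts text to lifelike speech."))
```

Because the voice is chosen by name in the request, a model upgrade behind an existing voice would not require any change to a document like this, which is consistent with the note above that users get the improved quality automatically.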


Read more here: Azure AI milestone: New Neural Text-to-Speech models more closely mirror natural speech - Microsoft Research