The tech giant is launching Nova Sonic, a model that promises "natural conversations" with users and can "understand the nuances of human conversation."
In a blog post, Amazon said: "Traditional approaches in building voice-enabled applications involve complex orchestration of multiple models, such as speech recognition to convert speech to text, large language models (LLMs) to understand and generate responses, and text-to-speech to convert text back to audio. This fragmented approach not only increases development complexity but also fails to preserve crucial acoustic context and nuances like tone, prosody, and speaking style that are essential for natural conversations.
"Nova Sonic takes a new approach to solve these challenges. Instead of using different models, it unifies the understanding and generation capabilities into a single model.
"This unification enables the model to adapt the generated voice response to the acoustic context (e.g., tone, style) and the spoken input, resulting in more natural dialogue. Nova Sonic even understands the nuances of human conversation, including the speaker’s natural pauses and hesitations, waiting to speak until the appropriate time, and gracefully handling barge-ins."
Nova Sonic also produces a text transcript of the user's speech, which developers can use to build voice-enabled AI agents, such as an AI-powered travel agent that books flights.
The post added: "It also generates a text transcript for the user’s speech, enabling developers to use that text to call specific tools and APIs for building voice-enabled AI agents, like this example of an AI-powered travel agent that can book flights by retrieving up to date flight information. These capabilities, along with its lightning-fast inference, make voice applications powered by Nova Sonic more natural and useful."