Google introduced new audio generation capabilities at Google I/O 2025 with its Gemini 2.5 models. The Mountain View-based tech giant is now letting developers and individuals test these features on its platforms. The two new capabilities, both offered with the Gemini 2.5 Flash preview, are native audio dialog and controllable text-to-speech (TTS). The former can produce human-like audio when responding to a user's prompts, while the latter can turn any script into conversational speech. These features are currently available to developers through the application programming interface (API).
Google Gemini 2.5 Flash’s audio output capabilities
In a blog post, the tech giant described the features of these two audio generation methods, highlighting how developers can use them to create new experiences for people. Currently, native audio dialog can be tested in the Stream tab of Google AI Studio, while the TTS feature can be tested in the Generate Media tab inside AI Studio.
Native audio dialog with the Gemini 2.5 Flash preview is designed for real-time conversation between a human user and the AI. The user can either type a prompt or speak it, and the AI responds verbally. The model produces audio directly instead of generating text and then converting it into speech.
This approach has several benefits. It supports affective dialog, which means that when Gemini 2.5 Flash responds to a user's voice, it can recognize the emotion behind the words. It can understand when the user is frightened, angry, or surprised and respond accordingly.
In addition, the audio generation feature knows when to speak, can adopt different accents and speaking styles, can access tools such as Google Search, and supports more than 24 languages.
As for controllable TTS, it offers multi-speaker dialog generation, can produce the emotions and tone described in the script, can control delivery pace, emphasis, and accents, and supports the same 24 languages as well as mixing of languages.
Google says these capabilities were evaluated for potential risks throughout the development process. The company used red teaming along with internal mechanisms to find and fix any weaknesses. The company also highlighted that all audio outputs from these models are embedded with its watermarking technology.


