Google Unveils Veo 3: AI Video Generator with Built-In Audio

8 months ago

Google has officially introduced Veo 3, the latest iteration of its video generation model, marking a significant leap in artificial intelligence by integrating audio directly into AI-generated videos.

The announcement was made during the company’s annual I/O 2025 developer conference.

Until now, AI video tools such as Pika and OpenAI’s Sora have been able to produce visually stunning videos but lacked one crucial element: sound.

“We’re emerging from the silent era of video generation,” said Google DeepMind CEO Demis Hassabis, underlining Veo 3’s groundbreaking advancement.

The model can generate synchronized audio, including dialogue, ambient noise, sound effects, and even animal sounds. This lets users simply describe a scene, such as a cityscape or a forest of talking animals, and receive a fully formed clip with both picture and sound in return.

According to Google, “Veo 3 excels from text and image prompting to real-world physics and accurate lip syncing.”

Available Now via Premium Plans

Veo 3 is available starting today to U.S. subscribers of Google’s new Google AI Ultra plan, priced at $249.99/month, and to enterprise customers through Vertex AI. It is also integrated into Flow, Google’s newly unveiled AI filmmaking tool, which brings together Veo, Imagen, and Gemini to help users create cinematic videos from natural language prompts.

Flow: Google’s Cinematic Creation Engine

Flow is more than a video generator. It acts as a creative hub where users can describe shots, scenes, locations, and styles in plain language. It is currently limited to Google AI Pro and Google AI Ultra subscribers in the U.S., but Google plans to expand availability internationally.

Flow also supports Veo 2, which gains enhancements such as camera control (zoom and rotation), reference-based generation, object insertion and removal, and frame conversion (portrait to landscape).

Imagen 4 and Other AI Media Tools Unveiled

Alongside Veo 3, Google introduced Imagen 4, its most advanced image generation model to date. It can render intricate textures such as fabric and fur, supports typography and resolutions up to 2K, and produces both photorealistic and abstract results. It is accessible through Gemini, Vertex AI, and Workspace apps such as Docs and Slides. Google says a faster variant of Imagen 4, up to 10 times faster than Imagen 3, is coming soon.

Google also launched SynthID Detector, a web-based portal that lets users check whether a media file carries Google’s SynthID watermark for AI-generated content. The move responds to growing concern about telling real content apart from AI-generated content.

A Tipping Point in AI-Generated Entertainment?

Veo 3 stands out not just for its audio features but for what it could mean for content creation. Google describes the idea as putting a “short story in your prompt.”

In return, Veo could deliver a complete video with visuals and sound. In the future, it might even create full-length animated films.

Unlike the silent video generators that preceded it, Veo 3 does not need a separate post-production pass to add audio: it generates dialogue and syncs lip movements on its own. That could change workflows in film, animation, and storytelling.

Huma Ishfaq