Google DeepMind has introduced a new tool called V2A (video-to-audio) that can generate synchronized soundtracks and sound effects for videos created by AI video generators. Put simply, it turns video pixels into accompanying sound, producing a soundtrack that fits the scene. What a time to be alive.
While AI video generation has made huge strides recently, the lack of accompanying audio has been a major drawback. V2A promises to bridge this gap by analyzing the video footage (pixels), optionally guided by text prompts, to create a rich soundscape that matches the on-screen action.
DeepMind highlights several advantages of V2A. First, it can generate an unlimited number of soundtracks for any video. Second, editors can use text prompts to nudge the audio output in a particular direction. For example, you can type in something like “cinematic thriller music with tense ambience and footsteps” and add more nuance to the sound. And third, V2A can work with existing video footage, including silent films and archival material, opening up a world of creative possibilities.
“V2A can generate an unlimited number of soundtracks for any video input,” Google writes on its DeepMind blog. “Optionally, a ‘positive prompt’ can be defined to guide the generated output toward desired sounds, or a ‘negative prompt’ to guide it away from undesired sounds.”
Technically, V2A uses a diffusion model that takes the video information and any text prompts as its guidance. Starting from random noise, the model refines the audio step by step, steered by the visuals from the video and by the prompts. The final audio is then decoded and combined with the video data.
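To make the idea of building audio from noise more concrete, here is a toy sketch of an iterative refinement loop. It is not DeepMind's code or architecture. Short lists of numbers stand in for the encoded video, the prompt embeddings, and the audio representation. The function toy_denoiser is a hypothetical placeholder for a trained network, and the way the positive and negative prompts are combined (in the style of classifier-free guidance) is an assumption, since the announcement does not describe how V2A does it.

```python
import random


def toy_denoiser(x, cond):
    """Hypothetical stand-in for a trained network.

    It 'predicts the noise' in x as the gap between x and the
    conditioning vector. A real model would be a learned neural net.
    """
    return [xi - ci for xi, ci in zip(x, cond)]


def generate_audio_latent(video_feat, pos_prompt, neg_prompt,
                          steps=50, step_size=0.2, guidance=0.5, seed=0):
    """Refine random noise into a toy 'audio latent' guided by the inputs."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per latent dimension.
    x = [rng.gauss(0.0, 1.0) for _ in video_feat]
    for _ in range(steps):
        eps_video = toy_denoiser(x, video_feat)
        eps_pos = toy_denoiser(x, pos_prompt)
        eps_neg = toy_denoiser(x, neg_prompt)
        # Assumed guidance rule: follow the video, lean toward the
        # positive prompt and away from the negative prompt.
        eps = [ev + guidance * (ep - en)
               for ev, ep, en in zip(eps_video, eps_pos, eps_neg)]
        # Remove a fraction of the estimated noise at each step.
        x = [xi - step_size * ei for xi, ei in zip(x, eps)]
    return x


if __name__ == "__main__":
    video = [1.0, 0.0, -1.0]   # made-up "encoded video" features
    wanted = [0.5, 0.5, 0.5]   # made-up positive prompt embedding
    unwanted = [0.0, 1.0, 0.0]  # made-up negative prompt embedding
    print(generate_audio_latent(video, wanted, unwanted))
```

In this toy setup the loop settles near the video features, shifted toward the positive prompt and away from the negative one. A real system would then decode that representation into an audio waveform, a step this sketch leaves out.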
Drawbacks and limitations
The system isn’t perfect yet, and there are some limitations to V2A. Currently, it can’t perfectly synchronize dialogue with lip movements, so you can tell that it’s AI-generated. In addition, the quality of the audio output depends on the quality of the video input.
DeepMind also points to safety concerns, especially the potential for misuse and deepfakes. For this reason, V2A is currently a research project undergoing rigorous testing before being released to the public.
Despite these limitations, V2A could reshape filmmaking, animation, and even storyboarding, opening up new ways of working on the one hand and eliminating some jobs on the other. With DeepMind in the lead and other companies like OpenAI working on similar tech, the future of AI-generated audiovisual content looks… interesting, to say the least.
[via PetaPixel]