Background

With 'Lyria 3,' Google Lets Users Turn Ideas, Photos, and Videos Into Their Own Soundtracks


The LLM war has escalated into a full-spectrum battle over creative dominance.

What began with the release of OpenAI's ChatGPT as a race to produce the most coherent text has exploded into a contest for multimodal supremacy, where players vie to control not just words but entire sensory experiences. Text generation arrived first; then image synthesis turned imagination into visuals overnight, followed by video models that animate stills into fluid motion.

Now, the frontier has shifted decisively to audio, specifically music, closing the loop on AI's ability to craft complete, emotionally resonant media from mere prompts.

Google has intensified this competition by releasing 'Lyria 3,' its most advanced music-generation model yet, integrated directly into the Gemini app.

This isn't a niche tool tucked away in a lab; it's accessible to millions globally in beta, starting on desktop and expanding to mobile over the coming days.

To use Lyria, users simply open Gemini as they normally would and select "Create music" from the tools menu.

From there, they can describe what they want, or upload a photo or video, and have Gemini transform the input into a high-fidelity, 30-second track complete with custom lyrics, vocals, and instrumentation.

The versatility stands out. Text-to-tracks lets anyone dictate genre, mood, tempo, or even quirky narratives: imagine prompting a high-energy 90s skate punk rock song about reminding your roommate to wash the dishes, complete with fast drums and shouted vocals. Multimodal inputs take it further: feed in an image of a rainy city street for instant lo-fi beats, or a video clip to inspire a matching soundtrack.

Templates provide quick inspiration, with dynamic suggestions helping refine prompts, while remixing options encourage iteration.

Outputs come with custom cover art, and every track embeds Google's SynthID watermark for transparent AI identification; Gemini can even verify uploads for the same marker.

This integration fits seamlessly into the broader evolution of generative AI.

What began with text models that laid the groundwork for precise idea articulation soon moved to images, which democratized visual art by letting non-artists conjure photorealistic scenes or stylized illustrations. Video then extended that to time-based storytelling.

Audio generation, however, works in a completely different palette.

Music can add an emotional layer that words, pictures, and even moving images often struggle to convey alone. A funny jingle, an uplifting anthem, or a chill background loop can now emerge in seconds, tailored to personal moments, social media posts, workouts, memes, or just everyday vibes.

Early reactions highlight the leap in quality.

Users report surprisingly realistic results, with coherent lyrics, structured elements like choruses, and far fewer artifacts than prior generations. It handles multiple languages and diverse styles effectively, broadening creative access beyond English-centric tools. While capped at 30 seconds, it's still ideal for snippets, Shorts, or quick soundbites.

Lyria 3 cannot generate full songs, but its focus on speed and fidelity positions it as a go-to for casual creation.

Analysts note minimal immediate disruption to platforms like Spotify, emphasizing distribution's enduring value as AI floods the market with supply.

In this accelerating arms race, Lyria 3 signals that AI is no longer just assisting creativity: it's becoming the composer, lyricist, and producer rolled into one. The barrier between "I have an idea" and "here's the finished track" has collapsed, inviting everyone to experiment and share. What emerges next could redefine how we soundtrack our lives, one prompt at a time.

Published: 
19/02/2026