Background

Google Enhances Nano Banana And Introduces Gemini 2.5 Computer Use: Refining How AI Thinks And Acts


The war of large language models (LLMs) is far from over.

When OpenAI's ChatGPT burst into public view, conversational AI that could write, reason, and assist suddenly felt less like science fiction and more like an everyday tool. Google, with its deep legacy in search, knowledge, and infrastructure, quickly realized that to stay relevant in the age of generative AI, it had to catch up.

More than that, it had to leap forward, and it did so with the original Gemini.

Over time, Google updated and upgraded Gemini, infusing the model with multimodal capabilities, long-context understanding, and more advanced reasoning.

Then came Nano Banana, which is now a shining star in Google’s AI constellation.

Nano Banana is the popular nickname for Gemini 2.5 Flash Image, a model that can blend multiple images, preserve character consistency, make localized edits (e.g., changing a person’s outfit or adjusting background elements), and draw on world knowledge for contextual transformations.

The model captured imaginations so well that within just weeks, Nano Banana became more than a technical demo; it became a trend. Users began experimenting with creative prompts like turning themselves (or their pets) into hyperrealistic 3D figurines, combining multiple photographs into imaginative scenes, or iteratively editing photos while preserving fidelity.

After introducing more aspect ratios (landscape: 21:9, 16:9, 4:3, 3:2; square: 1:1; portrait: 9:16, 3:4, 2:3; flexible: 5:4, 4:5), Google appears to be preparing to weave Nano Banana into more of its ecosystem, according to whispers and app teardowns.

Reportedly, Google wants to bring Nano Banana to Lens, Circle to Search, and beyond.

In the latest version of the Android Google app (v16.40.18.sa.arm64), researchers have found internal traces of a “Nano Banana Create” option emerging inside Lens, alongside Search and Translate, signaling that users may soon be able to jump from visual search to creative edit with a tap.

Under the surface, Google is also pushing the frontier of agents that "use a computer," not just answer prompts.

The new Gemini 2.5 Computer Use model empowers agents to interact with graphical user interfaces.

It is a specialized model built on Gemini 2.5 Pro, and the idea is to help AI agents use websites the way humans would.

Available through the Gemini API in Google AI Studio and Vertex AI, the model can complete everyday digital tasks by clicking, typing, scrolling, and filling out forms.

This goes beyond standard APIs and allows AI to work through visual interfaces. Although the model is currently optimized mainly for web browsers, it also shows strong potential for mobile apps.
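The interaction pattern described here can be pictured as a simple loop: observe the page, let the model propose an action, carry it out, repeat. The Python sketch below is purely illustrative and rests on stated assumptions: it does not call the Gemini API, and every name in it (Action, FakePage, scripted_policy, run_agent) is hypothetical. A scripted list of actions stands in for the model, and an in-memory object stands in for the browser.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    """One UI step: "click", "type", "scroll", or "done" (hypothetical schema)."""
    kind: str
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakePage:
    """In-memory stand-in for a browser page; it only records what happened."""
    fields: Dict[str, str] = field(default_factory=dict)
    focused: str = ""
    scroll_y: int = 0
    log: List[str] = field(default_factory=list)

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            # Clicking a form field gives it focus.
            self.focused = str(action.args["target"])
        elif action.kind == "type":
            # Typing fills whichever field currently has focus.
            self.fields[self.focused] = str(action.args["text"])
        elif action.kind == "scroll":
            self.scroll_y += int(action.args["dy"])
        else:
            raise ValueError(f"unsupported action: {action.kind}")
        self.log.append(action.kind)


def scripted_policy(script: List[Action]) -> Callable[[FakePage], Action]:
    """Assumed stand-in for the model: replays a fixed script, then says "done"."""
    remaining = iter(script)

    def propose(page: FakePage) -> Action:
        return next(remaining, Action("done"))

    return propose


def run_agent(page: FakePage,
              propose: Callable[[FakePage], Action],
              max_steps: int = 10) -> int:
    """Observe, propose, act; returns the number of actions executed."""
    for step in range(max_steps):
        action = propose(page)
        if action.kind == "done":
            return step
        page.apply(action)
    return max_steps


if __name__ == "__main__":
    demo_page = FakePage()
    demo_script = [
        Action("click", {"target": "email"}),
        Action("type", {"text": "user@example.com"}),
        Action("scroll", {"dy": 300}),
    ]
    run_agent(demo_page, scripted_policy(demo_script))
    print(demo_page.fields, demo_page.scroll_y)
```

In a real agent, the scripted policy would be replaced by a model call that receives a screenshot and returns the next action, and the fake page by an actual browser driver. The step cap is a design choice that keeps a confused agent from looping forever.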

Tools of this kind could blur the line between AI responses and real software manipulation.

This capability, combined with Nano Banana’s image powers, suggests a future where Google’s assistants can not only consult and generate, but also execute visually grounded tasks across devices.

Having realized that Nano Banana is not just a gimmick, Google now wants to make it part of users' everyday visual workflows.

While the model still has limitations (it struggles with extreme aspect ratios and is occasionally inconsistent when editing across varied images), the trajectory is unmistakable: image editing and creation are becoming conversational, seamless, and ambient rather than transactional.

As Nano Banana finds its way into Lens, Circle to Search, Search AI Mode, and more, it may well become the defining visual extension of Gemini’s promise: AI that sees, thinks, edits, and acts, in a world made of pixels and language both.

Published: 08/10/2025