The AI industry was once relatively quiet, making only occasional waves within its niche and causing minimal disruption to the global tech landscape.
That changed when OpenAI unveiled ChatGPT. Its introduction sparked fierce competition among tech giants, launching an AI arms race that quickly reshaped the industry. Companies unwilling to adopt or adapt risked being left behind.
Among the biggest players, Google felt the tremors the most.
The company declared a "code red," a crisis serious enough to bring its two founders out of retirement.
In response, Google launched its AI chatbot Bard, though its early performance was underwhelming. Determined to stay competitive, Google revamped its approach, giving rise to Gemini, its latest AI contender.
After a series of minor updates and improvements, Google has now unleashed 'Gemini 2.0.'
Introducing Gemini 2.0, our most capable AI model yet designed for the agentic era. Gemini 2.0 brings enhanced performance, more multimodality, and new native tool use. pic.twitter.com/C90FXCEDBV
— Google (@Google) December 11, 2024
In a blog post, Google said that the rollout of Gemini 2.0 begins with an experimental version of the model called 'Gemini 2.0 Flash.'
This advanced multimodal AI can generate text, images, and speech while processing diverse inputs, including text, images, audio, and video—similar to OpenAI’s GPT-4, which powers ChatGPT.
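For developers, that multimodal input is exposed through the Gemini API. The sketch below is an illustration under assumptions, not code from Google's announcement: it uses the google-generativeai Python SDK, and the model name "gemini-2.0-flash-exp" and the local image file are placeholders.

```python
# Minimal sketch: sending text plus an image to Gemini 2.0 Flash via the
# google-generativeai Python SDK. Model name and inputs are assumptions.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder API key

# Assumed identifier for the experimental Gemini 2.0 Flash release
model = genai.GenerativeModel("gemini-2.0-flash-exp")

image = PIL.Image.open("whiteboard_photo.jpg")  # hypothetical local file
response = model.generate_content(
    [image, "Summarize the diagram in this photo in two sentences."]
)
print(response.text)
```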
Today we announced Gemini 2.0, our most capable AI model yet. With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant. pic.twitter.com/SuS2OZOSzT
— Google (@Google) December 11, 2024
And what makes Gemini 2.0 unique in its own right is that it unlocks what Google calls "agentic experiences."
"Gemini 2.0 Flash’s native user interface action-capabilities, along with other improvements like multimodal reasoning, long context understanding, complex instruction following and planning, compositional function-calling, native tool use and improved latency, all work in concert to enable a new class of agentic experiences," Google said.
In other words, these improvements in multimodal reasoning, long-context understanding, instruction following, and native tool use are what enable agentic experiences: AI systems that can take action and assist users more intelligently.
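To make "native tool use" and "compositional function-calling" concrete, here is a minimal sketch using the function-calling support in the google-generativeai Python SDK. The get_flight_status tool, its sample data, and the "gemini-2.0-flash-exp" model name are hypothetical placeholders, not part of Google's announcement.

```python
# Minimal sketch of function calling (tool use) with the Gemini API.
# The tool function, its data, and the model name are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder API key

def get_flight_status(flight_number: str) -> dict:
    """Hypothetical tool: look up a flight's status by flight number."""
    # A real agent would call an airline or flight-tracking API here.
    return {"flight_number": flight_number, "status": "on time", "gate": "B12"}

# Passing plain Python functions as tools lets the model decide when to call them.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_flight_status])

# Automatic function calling runs the tool and feeds the result back to the model.
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message("Is flight UA123 on time, and which gate is it at?")
print(reply.text)
```

With automatic function calling enabled, the SDK executes the tool on the model's behalf and returns the result to it, so the model can compose a final natural-language answer.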
Google is exploring practical applications through research prototypes:
- Project Astra: Expanding AI assistant capabilities.
- Project Mariner: Redefining human-AI interaction, starting with browsers.
- Jules: An AI-powered coding assistant for developers.
Though still in development, these projects are being tested with trusted users, laying the foundation for broader product integration in the future.
We’ve also made updates to Project Astra’s capabilities, including:
- More natural conversations
- Use of Search, Lens and Maps
- 10 minutes of in-session memory
Learn more → https://t.co/h5ziW7nIi2 pic.twitter.com/U07H9o1CSY
— Google (@Google) December 11, 2024
Project Astra, now powered by Gemini 2.0, brings several improvements:
- Better dialogue: Project Astra now has the ability to converse in multiple languages and in mixed languages, with a better understanding of accents and uncommon words.
- New tool use: With Gemini 2.0, Project Astra can use Google Search, Lens and Maps.
- Better memory: Google improved Project Astra’s ability to remember things while keeping users in control. It now has up to 10 minutes of in-session memory and can remember more conversations users had with it in the past.
- Improved latency: With new streaming capabilities and native audio understanding, the agent can understand language at about the latency of human conversation.
Learn more about Project Mariner ↓ https://t.co/5OqYWrlNuR
— Google (@Google) December 11, 2024
As for Project Mariner, the early research prototype built with Gemini 2.0 explores the future of human-agent interaction, starting with the web browser.
"As a research prototype, it’s able to understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms, and then uses that information via an experimental Chrome extension to complete tasks for you," said Google.
Jules is our experimental AI-powered code agent that can help devs fix bugs or other coding tasks — all with supervision. It’s now available to a group of trusted testers.
Learn more → https://t.co/53jC0ItgfT pic.twitter.com/MLArYrkUai
— Google (@Google) December 11, 2024
Jules, on the other hand, explores how AI agents can assist developers.
In a separate blog post, Google said that the experimental AI-powered code agent is integrated directly into a GitHub workflow and can tackle issues, develop a plan, and execute it, all under a developer’s direction and supervision.
"This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding," said Google.
Looking ahead, Google is emphasizing agentic AI systems—models capable of understanding context, planning ahead, and taking action with user supervision.
"Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision," said Google CEO Sundar Pichai in a statement.
"Today we’re excited to launch our next era of models built for this new agentic era."