Background

Google Unveils 'Gemma 3,' An Open-Source AI That Can Run Even On A Single GPU or TPU

Google Gemma 3

Effectiveness is one thing, but efficiency is another.

Since OpenAI introduced ChatGPT, followed by Google’s introduction of Gemini, the race to develop more powerful Large Language Models (LLMs) has intensified.

These AI models aren’t just hungry for more data—they also demand vast computing resources.

In fact, if this arms race continues to escalate, it’s not far-fetched to say that GPUs may become scarce, as tech giants are snapping them up before they even reach the general market.

Where brute power means throwing everything at one's disposal at a problem, efficiency is about getting the most out of the resources already there.

And this is where Google introduces 'Gemma 3'.

This LLM is powerful enough to outperform DeepSeek's DeepSeek-V3 and OpenAI's o3-mini in preliminary human-preference evaluations.

But it is also so efficient that it can run on a single graphics processing unit (GPU) or tensor processing unit (TPU).

In a blog post, Google said that Gemma 3 is essentially a collection of lightweight, state-of-the-art open models built from the same research and technology that powers its Gemini 2.0 models.

"These are our most advanced, portable and responsibly developed open models yet. They are designed to run fast, directly on devices — from phones and laptops to workstations — helping developers create AI applications, wherever people need them."

Gemma 3 comes in four different flavors, differentiated only by their parameter counts: 1B, 4B, 12B and 27B.

This allows users to choose the best model for their specific hardware and performance needs, according to the company.

But what makes Gemma 3 stand out is how incredibly efficient it is.

Google said that the largest model, the 27B, can run on just one NVIDIA H100 Tensor Core GPU.
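Why a 27B-parameter model fits on one accelerator comes down to simple arithmetic: a model's weight footprint is roughly its parameter count times the bytes used per parameter. The sketch below illustrates that math (it is a back-of-the-envelope estimate only, ignoring activations and the KV cache; the figures are not Google's published numbers).

```python
def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# Gemma 3 27B at common precisions; an H100 carries 80 GB of memory,
# so the half-precision weights fit on a single card.
for label, nbytes in [("fp32", 4), ("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{label}: ~{weight_footprint_gb(27, nbytes):.0f} GB")
```

At two bytes per parameter the weights come to roughly 54 GB, comfortably inside a single H100's 80 GB, which is why quantizing to one byte or less (covered below) helps even smaller GPUs.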

These models promise faster execution, even on modest computational setups, without compromising functionality or accuracy.

In comparison, other AI models have required at least ten times more compute power to deliver similar performance.

The blog post was penned by Clement Farabet, vice president of research at Google DeepMind, and Tris Warkentin, director at Google DeepMind.

Besides setting a new benchmark for single-accelerator models, Gemma 3 comes with pretrained capabilities for over 140 languages, allowing it to cater to diverse audiences.

This means developers can create applications that connect with users in their native tongues, expanding the global reach of their projects.

The model also offers sophisticated text and visual analysis, allowing it to reason over both.

Gemma 3 comes with a 128k-token context window, which means that it can analyze and synthesize large datasets. This should make it ideal for applications requiring extended content comprehension.
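To get a feel for what a 128K-token window holds, a common rule of thumb is roughly four characters per token for English text. The snippet below uses that heuristic to sanity-check whether a document fits; it is an approximation only, and real counts would come from the model's own tokenizer.

```python
CONTEXT_WINDOW = 128_000  # Gemma 3's advertised context length, in tokens

def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Crude estimate: ~4 characters per token for English prose."""
    return int(len(text) / chars_per_token)

def fits_in_context(text: str) -> bool:
    return rough_token_count(text) <= CONTEXT_WINDOW

book = "lorem ipsum " * 30_000  # ~360K characters, roughly a short novel
print(rough_token_count(book), fits_in_context(book))
```

By this estimate a 128K-token window holds on the order of half a million characters of English text, which is why the article's claim about analyzing large datasets in one pass is plausible.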

Besides that, Gemma 3 also comes with function calling support, which lets developers build applications that invoke external tools and APIs in a structured way.
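The general function-calling pattern looks like this: the application describes its tools to the model, the model replies with a structured call rather than prose, and the application parses and dispatches it. The sketch below mocks the model side with a canned JSON reply; the tool name `get_weather` and the reply format are hypothetical illustrations, not Gemma 3's actual wire format.

```python
import json

# Mocked model output; in practice the model would emit a structured
# call like this after being prompted with the tool's schema.
def fake_model_reply(_prompt: str) -> str:
    return json.dumps({"tool": "get_weather", "arguments": {"city": "Paris"}})

# Hypothetical local tools the application exposes to the model.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse the model's tool call and run the matching local function."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["arguments"])

print(dispatch(fake_model_reply("What's the weather in Paris?")))
# → Sunny in Paris
```

In a real application the tool's return value would be fed back to the model so it can compose a final answer for the user.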

And to reduce computing costs even further, Google has introduced official quantized versions of Gemma 3, which work by "reducing the precision of the numerical values in a model's weights" without sacrificing accuracy.

Google said Gemma 3 "delivers state-of-the-art performance for its size."

Besides announcing Gemma 3, Google also said that it has built safety protocols into Gemma 3, including a safety checker for images called 'ShieldGemma 2.'

Google described it as "a powerful 4B image safety checker built on the Gemma 3 foundation."

It provides a ready-made solution for image safety, outputting safety labels across three categories: dangerous content, sexually explicit material and violence.

Developers can further customize ShieldGemma 2 for their own safety needs and users. ShieldGemma 2 is open and built for flexibility and control, leveraging the performance and efficiency of the Gemma 3 architecture to promote responsible AI development.

"Gemma 3’s development included extensive data governance, alignment with our safety policies via fine-tuning and robust benchmark evaluations," Google said in the blog post.

"While thorough testing of more capable models often informs our assessment of less capable ones, Gemma 3’s enhanced STEM performance prompted specific evaluations focused on its potential for misuse in creating harmful substances; their results indicate a low-risk level."

Published: 
13/03/2025