Background

Google's Gemma 4 Can Now Run Inside A Browser, Locally, Thanks To An Open-Source Chrome Extension


In a notable development for on-device AI, a new project demonstrates that Google's Gemma 4 model can run entirely inside a web browser.

The open-source project, created by developer Nico Martin, turns Chrome's sidebar into a local AI assistant capable of understanding natural-language requests and interacting with browser data such as tabs, browsing history, and page content.

The technical significance of the project, however, lies in where the model runs.

According to public demonstrations shared by the official Google Gemma account, the assistant can do all of this while operating entirely on the user's device.

Instead of sending prompts and browser context to a remote API, inference happens locally using WebGPU and the JavaScript library Transformers.js. WebGPU allows modern browsers such as Chrome to use the device's graphics processor for compute-intensive tasks, making it possible to run compact language models directly in the browser.
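To make that concrete, a page or extension can probe for WebGPU support before attempting to load a model. A minimal TypeScript sketch (assuming the `@webgpu/types` definitions; this is illustrative, not code from the project):

```typescript
// Minimal WebGPU capability probe (illustrative sketch, not the project's code).
// Assumes the @webgpu/types definitions so `navigator.gpu` type-checks.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false; // browser has no WebGPU at all
  // requestAdapter() resolves to null when no suitable GPU is available.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}

hasWebGPU().then((ok) =>
  console.log(ok ? "WebGPU available: local inference is possible" : "No WebGPU: warn or fall back"),
);
```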

In other words, the project allows Gemma 4 to run locally, without needing any connection to external servers.

Transformers.js, developed in the Hugging Face ecosystem, provides a browser-compatible runtime for executing machine learning models with no separate backend infrastructure required.
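The general pattern looks like the sketch below. The model identifier is a placeholder rather than the checkpoint the extension actually ships; `device` and `dtype` are standard Transformers.js options:

```typescript
import { pipeline } from "@huggingface/transformers";

// Standard Transformers.js flow (sketch): the model ID below is a placeholder,
// not necessarily the checkpoint the extension uses.
const generator = await pipeline("text-generation", "onnx-community/example-model", {
  device: "webgpu", // run inference on the GPU through WebGPU
  dtype: "q4",      // quantized weights shrink the download and memory footprint
});

const output = await generator("List my open tabs about travel planning.", {
  max_new_tokens: 128,
});
console.log(output);
```

Because everything above runs in the page's own JavaScript context, no server round-trip is involved once the weights are cached.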


The extension uses Google’s Gemma 4 E2B model, a smaller member of the Gemma 4 family designed for edge and resource-constrained environments.

Gemma is Google DeepMind’s family of open-weight models, released under permissive licensing terms that allow developers to inspect, modify, and deploy them in their own environments. In this case, the E2B variant is optimized to balance performance with memory efficiency, making it more suitable for local deployment than larger cloud-oriented models. That tradeoff is what enables a browser extension to perform agent-like tasks on everyday consumer hardware.

From an architectural standpoint, the project also highlights how browser extensions are evolving into application platforms for local AI. In the Hugging Face technical write-up, Martin describes a Manifest V3 extension structure that separates responsibilities across a background service worker, a side-panel user interface, and content scripts running on webpages.

The model runtime is hosted in the extension environment, while content scripts handle page extraction and actions such as highlighting text.

Messaging between these components allows the AI system to reason over a task, call tools, and then execute actions in the browser.
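With standard Chrome extension APIs, that flow can be sketched as follows; the `READ_PAGE` message shape here is hypothetical, chosen for illustration rather than taken from the project:

```typescript
// Assumes the @types/chrome definitions.

// Content script: answer requests for the page's text.
// The { type: "READ_PAGE" } message shape is hypothetical.
chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
  if (message.type === "READ_PAGE") {
    sendResponse({ text: document.body.innerText });
  }
});

// Side panel or service worker: fetch the active tab's text,
// then hand it to the local model as context.
async function readActiveTab(): Promise<string> {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  const reply = await chrome.tabs.sendMessage(tab.id!, { type: "READ_PAGE" });
  return reply.text;
}
```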

This "tool-calling" capability is particularly important.

Rather than functioning only as a chatbot, the assistant can choose from a set of predefined browser tools to complete tasks.

Examples shown publicly include opening or switching tabs, retrieving browsing history results, reading page text, and highlighting relevant content. In practical terms, this moves the model closer to an agent interface: instead of merely answering questions, it can perform browser actions in response to requests.
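Under the hood, tool calling of this kind typically reduces to a registry of named functions the model can select via structured output. A hedged sketch with hypothetical tool names (the project's actual tools and schemas may differ); note that `chrome.history` requires the `history` permission:

```typescript
// Hypothetical tool registry; the extension's real tool names and schemas may differ.
// Assumes the @types/chrome definitions and the "history" permission in the manifest.
type Tool = (args: Record<string, string>) => Promise<unknown>;

const tools: Record<string, Tool> = {
  // Open a URL in a new tab.
  open_tab: async ({ url }) => chrome.tabs.create({ url }),
  // Search browsing history for a phrase.
  search_history: async ({ query }) =>
    chrome.history.search({ text: query, maxResults: 10 }),
};

// Dispatch a call the model emitted as structured output,
// e.g. { name: "search_history", args: { query: "gemma" } }.
async function callTool(call: { name: string; args: Record<string, string> }) {
  const tool = tools[call.name];
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool(call.args);
}
```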

Privacy is one of the clearest advantages of this approach.

Many browser assistants and productivity tools depend on cloud APIs, which means page contents, browsing context, or user prompts may be transmitted to third-party servers. A local-first model reduces or removes that requirement because the data remains on the device, subject only to the browser permissions granted to the extension.

That does not eliminate all privacy concerns.

Users still need to trust the extension’s code and permission scope. But it changes the risk model substantially compared with cloud-only assistants.

There are also practical limitations.

Running models locally means performance depends heavily on the user’s hardware, available memory, browser version, and GPU support. Initial setup may require downloading model weights before first use. Smaller models can respond quickly for lightweight tasks, but more complex reasoning or longer contexts may still be slower than server-hosted systems running on dedicated GPUs.
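Transformers.js exposes a progress callback that a UI can use to surface that first-run download; a minimal sketch (the model ID is again a placeholder):

```typescript
import { pipeline } from "@huggingface/transformers";

// First-run weight download with progress reporting (sketch; placeholder model ID).
// Downloaded files are cached by the browser, so later loads skip the network.
const generator = await pipeline("text-generation", "onnx-community/example-model", {
  device: "webgpu",
  progress_callback: (p: any) => {
    if (p.status === "progress") {
      console.log(`${p.file}: ${(p.progress ?? 0).toFixed(1)}%`);
    }
  },
});
```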

Early public feedback has noted that speed and convenience vary depending on the machine used.

Even with those constraints, the release is significant because it shows that capable AI assistants no longer need to be tied exclusively to cloud infrastructure.

A browser, which is already the main workspace for many users, can host an assistant that reasons over local context and performs useful actions without external compute. For developers, it offers a reference implementation for building privacy-preserving AI tools. For users, it suggests a future where personal AI systems may increasingly run directly on laptops, phones, and browsers rather than through centralized services.

Published: 29/04/2026