AI Models On Mobile Phones Are RAM Hogs That Require Huge Memory, Says Google

The world of AI on smartphones is like a bustling metropolis within the confines of a tiny device.

And speaking of AI, the models are hungry beasts. They feed on data like there's no tomorrow, constantly analyzing, predicting, and learning from both that data and users' usage patterns. To fuel this insatiable appetite, they need plenty of RAM to hold all the information they're processing.

It's like trying to juggle multiple balls at once.

The more balls are up in the air, the more mental capacity is required to keep track of everything without dropping one.

Similarly, AI models on smartphones juggle a myriad of tasks simultaneously. And with all those features, they need ample RAM to keep all those balls in the air without crashing.

The Google Pixel 8 Pro
The Google Pixel 8 Pro, Google's flagship phone in 2023, was built to run AI models right on the device. As before, the phone's RAM is soldered directly to the phone's system-on-a-chip (SoC).

There are two main reasons why AI models on smartphones gobble up a lot of RAM:

  1. Speed: Running AI models locally means the smartphone has to run complex calculations on its own hardware. RAM acts like a super-fast workspace where the phone's CPU can access data quickly, so a phone with less RAM creates a bottleneck that prevents the processor from doing what it should.

    As a result, AI tasks that involve a lot of back-and-forth processing between the CPU and the model can be greatly limited.

  2. Complexity: AI models themselves can be quite large and intricate, with millions or even billions of parameters. To function correctly, the entire model may need to be loaded into the phone's memory all at once. This is much faster than constantly fetching pieces of the model from storage every time they're needed.

    If RAM is insufficient, the AI model may fail to load, or run far more slowly than it should.
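The numbers involved can be sketched with a back-of-the-envelope calculation. The model size and precisions below are illustrative assumptions, not figures for any real phone or model:

```python
def model_ram_gb(num_params, bytes_per_param):
    """Rough RAM needed to hold a model's weights fully in memory."""
    return num_params * bytes_per_param / (1024 ** 3)

# Hypothetical 3-billion-parameter on-device model, stored at
# different weight precisions (4 bytes = fp32, 2 = fp16, 0.5 = 4-bit).
params = 3_000_000_000
for label, nbytes in [("fp32", 4), ("fp16", 2), ("int4", 0.5)]:
    print(f"{label}: ~{model_ram_gb(params, nbytes):.1f} GB")
```

Even at 4-bit precision, a few-billion-parameter model claims over a gigabyte of RAM before counting any working buffers, and it has to share that memory with the operating system and every other running app.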

In simpler terms, RAM is like a handy desk for the CPU.

Large and complex AI models are like a messy workshop full of tools and parts. For an AI model to work efficiently on the tasks users give it, the CPU needs all the tools and parts readily available on its desk (RAM), instead of constantly running back and forth to the messy workshop (physical storage).

Think of RAM as the workspace where these algorithms do their magic. The more spacious the desk, the more efficiently the CPU can operate.

So, in a nutshell, smartphones equipped with ample RAM for AI models can offer faster processing, smoother multitasking, and more intelligent features.

Then, there is also another issue: cost.

Modern AI models have grown in size and complexity, meaning that smartphone manufacturers that need to run them face trade-offs between performance, power consumption, and cost.

While more RAM benefits AI performance, it also consumes power and increases costs.

In business, balancing these factors is crucial.

Google has what it calls 'Gemini,' a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2.

Read: Google Introduces 'Gemini,' An AI It Hopes Can Dethrone OpenAI's GPT-4

Google Gemini

Comprising Gemini Ultra, Gemini Pro, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4.

It powers the generative artificial intelligence chatbot of the same name.

In early March 2024, Google made an odd announcement that only one of its two latest smartphones, the Pixel 8 and Pixel 8 Pro, would be able to run Google Gemini.

The company said that the smaller Pixel 8 wouldn't get the new AI model, with the company citing mysterious "hardware limitations" as the reason.

The statement was odd, considering that Google designed and marketed the Pixel 8 to be AI-centric, and that the company even developed a smartphone-centric AI model called "Gemini Nano," which should be able to run on more limited resources.

And yet, the company still couldn't make the two work together.

Google, however, backtracked on this decision, and later said that the smaller Pixel 8 could get Gemini Nano in its next big quarterly Android release.

Google's Seang Chau, VP of devices and services software, explained the decision on the company's in-house Made by Google podcast.

"The Pixel 8 Pro, having 12GB of RAM, was a perfect place for us to put [Gemini Nano] on the device and see what we could do," Chau said.

"When we looked at the Pixel 8 as an example, the Pixel 8 has 4GB less memory, and it wasn't as easy of a call to just say, 'all right, we're going to enable it on Pixel 8 as well.'"

According to Chau, Google previously refrained from putting Gemini Nano on the smaller Pixel 8 because the company didn't want to "degrade the experience."

Chau went on to say that Google wants some of the AI model's features to be "RAM-resident," meaning they're always loaded in memory.

One such feature is "smart reply," which tries to auto-generate text replies.

This requires tons of RAM.

The Google Pixel 8 series
The Google Pixel 8 series. The smaller Pixel 8 originally couldn't run Gemini due to hardware limitations. Google had to revise its strategy to make Gemini run on it.

"Smart Reply is something that requires the models to be RAM-resident so that it's available all the time. You don't want to wait for the model to load on a Gboard reply, so we keep it resident."
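The trade-off Chau is describing can be sketched as a toy comparison between reloading the model from storage for every reply and keeping it resident in RAM. The timings are invented purely for illustration, not measurements of any real device:

```python
def total_latency_ms(n_replies, load_ms, infer_ms, resident):
    """Total time to serve n smart replies under each strategy."""
    if resident:
        # Weights are loaded once (e.g. at boot) and then stay in RAM.
        return load_ms + n_replies * infer_ms
    # Weights are evicted after each use and reloaded from flash storage.
    return n_replies * (load_ms + infer_ms)

# Invented numbers: 500 ms to load weights from flash, 50 ms per reply.
print(total_latency_ms(10, 500, 50, resident=False))  # 5500
print(total_latency_ms(10, 500, 50, resident=True))   # 1000
```

The resident strategy makes each reply feel instant, but only because the model permanently occupies a chunk of RAM, which is exactly the cost a phone with 8GB of memory struggles to absorb.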

Even after changing its mind, Google keeps the Gemini-powered smart reply behind a developer flag on both the Pixel 8 and Pixel 8 Pro, and the smart reply option in the normal keyboard settings isn't Gemini-powered.

This means that there are big trade-offs involved. But in the end, if AI is what users want, it's doable.