How Google Reduces AI Training Costs By Up To 80% Using 'SEED RL'


Reinforcement learning (RL) is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

As a way for machine learning agents to train, RL has them take actions in an environment in order to maximize a notion of cumulative reward. The method optimizes agents for specific goals by balancing exploration (of uncharted territory) with exploitation (of current knowledge).
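
To make the exploration/exploitation idea concrete, here is a minimal, self-contained sketch of an epsilon-greedy agent accumulating reward in a toy grid world. The GridWorld environment, the reward scheme, and the hyperparameters are illustrative assumptions for this example only and are not part of SEED RL.

```python
import random

class GridWorld:
    """Toy 1-D environment: move left/right, reward 1.0 for reaching the goal."""
    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, min(self.size - 1, self.pos + (1 if action else -1)))
        done = self.pos == self.size - 1
        return self.pos, (1.0 if done else 0.0), done

env = GridWorld()
q_values = {}                      # state -> [estimated value of left, of right]
epsilon, alpha, episodes = 0.2, 0.5, 200

for _ in range(episodes):
    state, done = env.reset(), False
    while not done:
        q = q_values.setdefault(state, [0.0, 0.0])
        # Exploration: random action (always when estimates are tied).
        # Exploitation: otherwise pick the action with the highest estimate.
        explore = random.random() < epsilon or q[0] == q[1]
        action = random.randrange(2) if explore else q.index(max(q))
        next_state, reward, done = env.step(action)
        next_q = q_values.setdefault(next_state, [0.0, 0.0])
        # One-step update toward immediate reward plus best next-state estimate.
        q[action] += alpha * (reward + max(next_q) - q[action])
        state = next_state
```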

RL has shown success in solving the strategy board game Go and the multiplayer online battle arena video game Dota 2, for example.

However, RL techniques require increasingly large amounts of training to successfully learn even simple games.

This makes training computationally expensive and time-consuming.

In a published paper, Google researchers describe a framework called 'SEED RL' that scales AI model training to thousands of machines.

Figure: Frames per second comparing IMPALA and various configurations of SEED RL on DeepMind Lab. SEED RL achieves 2.4M frames per second using 4,160 CPUs; at the same speed, IMPALA would need roughly 14,000 CPUs. (Credit: Google)

As explained by the researchers in a Google blog post:

"This is achieved with a novel architecture that takes advantage of accelerators (GPUs or TPUs) at scale by centralizing model inference and introducing a fast communication layer. We demonstrate the performance of SEED RL on popular RL benchmarks, such as Google Research Football, Arcade Learning Environment and DeepMind Lab, and show that by using larger models, data efficiency can be increased. The code has been open sourced on GitHub together with examples for running on Google Cloud with GPUs."

Here, the researchers say that SEED RL can facilitate training at millions of frames per second on a single machine while reducing costs by up to 80%, potentially opening up the field to startups that previously could not compete with large AI labs.

SEED RL is built on Google’s TensorFlow 2.0 framework and features an architecture that takes advantage of graphics cards and tensor processing units (TPUs) by centralizing model inference.

To avoid data transfer bottlenecks, SEED RL performs AI inference centrally with a learner component that also trains the model, using observations streamed in from distributed actors.

The target model’s variables and state information are kept local on the learner, while observations are sent to it at every environment step. Latency is kept to a minimum thanks to a network library based on gRPC, the open-source universal RPC framework.
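
For intuition, the following is a simplified, in-process sketch of that centralized-inference pattern: actors hold only the environment, send each observation to a learner, and receive an action back. The Python queue here stands in for the gRPC-based communication layer, and the names (learner, actor, request_q) and the toy policy are hypothetical, not SEED RL's actual API.

```python
import queue
import threading
import numpy as np

request_q = queue.Queue()            # (actor_id, observation, reply_queue)

def learner(policy, batch_size=4):
    """Runs inference centrally; in SEED RL this happens on GPUs/TPUs."""
    while True:
        batch = [request_q.get() for _ in range(batch_size)]
        obs = np.stack([o for _, o, _ in batch])
        actions = policy(obs)                      # one batched forward pass
        for (actor_id, _, reply_q), a in zip(batch, actions):
            reply_q.put(a)

def actor(actor_id, env_step, num_steps=100):
    """Holds only the environment; no model variables live on the actor."""
    reply_q = queue.Queue()
    obs = np.zeros(4, dtype=np.float32)            # initial observation
    for _ in range(num_steps):
        request_q.put((actor_id, obs, reply_q))    # send observation every step
        action = reply_q.get()                     # wait for the learner's action
        obs = env_step(action)                     # advance the (stub) environment

policy = lambda obs: np.argmax(obs, axis=-1)       # stand-in for the neural network
env_step = lambda a: np.random.rand(4).astype(np.float32)  # stub environment

threading.Thread(target=learner, args=(policy,), daemon=True).start()
for i in range(4):
    threading.Thread(target=actor, args=(i, env_step)).start()
```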

According to Google, SEED RL’s learner component can be scaled across thousands of cores, and the number of actors can be scaled up to thousands of machines.
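
As a rough illustration of spreading a learner's training step across accelerator cores, the sketch below uses TensorFlow 2 distribution strategies. It is an assumption-laden stand-in, not SEED RL's training code: the model, loss, and batch shapes are invented for the example, and a real Cloud TPU deployment would use TPUStrategy rather than MirroredStrategy.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()    # TPUStrategy would be used on Cloud TPU
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(4),              # e.g. logits for 4 discrete actions
    ])
    optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(observations, targets):
    def step_fn(obs, tgt):
        with tf.GradientTape() as tape:
            logits = model(obs)
            # Toy regression loss for illustration; SEED RL's agents use losses
            # such as V-trace instead.
            loss = tf.reduce_mean(tf.square(tf.reduce_max(logits, axis=-1) - tgt))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss
    # strategy.run executes step_fn on every replica (GPU/TPU core); with a
    # distributed dataset each replica would receive its own shard of the batch.
    per_replica_loss = strategy.run(step_fn, args=(observations, targets))
    return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_loss, axis=None)

# Dummy batch standing in for trajectories streamed from distributed actors.
obs = tf.random.normal([64, 8])
targets = tf.random.normal([64])
print(train_step(obs, targets).numpy())
```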

To evaluate SEED RL, the researchers benchmarked it on the commonly used Arcade Learning Environment, several DeepMind Lab environments, and the Google Research Football environment.

These benchmarks are where the headline numbers come from.

The researchers said that they managed to solve a previously unsolved Google Research Football task while running at 2.4 million frames per second with 64 Cloud TPU cores, an 80-fold improvement over the previous state-of-the-art distributed agent.

“This results in a significant speed-up in wall-clock time and, because accelerators are orders of magnitude cheaper per operation than CPUs, the cost of experiments is reduced drastically,” wrote the coauthors of the paper.

“We believe SEED RL, and the results presented, demonstrate that reinforcement learning has once again caught up with the rest of the deep learning field in terms of taking advantage of accelerators.”

Published: 02/04/2020