Facebook Extends DHCPLB: When DHCP Is Too Small For Its Infrastructure

When we were little, even the smallest crib seemed huge. But once we outgrew it, we needed something bigger. The same goes for Facebook.

You can tell a lot about a tech company by looking at its tech stack. Facebook, the social giant of the web, has announced a new open-source DHCP server, simply because existing DHCP server software could not keep up with its scale.

DHCP, or Dynamic Host Configuration Protocol, is a network management protocol used on UDP/IP networks in which a DHCP server dynamically assigns an IP address and other network configuration parameters to each device on a network so they can communicate with other IP networks.

At Facebook, however, the existing DHCP setup had become a constraint.

Its replacement is an upgraded DHCPLB, which aims to help the company scale its data center efforts efficiently.

"With this version, we’ve seen better throughput and are able to iterate faster than we could with our previous solution. In fact, we are now handling the same volume of traffic with 10 times fewer servers," wrote Pablo Mazzini, Production Engineer at Facebook, in a blog post.

Facebook DHCP - DHCPLB

DHCP automatically assigns networking information (such as IP addresses) to hosts on a network. This makes it easy for a device to connect to a network.

In Facebook's case, DHCP allows the company to add new server infrastructure without much manual work. But the DHCP servers Facebook used to assign IP addresses simply could not keep up with the company's efforts to handle the traffic bombarding its servers.

As Facebook's traffic and hardware needs grow, so too does its dependence on this one crucial protocol.

The problem is that existing DHCP servers aren't really designed to operate at Facebook's scale, which is that of one of the world's busiest and most highly trafficked websites.

Starting in 2014, Facebook used the open-source Kea DHCP server, which is a single-threaded application. Facebook liked it at first, but as the company continued to grow, it became increasingly clear that it was outgrowing Kea.

The company was forced to look for alternatives as a way out.

In 2016, Facebook paired Kea with its own DHCPLB load balancer, which automatically distributes incoming requests across a list of DHCP servers. Built using Google's Go programming language, it was designed to make it easier for Facebook to take advantage of multi-core processors.

It also allows Facebook to define two pools of DHCP servers, stable and release candidate (RC), and to set the proportion of traffic each one receives, which lets the company experiment with various DHCP setups using A/B testing.
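To make the two-pool idea concrete, here is a minimal sketch in Python of how a load balancer might split traffic between a stable pool and an RC pool. It is not DHCPLB's actual algorithm or code: the server addresses, the `pick_server` and `simulate` helpers, and the use of a random draw to choose the pool are all assumptions made for illustration.

```python
import random

# Hypothetical pools of DHCP server addresses, for illustration only.
POOLS = {
    "stable": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "rc": ["10.0.1.1"],
}


def pick_server(pools, rc_ratio, rng):
    """Send a request to the RC pool with probability rc_ratio, else to stable."""
    use_rc = bool(pools["rc"]) and rng.random() < rc_ratio
    pool = "rc" if use_rc else "stable"
    return pool, rng.choice(pools[pool])


def simulate(n, rc_ratio, seed=0):
    """Route n simulated requests and count how many land in each pool."""
    rng = random.Random(seed)
    counts = {"stable": 0, "rc": 0}
    for _ in range(n):
        pool, _server = pick_server(POOLS, rc_ratio, rng)
        counts[pool] += 1
    return counts
```

Calling `simulate(10000, 0.1)` routes roughly one request in ten to the RC pool, which is the kind of split that would let a new server build be tried on a small share of traffic before it replaces the stable one.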

This improved performance, but it wasn't a real solution.

The reason was that the Kea servers ran a single-threaded application, and this limitation had never been addressed. As a result, Facebook was still facing massive bottlenecks.

"The single-threaded nature of the software means that only a single transaction may be processed at a time; thus, if each backend call takes 100ms, then a Kea instance will be capable of doing, at maximum, 10 queries per second (QPS)," explained Mazzini.

The company then decided to reinvent DHCPLB as a fully-featured DHCP server.
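The 10 QPS ceiling Mazzini describes follows from simple arithmetic, and the same arithmetic hints at why a multithreaded server helps. The sketch below is an idealized upper bound only: the `max_qps` function and its `workers` parameter are assumptions for illustration, and it ignores CPU time, contention, and any limits of the backend itself.

```python
def max_qps(backend_latency_ms, workers=1):
    """Upper bound on queries per second when every query blocks one
    worker for backend_latency_ms milliseconds."""
    return 1000 * workers / backend_latency_ms
```

With a 100 ms backend call, `max_qps(100)` gives 10.0, matching the figure quoted for a single-threaded Kea instance. Under the same assumptions, eight concurrent workers would raise the bound to 80.0.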

Facebook DHCPLB latency

"The new setup allowed us to take advantage of the multithread design to prevent the new server from blocking and queuing up packets when doing back-end calls. We first had to make changes to DHCPLB to add a new mode that doesn’t forward packets but instead generates responses,” added Mazzini.

With this reinvented DHCPLB, Facebook saw reduced latency for DHCP requests.
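The effect of not blocking on back-end calls can be illustrated with a small simulation. This is a sketch under stated assumptions, not Facebook's implementation: DHCPLB is written in Go, while this example uses Python threads, and the `backend_lookup`, `handle_request`, and `serve` helpers, the 50 ms delay, and the address scheme are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

BACKEND_LATENCY_S = 0.05  # assumed delay of one simulated back-end call


def backend_lookup(mac):
    """Stand-in for a slow back-end call (hypothetical)."""
    time.sleep(BACKEND_LATENCY_S)
    # Derive a fake address from the last byte of the MAC, for illustration.
    return "10.0.0.%d" % int(mac.split(":")[-1], 16)


def handle_request(mac):
    """Generate a response for one request instead of forwarding it."""
    return {"mac": mac, "ip": backend_lookup(mac)}


def serve(macs, workers):
    """Handle all requests on a pool of worker threads.

    Returns the responses in request order and the elapsed seconds.
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        responses = list(pool.map(handle_request, macs))
    return responses, time.monotonic() - start
```

With `workers=1` the requests queue behind each other and total time grows with the number of requests. With one worker per request, the back-end calls overlap and total time stays close to that of a single call, which is consistent with the flat latency Facebook reports.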

A chart published by the company shows that latency remains roughly constant as DHCP requests rise. This has allowed Facebook to remove Kea from its infrastructure.

"DHCPLB gives us the ability to A/B test changes on the server implementation. Even after we rolled out DHCPLB, we continued to run the Kea servers in parallel so we could monitor error logs until we were confident that the replacement would be at least as reliable as Kea had been. We have since deprecated the Kea DHCPv6 server," Mazzini added.

Most companies on the web would never need to reinvent this technology.

Facebook's move, however, shows the immense scale at which it operates and, in particular, its need to constantly expand its infrastructure to cope with users' demands.

Published: 29/05/2019