The Wayback Machine And Cloudflare Partner To Help Stop The Web From Going Offline

18/09/2020

The Internet Archive's Wayback Machine documents the web, and caches copies of long-forgotten web pages.

Cloudflare, on the other hand, provides the Always Online service, which keeps web pages available when they are offline or unreachable.

Now, the two have partnered to further increase the number of web pages that can be cached.

"Websites that enable Cloudflare's Always Online service will now have their content automatically archived, and if by chance the original host is not available to Cloudflare, then the Internet Archive will step in to make sure the pages get through to users," said Mark Graham, director of the Internet Archive's Wayback Machine.

"We worked with them to make sure they were OK with us using it in this way," added Graham-Cumming.

"It’s one of those things where it’s like, yeah, this works for everybody, so let's do it. If you come to a website that uses Cloudflare and it’s offline, we will show the latest version that’s in the Wayback Machine archive."

Cloudflare partners with the Wayback Machine

The non-profit Internet Archive is closing in on 500 billion cached web pages, amounting to more than 45 petabytes of data, and adds about 1 billion new URLs per day.

It's far from archiving everything, considering that the web is growing a lot faster than its crawlers can possibly crawl.

But with the partnership, the Wayback Machine should be able to archive more of the web.

As for Cloudflare, CEO Matthew Prince said that the company's Always Online feature saves "a limited copy of your cached website to keep it online for your visitors" when the origin server is unavailable. At this time, the company serves more than 25 million sites.

And partnering with the Wayback Machine "will improve the Always Online service."

"The Internet Archive's Wayback Machine has an impressive infrastructure that can archive the Web at scale," Prince said.

This is how it works:

If a website using Cloudflare's services goes offline or becomes unreachable, Cloudflare's edge will return a status code in the 520 to 527 range, indicating an issue connecting to the origin.

When this happens, Cloudflare will first look to the local edge datacenter to see if there is a stale or expired version of content it can serve to the website visitor. If there isn’t any, Cloudflare will then go to the Internet Archive to "fetch the most recently archived version of the site to serve to your visitors."
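The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the described logic, not Cloudflare's actual implementation: the function names, the dictionary standing in for the edge cache, and the `fetch_archived` callback standing in for a lookup at the Internet Archive are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Status codes the article says signal trouble reaching the origin (520 to 527).
ORIGIN_ERROR_CODES = range(520, 528)


def is_origin_error(status: int) -> bool:
    """Return True if the status code falls in the 520 to 527 range."""
    return status in ORIGIN_ERROR_CODES


def serve_with_fallback(
    url: str,
    origin_status: int,
    origin_body: Optional[str],
    edge_cache: Dict[str, str],                      # hypothetical stand-in for the edge datacenter cache
    fetch_archived: Callable[[str], Optional[str]],  # hypothetical stand-in for the Wayback Machine lookup
) -> Tuple[str, Optional[str]]:
    """Pick which copy of a page to serve, returning (source, body)."""
    # 1. The origin answered normally: serve its response as-is.
    if not is_origin_error(origin_status):
        return ("origin", origin_body)

    # 2. The origin is unreachable: look for a stale copy at the local edge.
    stale = edge_cache.get(url)
    if stale is not None:
        return ("edge-cache", stale)

    # 3. Nothing cached locally: ask the archive for its most recent snapshot.
    archived = fetch_archived(url)
    if archived is not None:
        return ("wayback", archived)

    # 4. No copy is available anywhere.
    return ("error", None)
```

For example, calling `serve_with_fallback("https://example.com/", 522, None, {}, lambda u: "<html>archived</html>")` would take the third branch, since the origin failed and the edge cache is empty.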


On the internet, there is no way to guarantee that every website stays online without interruption.

This fact makes the Wayback Machine, whose archive dates back to 1996, a valuable tool.

"We’d just like to make the web more reliable," said Brewster Kahle, founder of the Internet Archive.

"We want a robust infrastructure out there and we can be part of it, but we’re not all of it. We want multiple participants to be working together in all different ways. We would not be a very good content distribution network and maybe Cloudflare wouldn’t necessarily be the best archive of the web."

Kahle said that the partnership with Cloudflare has been very constructive in early testing, and he'd like to see more collaborations that cross what he calls "the .com, .org boundary."

Both Kahle and Graham said that the Wayback Machine's infrastructure is ready to handle the additional queries and data pulls from Cloudflare's Always Online.

Previously, the Wayback Machine partnered with the Brave browser to help serve cached copies of websites when its users run into a 404 error.