Google to Start Crawling Websites over HTTP/2 in November 2020

Googlebot HTTP/2

In an era where hackers are getting smarter and websites are getting more complex, upgrades are a must to ensure better security and speed.

The World Wide Web, or the web as we know it, uses the Hypertext Transfer Protocol (HTTP) as its application layer protocol for distributed, collaborative, hypermedia information systems.

HTTP is the foundation of data communication for the web, where pages have links to other resources, and those resources to the next, and so on.

Google is the search engine giant of the web, having captured most of the search market. Its crawler, Googlebot, has so far crawled over HTTP/1.1 by default.

To keep up with a growing web that demands better security and speed, the company wants to start supporting crawling over HTTP/2.

HTTP/2, first published in 2015, is the successor to HTTP/1.1 and a more efficient expression of HTTP.

According to Google in its announcement:

"Ever since mainstream browsers started supporting the next major revision of HTTP, HTTP/2 or h2 for short, web professionals asked us whether Googlebot can crawl over the upgraded, more modern version of the protocol."

"Today we're announcing that starting mid November 2020, Googlebot will support crawling over HTTP/2 for select sites."

HTTP/2 was derived from the earlier experimental SPDY protocol, originally developed by Google in 2009.

HTTP/2 was developed by the HTTP Working Group (also called httpbis) of the Internet Engineering Task Force (IETF).

As the successor to the original HTTP, HTTP/2 is the first new version of HTTP since HTTP/1.1, which was first standardized in RFC 2068.

With most major browsers supporting HTTP/2, and about 98% of web browsers in use having upgraded, it's time for Google to embrace the upgraded protocol in its crawler as well.

According to Google, "HTTP/2 is much more robust, efficient, and faster than its predecessor, due to its architecture and the features it implements for clients (for example, your browser) and servers."

This echoes its developers page, which says that the protocol will make applications "faster, simpler, and more robust."

In other words, HTTP/2 is built for better speed.

For Googlebot, HTTP/2 makes it possible to open a single TCP connection to a target server and transfer multiple files over it in parallel. With the previous version of HTTP, Googlebot had to open multiple connections to transfer multiple files in parallel.
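
To make the single-connection idea concrete, here is a minimal sketch of how HTTP/2 tags each frame with a stream ID so that several responses can share one connection. It relies only on the 9-byte frame header layout from the HTTP/2 specification (24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID). The helper names pack_frame, iter_frames and demultiplex are hypothetical, and the sketch leaves out header compression, flow control and everything else a real client or Googlebot would need.

```python
import struct
from collections import defaultdict

# Illustrative constants taken from the HTTP/2 framing layer.
DATA = 0x0          # frame type for DATA frames
END_STREAM = 0x1    # flag marking the last frame of a stream


def pack_frame(stream_id, payload, frame_type=DATA, flags=0):
    """Build one frame: 9-byte header followed by the payload (hypothetical helper)."""
    length = struct.pack(">I", len(payload))[1:]  # keep the low 24 bits
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload


def iter_frames(buf):
    """Walk a byte buffer and yield (stream_id, type, flags, payload) per frame."""
    offset = 0
    while offset < len(buf):
        length = int.from_bytes(buf[offset:offset + 3], "big")
        frame_type, flags, stream_id = struct.unpack(
            ">BBI", buf[offset + 3:offset + 9]
        )
        stream_id &= 0x7FFFFFFF  # the top bit is reserved
        payload = buf[offset + 9:offset + 9 + length]
        yield stream_id, frame_type, flags, payload
        offset += 9 + length


def demultiplex(buf):
    """Reassemble the DATA payloads of each stream from interleaved frames."""
    streams = defaultdict(bytes)
    for stream_id, frame_type, _flags, payload in iter_frames(buf):
        if frame_type == DATA:
            streams[stream_id] += payload
    return dict(streams)


# Two responses interleaved on one connection.
# Client-initiated streams use odd IDs, hence 1 and 3.
wire = b"".join([
    pack_frame(1, b"<html>"),
    pack_frame(3, b"body{"),
    pack_frame(1, b"</html>", flags=END_STREAM),
    pack_frame(3, b"}", flags=END_STREAM),
])
result = demultiplex(wire)
print(result)
```

Because every frame carries its own stream ID, the receiver can sort interleaved frames back into separate responses, which is what removes the need for one connection per file.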

"In general, we expect this change to make crawling more efficient in terms of server resource usage."

HTTP/2 parallel
One of the advantages of HTTP/2 over its predecessors is the ability of the server to send multiple responses for a single client request. (Credit: Google)

Starting November 2020, Google plans to launch the first phase of the transition by crawling a small number of websites over the HTTP/2 protocol.

After that, Google will "ramp up gradually to more sites that may benefit from the initially supported features, like request multiplexing."

Since not all websites support HTTP/2, Google will only crawl a website over the protocol if the site supports it and if it sees that both the website and Googlebot would benefit from crawling over HTTP/2.

"If your server supports h2 and Googlebot already crawls a lot from your site, you may be already eligible for the connection upgrade, and you don't have to do anything," said Google.

It added that "if your server still only talks HTTP/1.1, that's also fine. There's no explicit drawback for crawling over this protocol; crawling will remain the same, quality and quantity wise."
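
For site owners who want to see which protocol their own server offers, the following is a rough sketch using Python's standard ssl module. Over TLS, a client and server agree on HTTP/2 through ALPN, where HTTP/2 is identified as "h2". The function names negotiated_protocol and supports_h2 are hypothetical, and this only shows what the server negotiates with this script; it does not reproduce how Googlebot decides whether a site is eligible.

```python
import socket
import ssl
from typing import Optional


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Open a TLS connection and return the ALPN protocol the server selected.

    Returns "h2" for HTTP/2, "http/1.1" for HTTP/1.1, or None if the server
    did not select any ALPN protocol.
    """
    context = ssl.create_default_context()
    # Offer HTTP/2 first, then fall back to HTTP/1.1.
    context.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


def supports_h2(protocol: Optional[str]) -> bool:
    """Interpret the negotiated ALPN value: True only when it is exactly "h2"."""
    return protocol == "h2"


# Example usage (needs network access; replace the placeholder host with your own):
# print(supports_h2(negotiated_protocol("example.com")))
```

A result of False is not an error: as the announcement says, crawling over HTTP/1.1 remains fine.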

Ending its announcement, Google said that websites with HTTP/2 support won't gain any ranking advantage.

"The primary benefit of h2 is resource savings, both on the server side, and on Googlebot side. Whether we crawl using h1 or h2 does not affect how your site is indexed, and hence it does not affect how much we plan to crawl from your site."

Published: 17/09/2020