Bing Updates Its Bingbot Crawler For 'Maximizing Crawl Efficiency'

It started back at the SMX Advanced conference in June 2018, where it was announced that the team at Bing would focus on improving its crawler, Bingbot.

Bingbot is Bing's crawler, sometimes also referred to as a 'spider'. The search engine from Microsoft uses the bot to crawl web pages, discovering new and updated documents and content to be added to Bing's searchable index. The primary goal for the bot is to "maintain a comprehensive index updated with fresh content," said Fabrice Canel, Bing's Principal Program Manager for Webmaster Tools.

Canel said that his team has made various improvements based on the feedback gathered from the SMX Advanced event.

He said the team is "continuing to improve" the crawler, and will share what it has done in a new "BingBot series" on the Bing Webmaster Blog.

Bingbot uses an algorithm to determine which websites to crawl, how often, and how many pages to fetch from each of them. The goal is to minimize the crawl footprint on websites while at the same time ensuring that the content in Bing's index stays up to date.

"How do we do that? The algorithmic process selects URLs to be crawled by prioritizing relevant known URLs that may not be indexed yet, and URLs that have already been indexed that we are checking for updates to ensure that the content is still valid (example not a dead link) and that it has not changed. We also crawl content specifically to discovery links to new URLs that have yet to be discovered. Sitemaps and RSS/Atom feeds are examples of URLs fetched primarily to discovery new links," explained Canel.

The issue that Bing often encounters is managing how frequently Bingbot needs to crawl.


Some webmasters want their websites crawled daily, while others prefer them to be crawled only when new content has been added or existing content has changed.

"The challenge we face, is how to model the Bingbot algorithms based on both what a webmaster wants for their specific site, the frequency in which content is added or updated, and how to do this at scale," Canel said.

"To measure how smart our crawler is, we measure Bingbot crawl efficiency. The crawl efficiency is how often we crawl and discover new and fresh content per page crawled. Our crawl efficiency north star is to crawl an URL only when the content has been added (URL not crawled before), updated (fresh on-page context or useful outbound links). The more we crawl duplicated, unchanged content, the lower our Crawl Efficiency metric is."

Bingbot crawls billions of URLs every day. At that scale, it's difficult for Bing to satisfy every webmaster, website, and content management system while also handling website downtime and making sure the crawler isn't hitting sites so frequently that their servers get overloaded.

"We've heard concerns that Bingbot doesn't crawl frequently enough and their content isn't fresh within the index; while at the same time we've heard that Bingbot crawls too often causing constraints on the websites resources. It's an engineering problem that hasn't fully been solved yet," continued Canel.

What Bing is doing here, through the Bingbot series, is clearly listening to the webmaster and SEO community. The team at Webmaster Tools is making changes to ensure its crawler does not overload servers while at the same time becoming faster and more efficient at finding new content on websites.

Bing said that it is actively working on this, and will continue to do so.

Read: How Search Engines Process Your Queries Determines Your Satisfaction

Published: 23/10/2018