How Wikipedia Uses Meta's AI To Fight Vandalism, Biases And Misinformation


On the web, Wikipedia comes closer than any other site to summarizing known human knowledge.

This is achieved through its editing model, in which practically anyone can contribute information. Because Wikipedia crowdsources its content, it has become the world's most up-to-date encyclopedia, with more than 17,000 new articles added every month.

According to Wikipedia, this translates to around 1 gigabyte of compressed text added per year.

The thing is, the editing model also makes it less accurate.

Because anyone can edit the entries, Wikipedia articles are prone to vandalism and biases.

While Wikipedia has tried to address this issue, and its reputation for accuracy has significantly improved, Wikipedia still doesn’t consider itself a reliable source.

For this reason, the Wikimedia Foundation, the non-profit organization that oversees Wikipedia, regularly explores new solutions for these shortcomings, including ones from third parties and partners.

One of those solutions comes from Meta, the company that owns Facebook.

The solution harnesses the power of AI to improve Wikipedia's citations.

Wikipedia citations are references used to corroborate crowdsourced information on the site. But more often than not, they are missing, incomplete, or inaccurate.

The AI team at Meta has launched a research initiative to help Wikipedia overcome this problem.

Using machine-learning-powered fact-checking technology from Meta, Wikipedia can automatically scan hundreds of thousands of citations at once to check their accuracy.

Checking citations at this scale is far faster than relying on the manual work of Wikipedia's editors alone.

To create this AI, the team at Meta trained it using a dataset of more than 134 million public web pages, called Sphere.

Sphere is a web-scale corpus and search infrastructure that serves as a source of candidate web pages. According to Meta, this makes the project the largest of its kind to use such a corpus for this research.


With that in mind, the team at Meta then trained the AI's algorithms on 4 million Wikipedia citations, so that it could learn from "the contributions and combined wisdom of thousands of Wikipedia editors."

This teaches the AI what a citation that supports its claim looks like.

If the AI finds a citation that it thinks is irrelevant, the system will automatically recommend a better source.

At the same time, the system will also display a specific passage that supports the claim.

The system reaches its recommendation by first retrieving a list of candidate documents and comparing them with the original citation.

Each source is ranked through a scoring system. A new citation from the retrieved candidates is suggested if, and only if, the original citation is not ranked above the candidate documents.
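
The ranking step described above can be sketched in a few lines of Python. This is only an illustration of the decision flow, not Meta's implementation: the function names, the example URLs and the word-overlap score are all assumptions made for this sketch, standing in for Side's neural retrieval and verification models. The sketch also assumes that a tie leaves the existing citation in place.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A source is a (url, passage_text) pair. Hypothetical shape for this sketch.
Source = Tuple[str, str]


@dataclass
class Suggestion:
    url: str
    score: float


def overlap_score(claim: str, passage: str) -> float:
    """Toy stand-in for a learned scorer: share of claim words found in the passage."""
    claim_words = set(claim.lower().split())
    passage_words = set(passage.lower().split())
    if not claim_words:
        return 0.0
    return len(claim_words & passage_words) / len(claim_words)


def suggest_citation(
    claim: str,
    current: Source,
    candidates: List[Source],
    score: Callable[[str, str], float] = overlap_score,
) -> Optional[Suggestion]:
    """Return a replacement only if a candidate outranks the current citation."""
    if not candidates:
        return None
    current_score = score(claim, current[1])
    best_url, best_text = max(candidates, key=lambda c: score(claim, c[1]))
    best_score = score(claim, best_text)
    # Assumption: on a tie, the existing citation is kept.
    if best_score > current_score:
        return Suggestion(best_url, best_score)
    return None


if __name__ == "__main__":
    claim = "Joe Hipp was the first Native American to challenge for the world heavyweight title"
    current = ("https://example.org/unrelated", "A page about regional weather patterns")
    candidates = [
        ("https://example.org/boxing",
         "Joe Hipp became the first Native American to challenge for the world heavyweight championship"),
        ("https://example.org/other", "Heavyweight boxing history"),
    ]
    print(suggest_citation(claim, current, candidates))
```

The function only returns a suggestion. In keeping with the workflow the article describes, accepting or rejecting it is left to a human editor.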

While the AI can do these things by itself automatically, it's still up to human editors to decide whether the AI's recommendation should be used.

The human editors will have to review the suggestions and improve them where necessary.

This AI is called 'Side', as explained by Meta's researchers:

"We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web."

The decision flow of Side. (Credit: Meta)

To illustrate how this works, the researchers used the example of a Wikipedia page on retired boxer Joe Hipp.

The entry describes the Blackfeet Tribe member as the first Native American to compete for the WBA World Heavyweight title.

But the model found that the citation for this claim was a webpage that didn’t even mention Hipp or boxing. The system then automatically searched its Sphere corpus for a replacement reference.

It found a more suitable replacement: a 2015 article from the Great Falls Tribune, which noted that Marvin Camel fought Joe Hipp of the Blackfeet Nation, and that Hipp became the first Native American to challenge for the world heavyweight championship.

"Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia," the study authors wrote.

"More generally, we hope that our work can be used to assist fact-checking efforts and increase the general trustworthiness of information online."

Meta has also open-sourced the project, which can give external researchers the tools needed to develop their own AI language systems that are less biased.

Published: 13/07/2022