Since The 1990s, The 'Social Contract' For Content On The Open Web Has Been 'Freeware'

Mustafa Suleyman
CEO of Microsoft AI, co-founder of Inflection AI, co-founder and former Head of Applied AI of DeepMind

The web is open to everyone, unless there is a statement explicitly saying that it isn't. That is the nature of the "open web," and it has been that way for decades.

According to Mustafa Suleyman, the co-founder of Inflection AI, who has been recruited by Microsoft to lead its Microsoft AI initiative, the moment people publish anything on the open web, it becomes something that anyone can freely copy and use.

He believes that AI companies and developers can scrape most content published online and use it to train neural networks, because that content is essentially "freeware."

He suggests that content accessible to anyone on the web is free content, meaning it is literally free for the taking.


Speaking to CNBC's Andrew Ross Sorkin at the Aspen Ideas Festival, the Microsoft AI boss said:

"With respect to content that is already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding."

He added:

"There's a separate category where a website or a publisher or a news organization had explicitly said, 'do not scrape or crawl me for any other reason than indexing me so that other people can find that content.'"

"That's a gray area and I think that's going to work its way through the courts."

His statements raise questions.

For example, is it actually fine to use other people's work to create new content?

If it is, is it acceptable to profit from those recreations, or from work derived from preexisting content?

And how could websites and organizations have "explicitly" said that their work cannot be used for AI training before AI became commonplace?

And since Microsoft is involved here, has the company respected the wishes of organizations that specified their content should only be used for search? Have Microsoft's partners, including OpenAI, respected any demands that content not be used for AI training?

These questions are timely: Microsoft is the target of multiple lawsuits alleging that it and OpenAI are stealing copyrighted online stories to train generative AI models.

Suleyman's statement, then, reads as a defense of that practice as perfectly legal.

Previously, Google, Suleyman's former employer, stated that it has the right to "collect" information from the web to train its AI, as if the whole internet were the company's own playground.

The same goes for OpenAI, Microsoft and others.

Another rival, Perplexity, allegedly ignores the robots.txt protocol, scraping websites even when they forbid it from doing so.
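For readers unfamiliar with robots.txt: it is a plain-text file at the root of a site that lists which crawlers (identified by user-agent names) may fetch which paths. This is how a publisher can say "index me for search, but do not crawl me for anything else." The sketch below uses Python's standard-library `urllib.robotparser` to evaluate a hypothetical robots.txt that lets every crawler in except one AI-training crawler. The site URL and rules are invented for illustration, and `GPTBot` (a user-agent token OpenAI publishes for its crawler) is used only as an example. Crucially, the file is a request, not an enforcement mechanism: a crawler that ignores it faces no technical barrier.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example news site: one AI-training crawler
# is refused everywhere, while all other crawlers (e.g. search engines) may
# fetch every path.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url.

    This only reports what the site asks for; nothing here stops a
    crawler that chooses to ignore the answer.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    article = "https://news.example.com/2024/06/some-story"
    for agent in ("Googlebot", "bingbot", "GPTBot"):
        print(f"{agent:10} allowed: {may_crawl(rules, agent, article)}")
```

Run as a script, this reports that Googlebot and bingbot are allowed to fetch the article while GPTBot is not. Whether a given crawler actually checks this answer before scraping is entirely up to its operator, which is the crux of the allegations against Perplexity.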

Suleyman’s remarks suggest that AI developers such as Microsoft can freely use the vast amount of data available online to train their models.

When AI companies use this kind of content to train their models without permission, they take value from it without compensating the original creators. The interviewer compared this to an author referencing other books while writing their own: the author doesn’t pay the referenced authors, but they still need to buy the books.

Suleyman is trying to hold his ground in a gray area, amid the complex legal and ethical issues surrounding content ownership and usage rights.

Fair use does allow limited use of copyrighted material for purposes like criticism, teaching, or research. However, using vast amounts of content to develop AI models goes beyond these boundaries, especially when there are clear commercial motives involved.

The issue here stems from the fact that the internet is full of content created by writers, journalists, artists, and many others who rely on making money from their work.