Using YouTube Videos To Train AI Tools Without Permission Would Be A 'Clear Violation'

Neal Mohan
CEO of YouTube

AI models using individuals' work without permission (or compensation) is nothing new. It has been a topic of debate and concern for years.

With the rise of AI technology, particularly large language models, there have been heated discussions about the ethical implications of AI generating text based on copyrighted material or proprietary content.

One of the main concerns is the potential for AI-generated content to infringe upon intellectual property rights, such as copyright, trademarks, or patents.

For instance, if an AI model generates text that closely resembles a copyrighted work without permission, it could raise legal questions about ownership and usage rights. Furthermore, there's the issue of fair compensation for creators.

This time, in an interview with Bloomberg Originals host Emily Chang, YouTube CEO Neal Mohan warned OpenAI that training models on the platform's videos is against the rules.

Mohan said that if OpenAI used YouTube videos to train Sora, its text-to-video AI tool, that would be a "clear violation" of the platform's terms of use.

"From a creator's perspective, when a creator uploads their hard work to our platform, they have certain expectations."

"One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service."

"Those are the rules of the road in terms of content on our platform."

The warning comes after OpenAI could not confirm whether Sora had been trained on content from the platform.

At the time, OpenAI CTO Mira Murati contributed to the uncertainty, saying that she wasn't sure whether Sora was trained on data from YouTube, Instagram or Facebook posts.

The ChatGPT-maker admitted that it had used copyrighted data to train its AI models, saying it was “impossible” to build the technology without it.

Here, Mohan was referring to longstanding questions about where AI companies get the content they use to train the models that power their services. While Mohan was careful to say he didn’t know whether OpenAI had used YouTube content to develop Sora, he said that would be a problem if true.

Downloading videos or transcripts would be a violation of the terms of service.

But here comes the catch.

YouTube is only making clear that third parties are prohibited from scraping its data. It did not say the same about Google.

Alphabet, the parent company of YouTube and Google, is developing its own suite of AI tools, making it likely that Alphabet is even more concerned that a potential rival might be using its content in a way that violates its terms of service.

Google may want YouTube's data for its own models.

Since the AI arms race is a gold rush for companies with data, big players want to make sure that rivals don’t take the data they’ve accumulated.

So it is likely that Alphabet will use its terms and conditions as a walled garden to keep competitors from using YouTube's data.

It's worth noting, though, that Mohan added that Google does use some YouTube videos to train its own AI platform, Gemini, but only if the individual creators on the platform have agreed to that in their contracts.