WordPress And Tumblr Want To Sell Users' Data To Help Train OpenAI and Midjourney


Two blogging titans are not clashing or competing. Instead, they find themselves at a crossroads.

WordPress.com is a freemium blogging service that allows people to build blogs and self-publish posts. Tumblr, on the other hand, is a microblogging and social networking website. The two are preparing to sell their data to AI companies, according to a source with internal knowledge about the deals and internal documentation referring to the deals.

According to a report from 404 Media, the source with internal knowledge said that both platforms are partnering with OpenAI and Midjourney.

The two platforms will receive compensation in return for delivering a constant stream of information to feed the two companies' AI products.

First off, the deal between the AI companies and WordPress is imminent.

Automattic, the parent company of WordPress, has started compiling user data, and plans to launch a new setting to allow users to opt out of data sharing with third parties, including AI companies, according to the source.

A dedicated FAQ section titled "What happens when you opt out?" states that "If you opt out from the start, we will block crawlers from accessing your content by adding your site on a disallowed list. If you change your mind later, we also plan to update any partners about people who newly opt-out and ask that their content be removed from past sources and future training."

So that AI companies only take the data they are given, Automattic says it has blocked AI crawlers from scraping its sites.
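The report does not say how the block is implemented. One common way for a site to turn crawlers away is a robots.txt file, which well-behaved crawlers check before fetching a page; compliance is voluntary. The sketch below is a minimal illustration under that assumption, not Automattic's actual setup: the robots.txt content, the example URL and the `is_allowed` helper are all hypothetical, while `GPTBot` is the user agent OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only. The source does not
# describe the rules Automattic actually serves.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical post URL.
    url = "https://example.wordpress.com/2024/02/29/post/"
    for agent in ("GPTBot", "Mozilla/5.0"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Under these made-up rules, the AI crawler is turned away from every page while ordinary visitors are not affected.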

"We are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control. Our partnerships will respect all opt-out settings. We also plan to take that a step further and regularly update any partners about people who newly opt out and ask that their content be removed from past sources and future training."

The statement published by Automattic specifically mentioned WordPress.com, the service on which Automattic hosts blogs, and not the open-source WordPress content management system at WordPress.org that people and businesses use on self-hosted websites.

"AI is rapidly transforming nearly every aspect of our world, including the way we create and consume content. At Automattic, we’ve always believed in a free and open web and individual choice. Like other tech companies, we’re closely following these advancements, including how to work with AI companies in a way that respects our users’ preferences," said Automattic in a statement.

As for the deal with Tumblr, things are controversial from the start.

For starters, there is the data Tumblr is compiling for the AI companies. The data that is included:

  • Private posts on public blogs.
  • Posts on deleted or suspended blogs.
  • Unanswered asks that are typically not public until they’re answered.
  • Private answers that only show up to the receiver and are not public.
  • Posts that are marked ‘explicit’/NSFW/‘mature’.
  • Content from premium partner blogs.

And the data that shouldn't be included:

  • Password-protected posts.
  • DMs.
  • Media flagged as CSAM and other community guidelines violations.

Companies like OpenAI and Midjourney require humongous datasets to train their AI systems.

Generative AI products, such as ChatGPT, which is built on Large Language Models, and the image generator Midjourney, can only do what they are designed to do by consuming enormous amounts of information.

The more data there is, the better the information, and the smarter the AI products should become.

Churning through data is the way these AIs learn to do the things they do.

And here, both WordPress and Tumblr find themselves at a crossroads, where they must either stand firm on their beliefs, or play along and surf the trend.

The two chose the latter.

It's worth noting, though, that Automattic also owns Tumblr.

Published: 
29/02/2024