Anonymized Data Cannot Protect User Privacy, Researchers Say

In a world where data changes hands all the time, part of the cost is the loss of user privacy.

This is widely seen as a necessary part of living in the modern connected ecosystem. From searching the web and using social media to a routine visit to the doctor, people are handing over an increasing amount of personal data to whoever they interact with.

From a privacy point of view, this is concerning.

With the huge amount of data people generate, information can flow from individuals to companies, then on to other companies and even to third parties that shouldn't be involved in the first place. User data has become a commodity, and this is happening all the time.

Privacy-conscious individuals may grow anxious about this. There is little that can be done to stop the flow, as the process is crucial to the ecosystem.

An illustration of de-anonymization using only several data points

One way companies try to keep this transacted personal data private is to anonymize it. This form of information sanitization aims to protect privacy by either encrypting or removing personally identifiable information from data sets.

This way, people whom the data describe should remain anonymous.

And when this kind of anonymized information is stored inside massive databases, it should be nearly impossible for anyone to trace the data back to a single person.
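As a rough illustration of what "encrypting or removing" can look like in practice, here is a minimal Python sketch. It is not any particular company's method: the field names (`name`, `email`, `user_id`, and so on), the sample record, and the secret key are all made up for this example, and a keyed hash stands in for the "encrypting" step.

```python
import hashlib
import hmac

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "email"}


def anonymize(record, secret_key):
    """Drop direct identifiers and replace the user ID with a keyed hash."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "user_id"
    }
    # A keyed hash (HMAC-SHA256) stands in for "encrypting" the identifier.
    digest = hmac.new(secret_key, str(record["user_id"]).encode("utf-8"), hashlib.sha256)
    cleaned["pseudonym"] = digest.hexdigest()
    return cleaned


# Made-up sample record.
record = {
    "user_id": 1001,
    "name": "Alice Example",
    "email": "alice@example.com",
    "zip_code": "02139",
    "gender": "F",
    "date_of_birth": "1990-04-12",
}

anonymized = anonymize(record, b"example-secret")
print(anonymized)
```

Note that the name and email are gone, but the zip code, gender and date of birth pass through untouched.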

But according to researchers, individuals in a sample database can be re-identified 83 percent of the time using just three data points: zip code, gender, and date of birth.

With more data points, such as marital status, age, or occupation, the figure climbs closer to 100 percent.

In other words, even though the data is anonymized, a few data points can still tie it back to its owner.
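To see why a few attributes can be enough, consider the following Python sketch. It is only a toy illustration, not the researchers' model: the records, the `last_purchase` field, the names and the "public" list are all invented here. It counts how many anonymized rows have a unique combination of zip code, gender and date of birth, then links those rows to a named list that shares the same three fields.

```python
from collections import Counter

# The three data points mentioned in the article.
QUASI_IDENTIFIERS = ("zip_code", "gender", "date_of_birth")


def combo(row, keys=QUASI_IDENTIFIERS):
    """Return the tuple of quasi-identifier values for one row."""
    return tuple(row[k] for k in keys)


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Fraction of records whose quasi-identifier combination appears only once."""
    counts = Counter(combo(r, keys) for r in records)
    unique = sum(1 for r in records if counts[combo(r, keys)] == 1)
    return unique / len(records)


def link(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Map anonymized row index -> name when the combination is unique in both sets."""
    counts = Counter(combo(r, keys) for r in anonymized)
    names_by_combo = {}
    for row in public:
        names_by_combo.setdefault(combo(row, keys), []).append(row["name"])
    matches = {}
    for i, row in enumerate(anonymized):
        names = names_by_combo.get(combo(row, keys), [])
        if counts[combo(row, keys)] == 1 and len(names) == 1:
            matches[i] = names[0]
    return matches


# Made-up "anonymized" data set: no names, but quasi-identifiers remain.
anonymized_rows = [
    {"zip_code": "02139", "gender": "F", "date_of_birth": "1990-04-12", "last_purchase": "inhaler"},
    {"zip_code": "02139", "gender": "M", "date_of_birth": "1985-11-03", "last_purchase": "book"},
    {"zip_code": "02139", "gender": "M", "date_of_birth": "1985-11-03", "last_purchase": "laptop"},
    {"zip_code": "94110", "gender": "F", "date_of_birth": "1972-07-30", "last_purchase": "shoes"},
]

# Made-up public list that carries names alongside the same three fields.
public_rows = [
    {"name": "Alice Example", "zip_code": "02139", "gender": "F", "date_of_birth": "1990-04-12"},
    {"name": "Bob Example", "zip_code": "02139", "gender": "M", "date_of_birth": "1985-11-03"},
]

fraction = unique_fraction(anonymized_rows)
matches = link(anonymized_rows, public_rows)
print(fraction)  # 0.5
print(matches)   # {0: 'Alice Example'}
```

In this toy data, half of the rows are unique on the three fields, and one of them can be tied to a name. The two rows that share the same combination stay ambiguous, which is exactly the protection that disappears as more data points are added.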

While the number of data points needed to target a particular individual may be too large for a typical single company to gather, on the internet, Facebook, Google, and Amazon alone have hundreds, perhaps thousands, of data points to pull from.

"We might share anonymous data with third parties." Most web users will have encountered this sentence in online Terms of Service. Agreeing to this simple statement means that users allow the service to use and share whatever they give it.

This is why it's suggested that users should always take care of their privacy by limiting the information they give to any web services.

These companies can collect (anonymized) data from search history, the ads users clicked, purchases they've made, and so forth.

Taking these three companies as an example, they don't even need users to hand over data explicitly. With algorithms, they can make accurate guesses from anonymized data points and trace the data back to anyone in their database.

Yes, user data has become a commodity, especially when it comes to free services and apps. The companies behind them track everything they can to understand users' intentions, habits and more.

On the benign side, they do this to improve the user experience (and their revenue).

And what they aren’t tracking, they’re buying. Data brokers are big business, and exist solely to provide competitive insights into everything users are.

In bad cases, these companies can use what they've collected against the very people it describes.

And here, anonymization is better than no anonymization at all. But when it comes to privacy, it isn't helping much.

Further reading: Ways To Protect Your Privacy On The Internet: Your Personal Information Is A Commodity