Human-Sourced Biases That Trouble The Advancement Of Artificial Intelligence

With people creating and sharing more information than ever, researchers would seem to have ample training material to make AIs smart. But one problem never seems to go away.

And that is bias.

Bias in AI is common because data are collected, processed and analyzed in different ways. Words with multiple meanings, for example, can cause machines to misinterpret commands. Bias can also come from the humans who deliver those materials to computers, as well as from the algorithms themselves.

Computers can be made smarter with AI. Machine-learning capabilities give computers the ability to "think" and "work" beyond what has been explicitly coded into them. The good thing about computers is that algorithms do precisely what they have been taught. However, they are only as good as their mathematical construction and the data they are trained on.

Bias in that training makes a computer reflect the same bias in whatever it is supposed to do. In terms of the effectiveness and reliability of AIs, this is certainly a bad thing.

To the extent that we humans build algorithms and train them, human-sourced biases will inevitably creep into AI models. Fortunately, these biases are well understood. They can be detected and mitigated, but first we need to know where they are present.

There are four distinct types of machine learning biases:

Sample Bias

Sample bias comes from the data itself. When training AI models, we need to provide data that accurately represents the environment the models will operate in. AIs can only be as smart as the data they have been trained on, yet there is no way to train an algorithm on the entire universe of data it could ever interact with.

For this reason, researchers usually rely on sampling techniques.

If, for example, we want to train algorithms to operate autonomous cars both during the day and at night, feeding the models only daytime training data would introduce sample bias. Researchers need to train the algorithms on both daytime and nighttime datasets to eliminate this bias.
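As a rough illustration, a quick audit of the training set can reveal this kind of imbalance before any training begins. The sketch below assumes a hypothetical metadata file with a "time_of_day" field; any real dataset would use its own labeling scheme.

# Minimal sketch: audit how daytime vs. nighttime examples are represented
# in a driving dataset before training. The metadata file and its
# "time_of_day" field are hypothetical placeholders.
import json
from collections import Counter

with open("driving_dataset_metadata.json") as f:   # hypothetical metadata file
    records = json.load(f)

counts = Counter(record["time_of_day"] for record in records)  # e.g. "day" / "night"
total = sum(counts.values())

for condition, n in counts.items():
    print(f"{condition}: {n} images ({n / total:.1%})")

# If one condition dominates (say, 95% daytime), the set carries sample bias
# and needs more nighttime examples before the model is trained.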

Prejudice Bias

Training data that is influenced by cultural assumptions or stereotypes results in prejudice bias.

For example, in many countries work is more closely associated with men, who are expected to financially support their wives, while the wife stays at home as a housewife. She would more likely be pictured in a kitchen or taking care of the children.

When a computer vision model is trained to understand people at work, exposing the algorithms to thousands of training images built on this stereotype will make them conclude that men work and women are homemakers.

This is a prejudice bias because, after all, women also work and men also cook. In fact, many of the most famous chefs are men.

The issue here is that decisions about the training data consciously or unconsciously reflected social stereotypes. This could have been avoided by ignoring the statistical relationship between gender and occupation and exposing the algorithm to a more even-handed distribution of examples.

Decisions like these require a sensitivity to stereotypes and prejudice, and it’s up to the humans who feed the algorithms to anticipate the behavior the model is supposed to express.

Since mathematics can’t solve prejudice, the humans tasked with labeling and annotating training data should make sure they don’t introduce their own societal prejudices or stereotypes into the AI model's training data.
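As a minimal sketch of what such a check could look like, the snippet below cross-tabulates hypothetical annotation labels by gender and occupation; the file name and column names are assumptions for illustration, not part of any particular dataset.

# Minimal sketch: cross-tabulate annotation labels to expose stereotyped
# pairings of gender and occupation. The CSV name and the "gender" and
# "occupation" columns are assumptions for illustration.
import pandas as pd

labels = pd.read_csv("image_labels.csv")   # hypothetical annotation file

# Share of each occupation within each gender; a heavily lopsided row
# suggests the labels mirror a social stereotype rather than reality.
table = pd.crosstab(labels["gender"], labels["occupation"], normalize="index")
print(table.round(2))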

Measurement Bias

Measurement bias happens when the instruments used to collect data distort it. This kind of bias skews the data in a particular direction, making the trained model biased. For example, an AI model trained on images captured through a chromatic filter learns from colors that are distorted in every image.

In that case, the algorithms were trained on image data that systematically failed to represent the environment they will operate in.

Measurement biases can’t be avoided simply by collecting more data. To prevent them, it's best to collect the training data with as few different measuring devices as possible. The humans tasked with training the AI models should also compare the outputs of these devices to make sure no systematic bias is present.
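One way to do that comparison, sketched below under the assumption that images from each device live in separate folders, is to compare average color statistics per device; a systematic gap in one channel points to a measurement problem.

# Minimal sketch: compare per-channel color averages of images captured by
# different cameras to spot a systematic tint. The folder names and the use
# of Pillow are assumptions for illustration.
from pathlib import Path

import numpy as np
from PIL import Image

def mean_rgb(folder):
    """Average R, G, B values over every JPEG in a folder."""
    channel_means = [
        np.asarray(Image.open(path).convert("RGB"), dtype=float).mean(axis=(0, 1))
        for path in Path(folder).glob("*.jpg")
    ]
    return np.mean(channel_means, axis=0)

for device in ("camera_a", "camera_b"):   # hypothetical capture devices
    r, g, b = mean_rgb(device)
    print(f"{device}: R={r:.1f} G={g:.1f} B={b:.1f}")

# A consistent gap in one channel (for example, one camera reading far bluer)
# hints that a device's filter is skewing the data in a particular direction.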

Algorithm Bias

The fourth type of bias has nothing to do with data. Algorithm bias is a result of the mathematical properties of the model itself.

For example, models with high variance can accommodate more diverse training data, but they are sensitive to noise. Models with high bias, on the other hand, are more rigid and less sensitive to variations in the data and to noise, but they are more prone to missing its underlying complexities.

For this reason, there should be an appropriate balance between these two properties.
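A small experiment makes the trade-off concrete. The sketch below fits polynomials of different degrees to noisy synthetic data: a low degree underfits (high bias), a very high degree chases noise (high variance), and a validation split shows which balance generalizes best.

# Minimal sketch: polynomial fits on noisy synthetic data to show the
# bias/variance balance. Degree 1 underfits, degree 15 chases noise,
# and a middle degree tends to win on held-out data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Hold out every other point so validation error reveals over- and underfitting.
x_train, x_val = x[::2], x[1::2]
y_train, y_val = y[::2], y[1::2]

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: validation MSE = {val_mse:.3f}")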

Understanding these four types of AI biases should allow researchers to build better AI models and better training data.

AI algorithms are, after all, built by humans, and their training data is assembled, cleaned, labeled and annotated by humans as well. Data scientists need to be aware of these biases and know how to avoid them through a consistent, iterative approach: continuously testing the model and bringing in well-trained humans to assist.