AI is Racist
Addressing Bias in Artificial Intelligence

Artificial Intelligence (AI) has become an integral part of our lives, permeating various sectors and influencing our daily interactions. However, beneath the sheen of technological advancement, a critical concern looms large: AI’s inadvertent perpetuation of racial bias.
The roots of this issue trace back to the vast datasets on which large language models are trained. These datasets encompass trillions of words and predominantly reflect the perspectives of English speakers, particularly white ones. Books and articles, the primary sources for these datasets, are reported to be heavily skewed, with 75% to 95% of them written by white authors. The result is a statistical bias favouring whiteness in generative AI platforms.
While it might be argued that the words themselves aren’t inherently racist, the lack of diversity in the training data carries over into the resulting AI models. A language model generates text by estimating which words are most probable given what it has seen, so the perspectives that dominate its training data also dominate its output, and its responses can lean towards racist generalisations.
The issue extends beyond language models to AI systems that process visual information. The imbalance in data collection is starkly evident in image databases used for computer vision. For instance, 45% of the images in the most widely used image database in computer vision originate from the United States, while China and India together contribute a mere 3%, even though they represent 36% of the world’s population. This skewed representation leads to cultural and ethical biases, with algorithms mislabelling images of people, objects, and customs that are rare in their training data.
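To make the scale of that imbalance concrete, here is a minimal sketch that compares a region's share of a dataset with its share of the world's population. The function name `representation_ratio` is a hypothetical helper introduced for illustration; the two figures are the ones quoted above.

```python
def representation_ratio(data_share, population_share):
    """Return how a group's share of the data compares to its share of the population.

    A ratio of 1.0 means proportional representation; below 1.0 means
    under-representation. (Hypothetical helper, for illustration only.)
    """
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return data_share / population_share


# Figures quoted in the article: China and India supply 3% of the images
# but account for 36% of the world's population.
china_india_ratio = representation_ratio(0.03, 0.36)
print(f"China and India representation ratio: {china_india_ratio:.2f}")
```

A ratio this far below 1.0 means these two countries appear in the data at less than a tenth of the rate their population would suggest.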
One striking example comes from facial recognition systems, where MIT researcher Joy Buolamwini uncovered a disturbing bias. While these systems classified the gender of lighter-skinned individuals with high accuracy, their accuracy plummeted as skin tones darkened, and it was lowest for dark-skinned women.
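A gap like this stays hidden when a system is judged only by its overall accuracy. The sketch below shows one way to break accuracy down by subgroup. The records are invented toy data, not results from any real system, and `accuracy_by_group` is a hypothetical helper rather than the method used in the research.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per subgroup.

    records: iterable of (group, true_label, predicted_label) tuples.
    Returns a dict mapping each group to its fraction of correct predictions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, prediction in records:
        total[group] += 1
        if truth == prediction:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Invented toy data for illustration only (not real measurements).
records = [
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
    ("darker-skinned women", "female", "female"),
    ("darker-skinned women", "female", "male"),
    ("darker-skinned women", "female", "female"),
    ("darker-skinned women", "female", "male"),
]

overall = sum(truth == pred for _, truth, pred in records) / len(records)
per_group = accuracy_by_group(records)

print(f"overall accuracy: {overall:.2f}")
for group, accuracy in per_group.items():
    print(f"{group}: {accuracy:.2f}")
```

In this toy data the overall figure looks respectable, while the per-group breakdown shows that one group is classified no better than a coin toss.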
The bias isn’t confined to image classification; it also shows up in the building blocks of Natural Language Processing (NLP). Word embeddings, a common NLP technique that represents words as vectors of numbers, can inadvertently encode gender stereotypes. Research has demonstrated that analogies derived from embeddings trained on Google News articles reproduce gender biases, linking professions like ‘doctor’ to ‘man’ and ‘nurse’ to ‘woman.’
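The analogy test behind that finding can be sketched in a few lines. The three-dimensional vectors below are hand-made toy values chosen to mimic a gender skew; they are not taken from the Google News embeddings or any trained model, and the `analogy` function is a hypothetical helper. Real embeddings have hundreds of dimensions, but the arithmetic is the same: compute b - a + c and find the nearest remaining word by cosine similarity.

```python
import math

# Hand-made toy vectors (assumption, for illustration only).
# Dimensions loosely stand for: gender association, medical, teaching.
embeddings = {
    "man": [1.0, 0.0, 0.0],
    "woman": [-1.0, 0.0, 0.0],
    "doctor": [0.5, 1.0, 0.0],   # skewed towards "man"
    "nurse": [-0.5, 1.0, 0.0],   # skewed towards "woman"
    "teacher": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using vector arithmetic (b - a + c)."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, as analogy tests usually do.
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(query, vectors[word]))


result = analogy("man", "doctor", "woman", embeddings)
print(f"man is to doctor as woman is to {result}")
```

Because the toy vector for ‘doctor’ leans towards ‘man’ and the one for ‘nurse’ leans towards ‘woman’, the arithmetic returns ‘nurse’. In a trained model, a skew of this kind is absorbed from the text the model was trained on.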
Addressing the ethical implications of biased AI requires a multi-faceted approach. Scientists and engineers must confront the imbalance in training sets, actively working to diversify datasets and reduce racial and gender biases. Users and non-experts, on the other hand, need to understand that AI, rooted in complex mathematics, often operates as a ‘black box.’ Despite efforts to interpret the intermediate outputs of neural networks, their inherent complexity means that unintended biases can go undetected.
To tackle bias within AI, it is imperative to acknowledge this ethical dilemma and collectively strive for transparency, inclusivity, and fairness in the development and deployment of artificial intelligence.






