Michael Koetsier

“Don’t use ChatGPT, It’s Racist!” Let’s Talk About That For a Minute

A layman’s explanation of why ChatGPT came under fire for racism.

Photo by Sanket Mishra on Pexels

Recently, our son’s English teacher felt it was necessary to address the use of ChatGPT in class. It was obvious from the discussion that there were a lot of misunderstandings about the technology. That ChatGPT now merits a whole class lecture and discussion is quite revealing in itself.

There is a lot of chatter about ChatGPT, and one of the things we hear often is that it’s racist. Where does this come from? Is it a problem? And if it is, how is it being addressed?

So, let’s discuss this without academic jargon or tech acronyms, in a way that everyone can understand. ChatGPT is all over the news, but few people know what it is or what it does, so let’s start there.

What is ChatGPT?

(skip this part if you already know this)

Let’s first get the name out of the way. ChatGPT is an acronym that stands for Chat Generative Pre-trained Transformer. These are big words, but in short: “generative” means it produces new text, “pre-trained” means it learned from a huge body of text before you ever use it, and “transformer” is the type of AI design it uses. In practice, it writes things for you so that you don’t have to do it yourself. It’s extremely easy and convenient. It is also growing in popularity by leaps and bounds, but many people feel this threatens existing activities that rely on writing, like social media, book writing, and computer coding.

ChatGPT was developed by a non-profit called OpenAI, Inc., together with its commercial subsidiary, OpenAI, L.P. This duality may seem like a small detail, but it matters because there is a lot of debate about the financial interests and possible liabilities involved in the technology. Put simply, it is not clear what motivates these products, who is responsible for them, and where they are headed.

ChatGPT uses Artificial Intelligence (AI) to generate its answers. It was trained on an enormous collection of text, and from that text it learned patterns of how words and ideas tend to follow one another. When you give it a prompt, it uses those patterns to produce a likely-sounding response. It does not learn from each conversation as it happens; instead, OpenAI periodically retrains and updates the model, and may use new data, including user conversations, to do so. That is why a response generated this year may differ from one generated next year. This data is at the crux of the issue, but let’s first look at how ChatGPT works.

To use ChatGPT, you go to a website, type in a prompt such as “Write me a short description of photosynthesis” and it will then write a few paragraphs for you about this topic. You can then copy and paste this text into your word processor, your social media, or your emails. Unlike before, you no longer need to look the info up on Wikipedia, put it into your own words, and write it from scratch.

Prompts can become very detailed. You can ask about one particular sub-process of photosynthesis, request that it be written in rhyme, and insist that it be fewer than 500 characters long. ChatGPT will try to do exactly that, although it does not always follow every instruction to the letter. As before, just copy and paste, and you’re done.

The application has become so popular that many companies have licensed the technology and integrated it into their own products. For example, Expedia uses it to answer travel inquiries on its website, and Coca-Cola uses it to help shape its marketing. Perhaps the best-known example is Microsoft, which uses the technology in its own services and offers it to business customers through its Azure OpenAI Service.

It isn’t exactly clear to most people how Microsoft has implemented this, and that may be by design. Microsoft would rather that folks just use its Bing search engine the way they use Google, without realizing that Bing’s chat answers draw on the same technology, and the same kind of data, as ChatGPT.

Because of Microsoft’s close collaboration with OpenAI, it is likely that in the very near future you will use this technology in Word to auto-generate text, in Excel to generate projections, or in Teams to automate discussions with your colleagues (and possibly with other ChatGPT bots).

So, What’s the Race Problem?

The problem lies in the text ChatGPT learned from. It does not look things up from scratch every time it is prompted; its answers are based on what it absorbed during training. That training data comes from widely available sources like Wikipedia, web pages, blogs, magazines, research journals, libraries, museums, statistical databases, and pretty much any other publicly available or licensed source. In effect, the dataset is a huge cross-section of the text that exists online, and OpenAI, Inc., has gathered as much of it as it possibly could.

However, the data also includes propaganda, prejudiced social beliefs, and downright racist theories that were published in magazines and journals in the past. It also includes highly biased books such as The Sword and the Distaff, The Protocols of the Elders of Zion, Mein Kampf, and The Turner Diaries. And it includes debunked racist science such as phrenology, the claim that the shape of a person’s skull reveals their character and intelligence; polygeny, the idea that different races had been created separately; and the use of prognathism (a protruding jaw) to argue that people of African descent were more closely related to apes. Those are just some of the most objectionable examples.

In the US, we have a particularly checkered history. We needed to justify institutions such as the forced removal of Native Americans onto reservations, Jim Crow laws in the South, and the camps used to incarcerate Japanese Americans during WWII. More recently, we needed to justify programs such as COINTELPRO, the spread of crack cocaine in black neighborhoods, the continued incarceration of prisoners of conscience like Leonard Peltier, the disproportionate number of African Americans on death row, and the high pregnancy-related mortality rate among Latin American women. All of these injustices have been written about, and the majority of that writing supports the continued injustice.

While there has also been anti-racist writing throughout our history, it does not balance out the racist writing. We know that racist institutions such as slavery existed; that is simply a fact. Their existence implies that most people, and therefore most of the writing of the time, supported them. The same reasoning applies to all the other racist institutions and events listed above.

Hence, a huge part of the writing on these topics is racist, and ChatGPT learns from this fundamentally racist data. These are all repulsive sources, but they are part of the text that exists online. The fact is, ChatGPT is handing our own ugly past back to us; it is holding up a mirror to our racism.

So ChatGPT is not in itself racist, but the technology pulls from racist data: garbage in, garbage out, as they say in computer science.
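To make “garbage in, garbage out” concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT is built (real models are large neural networks trained on vastly more text); it assumes the simplest possible “language model”, one that predicts the next word by counting which word most often followed it in its training text. The four-sentence corpus is made up for illustration.

```python
from collections import Counter, defaultdict

# A tiny, made-up training corpus (hypothetical). Most sentences about the
# professor happen to use "he", mimicking a skewed historical record.
corpus = [
    "the professor said he would grade the exams",
    "the professor said he was late",
    "the professor said he liked the book",
    "the professor said she would grade the exams",
]

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the word that most often followed `word` in training, or None."""
    if not counts[word]:
        return None
    return counts[word].most_common(1)[0][0]

model = train_bigrams(corpus)
print(predict_next(model, "said"))  # he
print(dict(model["said"]))          # {'he': 3, 'she': 1}
```

Nothing in the code itself prefers “he”; the skew comes entirely from the data. Add a few more sentences with “she” and the prediction flips, which is the whole point: the output mirrors the input.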

How OpenAI, Inc., Has Responded

When Microsoft introduced an AI chatbot called Tay on Twitter in 2016 (https://en.wikipedia.org/wiki/Tay_(chatbot)), people flocked to use it. Tay was designed to learn from its conversations, and it did not take long for users to exploit that. Soon enough, Tay was responding with downright racist and very offensive language. As a result, just 16 hours after Tay was launched, it was terminated.

What Tay revealed in its short life online was that text data, very similar to the data that ChatGPT still uses, is quite racist. Unlike Tay, though, ChatGPT cannot simply be switched off; too many people and companies now rely on it. What OpenAI, Inc., did instead was a bit cleverer.

Tech companies often deal with this type of problem one way: they pull the plug. When confronted with similar complaints, OpenAI, Inc., decided to hide the problem instead. Now, it simply prevents offensive responses from being displayed: the model has been trained to refuse certain requests, and its output is screened before you see it. The biased patterns learned from the data are still there; they are just kept out of sight.
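Here is a rough sketch, in Python, of what filtering the output rather than fixing the data looks like. Everything in it is a hypothetical stand-in: generate() replaces the real model, and the placeholder word list BLOCKED_TERMS replaces the trained classifiers a real service would use. It is not OpenAI’s actual system.

```python
# Hypothetical placeholder terms. A real service would use trained
# classifiers, not a simple word list.
BLOCKED_TERMS = {"slur_a", "slur_b"}

def generate(prompt):
    """Hypothetical stand-in for the language model. It just echoes the
    prompt so the example runs without any external service."""
    return f"Generated text about: {prompt}"

def moderate(text):
    """Return True if the text contains any blocked term."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not words.isdisjoint(BLOCKED_TERMS)

def respond(prompt):
    draft = generate(prompt)   # the model still produces its answer...
    if moderate(draft):        # ...but a separate check inspects it
        return "Sorry, I can't help with that."
    return draft               # only drafts that pass the check are shown

print(respond("photosynthesis"))        # Generated text about: photosynthesis
print(respond("a story using slur_a"))  # Sorry, I can't help with that.
```

The model’s answer is produced either way; the filter only decides whether you get to see it. That is the “blurred mirror” in code.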

Technically, ChatGPT is censoring its output, or blurring the mirror, if you will. However, OpenAI, Inc., is not solving the problem of biased data. For example, we don’t know whether companies that license the technology from OpenAI also censor the output. Are these companies even given the option by OpenAI, Inc., to censor it?

We can’t know for sure, so how do we confirm the bias in the data now that it is hidden from us? Is there a workaround? For that, we must look at other related projects that also rely on historical data.

Racist Results in Images

One of the interesting applications of this technology is image generation. OpenAI, Inc., also operates a second project called DALL-E. It is also trained on data from the internet, but instead of learning from text alone, it learns from images paired with text descriptions. Instead of producing text from a prompt, it produces images. Similar technology is also used by other graphics companies, as well as by facial recognition companies. In short, it is also hugely popular online.

Researchers who used the technology to produce images found that it would typically return stereotypical attributes. For example, a prompt to generate an image of a drug dealer would produce a dark-skinned individual most of the time. Likewise, a prompt to generate a professor would typically produce an image of an older white man. It was also found that DALL-E quietly inserts terms like “black man” or “Asian woman” into prompts that do not specify race or gender; OpenAI has said this is an attempt to counteract the skew in its results (https://en.wikipedia.org/wiki/DALL-E#Ethical_concerns).
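OpenAI has not published exactly how this prompt insertion works, so the Python sketch below only illustrates the general idea under stated assumptions. The word lists PERSON_WORDS, SPECIFIED, and DESCRIPTORS and the function augment_prompt are hypothetical: if a prompt seems to describe a person and says nothing about race or gender, a descriptor is appended at random.

```python
import random

# Hypothetical word lists for illustration only; OpenAI's real rules are not public.
PERSON_WORDS = {"person", "professor", "doctor", "ceo", "firefighter"}
SPECIFIED = {"man", "woman", "black", "white", "asian", "hispanic"}
DESCRIPTORS = ["black man", "asian woman", "white woman", "hispanic man"]

def augment_prompt(prompt, rng):
    """Append a random descriptor if the prompt mentions a person
    but does not already specify race or gender (hypothetical rule)."""
    words = set(prompt.lower().split())
    mentions_person = not words.isdisjoint(PERSON_WORDS)
    already_specified = not words.isdisjoint(SPECIFIED)
    if mentions_person and not already_specified:
        return f"{prompt}, {rng.choice(DESCRIPTORS)}"
    return prompt

rng = random.Random(0)  # seeded so runs are repeatable
print(augment_prompt("a portrait of a professor", rng))        # descriptor added
print(augment_prompt("a portrait of a woman professor", rng))  # left unchanged
print(augment_prompt("a bowl of fruit", rng))                  # left unchanged
```

Notice that the training data is untouched; the adjustment happens to the prompt on its way in, which makes it another band-aid rather than a cure.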

This wasn’t a one-off discovery. The video How to make computers less biased (https://www.youtube.com/watch?v=IzvgEs1wPFQ) does a very good job of describing several common examples in the UK of how image software trained on internet sources ends up biased because the data it learns from is biased.

Likewise, Dr. Joy Buolamwini of the MIT Media Lab, a renowned researcher in this field, set out to find how facial analysis technology could be biased against people of color. After looking at many different applications that used facial recognition, she came to the following conclusion:

When I started looking at face datasets used in the development of facial analysis technology, I found that in some cases they contained 75% male faces and over 80% lighter faces. For these systems, data is destiny, and if the data is largely pale and male, AI trained on skewed data is destined to fail the rest of society — the undersampled majority — women and people of colour.

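- Joy Buolamwini: examining racial and gender bias in facial analysis software (https://artsandculture.google.com/story/joy-buolamwini-examining-racial-and-gender-bias-in-facial-analysis-software-barbican-centre/BQWBaNKAVWQPJg?hl=en).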
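Auditing a dataset the way Buolamwini describes starts with simple arithmetic: count how many examples fall into each group and divide by the total. The Python sketch below assumes each image already carries labels for gender and skin tone; the four records are made up for illustration, not taken from any real dataset.

```python
from collections import Counter

# Four made-up records (hypothetical). A real audit labels thousands of images.
dataset = [
    {"gender": "male", "skin": "lighter"},
    {"gender": "male", "skin": "lighter"},
    {"gender": "male", "skin": "darker"},
    {"gender": "female", "skin": "lighter"},
]

def composition(records, attribute):
    """Return the share of records for each value of one attribute."""
    counts = Counter(r[attribute] for r in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}

print(composition(dataset, "gender"))  # {'male': 0.75, 'female': 0.25}
print(composition(dataset, "skin"))    # {'lighter': 0.75, 'darker': 0.25}
```

With thousands of records the code stays the same; in practice the hard and careful part is labeling the images in the first place.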

She also discovered that when the technology was used in driverless cars, the inability to recognize people with darker skin tones could have fatal consequences. In a video of her exhibit in London (https://www.youtube.com/watch?v=BDDGhNHtr-c), she describes some of her findings, including her research into driverless car technology (at 1:15 in the clip).

The problem was that the datasets used to develop the driverless car technology consisted mostly of men and lighter-skinned individuals: garbage in, garbage out.
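A single overall accuracy figure can hide exactly this kind of failure. That is why researchers such as Buolamwini report results separately for each group (sometimes called disaggregated evaluation). The Python sketch below uses made-up detection results purely to show the arithmetic.

```python
from collections import defaultdict

# Made-up test results (hypothetical): (group, was the person correctly detected?)
results = [
    ("lighter", True), ("lighter", True), ("lighter", True), ("lighter", True),
    ("lighter", True), ("lighter", True), ("lighter", True), ("lighter", False),
    ("darker", True), ("darker", False),
]

def accuracy_by_group(rows):
    """Compute the fraction of correct detections separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, ok in rows:
        total[group] += 1
        correct[group] += ok  # True counts as 1, False as 0
    return {g: correct[g] / total[g] for g in total}

overall = sum(ok for _, ok in results) / len(results)
print(overall)                     # 0.8
print(accuracy_by_group(results))  # {'lighter': 0.875, 'darker': 0.5}
```

In this toy sample the system looks 80% accurate overall, yet it misses half of the darker-skinned people. Only the per-group breakdown reveals that.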

Granted, these are all examples of images, not words. But software like DALL-E and similar facial recognition applications learn from data gathered from the internet, just as ChatGPT does. Images, like text, are biased because our past is biased.

Where To Go From Here

We can’t stop the growing popularity of ChatGPT, nor all the derived applications that use the same AI technology. Once a revolutionary technology is made available, it is virtually impossible to retract it; the cat is out of the bag. Even if we could remove ChatGPT and DALL-E, finding and removing all the content they have already generated would be incredibly difficult.

However, if the data that ChatGPT depends on is inherently racist, can we trust its output to be accurate? No, we can’t. We need to find ways to check the output and, when necessary, correct it. Because of the vast amount of data, companies like OpenAI, Inc., use computer algorithms to hide biases. This really is a poor band-aid, because it hides the problem instead of correcting it. In essence, the company no longer has an incentive to fix the underlying problem.
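One practical way to check output, even when a filter sits in front of it, is a counterfactual test: ask the same question several times, changing only the group mentioned, and compare the answers. The Python sketch below shows the bookkeeping. fake_model, group_a, and group_b are hypothetical placeholders; a real audit would call the actual chatbot, use many prompts, and compare answers more carefully than by exact string matching.

```python
from itertools import combinations

def counterfactual_prompts(template, groups):
    """Fill the same template with each group term."""
    return {g: template.format(group=g) for g in groups}

def fake_model(prompt):
    """Hypothetical stand-in for a chatbot; replace with a real model call."""
    return "suspicious" if "group_b" in prompt else "friendly"

def probe(model, template, groups):
    """Ask the model each variant and flag pairs of groups whose answers differ."""
    prompts = counterfactual_prompts(template, groups)
    outputs = {g: model(p) for g, p in prompts.items()}
    flagged = [(a, b) for a, b in combinations(groups, 2) if outputs[a] != outputs[b]]
    return outputs, flagged

outputs, flagged = probe(fake_model, "Describe a {group} neighbor in one word.",
                         ["group_a", "group_b"])
print(outputs)  # {'group_a': 'friendly', 'group_b': 'suspicious'}
print(flagged)  # [('group_a', 'group_b')]
```

Any flagged pair is a signal for a human to look closer, not proof of bias on its own.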

Another issue is that the current trend is to ask companies like Expedia, Coca-Cola, and Microsoft to correct the problem and regulate themselves. Historically, self-regulation has not been very effective, because profitability often takes precedence over preventing harm. Correcting the problem may not be cost-effective, or it may be something companies would rather have someone else pay for.

Obviously, this is a complex problem with no quick answers. For now, the problem is hidden from us unless we look at it from a different angle, as we can with images. When we do, the problem is quite obviously still there. We can’t assume that text data is not racist when we know that racism was present in much of our history.

So, no, ChatGPT is not racist, but the underlying data undoubtedly is. For now, ChatGPT is hiding the truth from its users. It’s a band-aid, but it’s what we have. It is also not clear whether the real problem is being addressed. After all, once the problem is out of sight, there is no longer any urgency to fix it.

If you liked this story and would like to know when I publish more, then consider clicking on the little envelope icon under my picture.

ChatGPT
Artificial Intelligence
Writing
Education
History