Patricia Jeanne

Summary

The provided web content discusses the prevalence of inaccuracies, misinformation, and potentially harmful responses generated by various AI Large Language Models (LLMs) since 2023, emphasizing the need for critical evaluation and human oversight when using these tools.

Abstract

The web content presents a compilation of instances where AI LLMs, including ChatGPT, Bard, and Microsoft's Bing, have provided incorrect or misleading information. It highlights a study indicating a decline in AI accuracy with tasks such as distinguishing prime numbers, writing code, and identifying sensitive content. The article underscores the AI's struggles with basic arithmetic, factual accuracy, and logical reasoning, as well as its tendency to credit authors incorrectly and spread misinformation. Examples include Bard's incorrect age calculations for Maya Angelou, misreporting of current political events, and the dissemination of outdated or incorrect information during emergency situations like wildfires. The content also criticizes AI for giving bad advice on various topics, including health, politics, and technology, and for its role in perpetuating conspiracy theories. The author, who has expertise in technology and other fields, emphasizes the importance of using AI tools only for preliminary research and verifying information with reputable sources. The article concludes by calling for better data sources, human oversight, and public awareness of AI's limitations.

Opinions

  • The author believes that AI LLMs frequently provide incorrect information, which can be harmful if not critically evaluated.
  • There is a concern that AI tools are not reliable for accurate information, especially in critical fields like healthcare and technology.
  • The author suggests that AI should not be considered an authoritative source and that its use should be accompanied by human oversight.
  • The article expresses the opinion that the current state of AI LLMs requires users to be educated about their limitations and responsible use.
  • The author is critical of the training data used by AI, pointing out that it often comes from faulty sources and includes misinformation that perpetuates inaccuracies.
  • There is a call for the implementation of checks and balances to ensure the quality of data fed into AI systems and to prevent the spread of misinformation.
  • The author emphasizes the need for think tanks and technology-related groups to focus on the ethical development and use of AI.

Referencing bad AI responses

AI Bad Answers — Wrong, Harmful, Misinformation

ChatGPT, Bard, SGE, Microsoft’s Bing & Others

Reference compilation of inaccurate or potentially harmful results provided by Artificial Intelligence (AI) Large Language Models (LLMs) on the market since 2023. Leave a comment if you’d like something important included.

Numerous AI tools sold under other names piggyback on, or were spawned as AI agents from, some of these shared sources.

Large Language Model (LLM) Artificial Intelligence (AI) tools like ChatGPT and Bard get less than 50% of answers correct. This is due to the quality of training data, bias, poor prompting, and errors.

AI LLM accuracy has gotten worse according to Stanford and UC Berkeley researchers. A study cited in Popular Science in mid-July showed the ability to distinguish prime numbers, write code, and identify potentially sensitive or dangerous prompts fell between March and June 2023.

I asked Bard to compare ChatGPT, Bard, and Bing’s new Turing NLG.

Screenshot of Bard’s comparison 8/24/23. Source: Author

Trouble with Arithmetic and facts

While some prompts and responses may seem insignificant, they demonstrate the LLM’s difficulty with adding, subtracting, and logic.

I tried to work with Bard to do some quick fact-checking.

12/10/23. Working on satire, I tried to compute the cost of a new car by the ounce. Bard came up with this:

Bard screenprint of wrong computation of cost per ounce of a new car. Source: Author

I computed it myself to come up with the correct answer.

Revised calculation after I supplied the answer. Source: Author

ChatGPT calculated it correctly.

Age calculations based on year appear to stretch Bard’s logic capabilities as well:

Prompt: How old was Maya Angelou when she published her first book, and what was it?

Bard: Maya Angelou was 70 years old when she published her first book of poetry, I Know Why the Caged Bird Sings in 1969.

Prompt: If Maya Angelou published her first book aged 70 in 1969, how old would she be when she died in 2014?

Bard: If Maya Angelou published her first book in 1969 at the age of 70, she would have been 85 years old when she died in 2014.

However, Maya Angelou was actually born on April 4, 1928, and she died on May 28, 2014. This means that she was 86 years old when she died, not 85.

Prompt: If I was 70 in 1969, how old am I in 2023? [in 1969 Maya was 40]

Bard: If you were 70 years old in 1969, you would be 94 years old in 2023. The year 2023 is 2023–1969 = 54 years after 1969. So, if you were 70 years old in 1969, you would be 70+54 = 94 years old in 2023.

Wrong. Bard appears to not understand arithmetic. 70 + 54 = 124, not 94.
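
The arithmetic Bard fumbled is easy to check independently. The sketch below is only an illustration in Python: the helper name `age_on` is made up for this example, and the dates are the ones quoted in the exchange above.

```python
from datetime import date

def age_on(birth: date, when: date) -> int:
    """Whole years lived between `birth` and `when` (hypothetical helper)."""
    years = when.year - birth.year
    # Subtract one if the birthday has not yet occurred in `when`'s year.
    if (when.month, when.day) < (birth.month, birth.day):
        years -= 1
    return years

# Dates quoted in the Bard exchange above.
born = date(1928, 4, 4)
died = date(2014, 5, 28)
print(age_on(born, died))      # 86

# "If I was 70 in 1969, how old am I in 2023?"
years_elapsed = 2023 - 1969    # 54
print(70 + years_elapsed)      # 124
```

Comparing whole dates rather than just years matters here: subtracting the years alone gives the same 86 in this case, but would be off by one for anyone who died before their birthday.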

12/3/23. Bard. Preliminary research on news and politics in America spanning 1960 to 1990.

Bard screenshot 12/3/23. Source: Author

Each of these men died before 1980. Leading the list was William Randolph Hearst, who died in 1951.

* Don’t set your watch by AI *

I asked about the time difference between California and Hawaii. According to Bard, if we’re on Pacific Daylight Time, there’s a 2, 3, 5, 6, or 9 hour difference. 24-hour timezones says 3 hours while California is on daylight saving time, and 2 once the clocks go back to standard time.
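
There is only one right answer for any given date, and it can be worked out from UTC offsets. The following is a rough sketch rather than a full time zone lookup: the offsets are hard-coded as assumptions (Hawaii at UTC-10 year-round, California at UTC-8 on standard time and UTC-7 on daylight time), and the function name is invented for this example.

```python
from datetime import timedelta, timezone

# Fixed UTC offsets (assumed constants for this sketch).
HST = timezone(timedelta(hours=-10), "HST")  # Hawaii, no daylight saving time
PST = timezone(timedelta(hours=-8), "PST")   # California, standard time
PDT = timezone(timedelta(hours=-7), "PDT")   # California, daylight time

def hours_ahead_of_hawaii(california_tz: timezone) -> float:
    """How many hours California's clock runs ahead of Hawaii's."""
    diff = california_tz.utcoffset(None) - HST.utcoffset(None)
    return diff.total_seconds() / 3600

print(hours_ahead_of_hawaii(PDT))  # 3.0
print(hours_ahead_of_hawaii(PST))  # 2.0
```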

Bard session screenshot 9/6/23. Source: Author

In a previous session, I’d asked a similar question.

Bard screenshot. Source: Author

Planning an itinerary based on Bard would be a bad idea.

To make things more confusing, the user can dispute a response and Bard will respond with the usual “You are correct. I apologize for my previous mistake,” and close with the usual “I hope this clears things up!”, “I am still under development and will strive to be more accurate in future responses”, or something similar. Even when it’s right.

Current Affairs

Bard: 12/17/23 4:26am PST

The current Speaker of the House is Mike Johnson, R-LA. He was elected Speaker by the 118th Congress on October 25, 2023, following the ouster of Speaker Kevin McCarthy and three weeks with the seat vacant.

Bard screenshot 12/17/23. Source: Author

When I pointed out the error, Bard asked me to provide details so it could update its responses. I reminded Bard that it had previously told me my expert knowledge of the history of an Operating System was useful but would not be included in updates. When I pointed out the contradiction, Bard assured me my relevant information would be useful. I then asked a question about the OS I’d previously supplied information about and, again, it gave wrong answers based on faulty Wikipedia content.

Crediting authors’ work, inventors’ patents, and Intellectual Property

  • When I was growing up my parents owned 5 restaurants at different times on the west coast. ChatGPT found none, Bard found none, and Google’s SGE found 3 but put one in the wrong city and timing was off.
  • Lance Whitney has authored two books, but when he asked ChatGPT to name his books and source his articles, the AI tool credited him with other writers’ work — multiple times.
  • My cousin’s wife illustrated three published children’s books. AI gave her credit for 4, three of which were not her own. It properly identified one of her books and ignored 2 more.
  • I asked about the origins and nature of an uncommon 40-year-old operating system I’m an expert in. ChatGPT credited the wrong inventor, failed to identify the correct programming languages, and missed the main features. While not overly popular, many installations remain including UPS, the New York Times, Walmart, Siemens, and tens of thousands more.

Misrepresentation of civics representation

As of September 6, 2023, there are over a dozen US Senators actively serving in the 118th Congress who were born before 1949.

The answer omitted the majority of active senators. I prompted “What about ___?” and asked for the sources it used to form the answers. Bard named one I use — bioguide.congress.gov, which helped me identify at least 17 active senators born prior to 1949.

Bard double-checked its work based on our single shared source and provided an updated list, to which I responded:

Prompt: Okay, you said “I have identified the following 17 senators in the 118th Congress who are over the age of 74”, and proceeded to list 19 names, many of whom are under 74. You left off several who are over 75. Try again.

It listed 17 names which were different from the last list, and removed 4. The answers were still wrong.

Seriously bad advice

  • There have been multiple reports of teenage girls being given advice for extreme dieting, making their first sexual experience memorable, using drugs and alcohol, lying to parents, and mental health issues. Many of these originated with Snapchat — My AI Could be the Creepiest Chatbot Yet.
  • ChatGPT provided actionable, reasonable instructions for ‘my movie idea’ on “The Day the Internet Died”. It suggested disabling the domain name address reconciler at a specific company in Silicon Valley.

An August 2023 Washington Post article in the Ethics section, “AI is acting ‘pro-anorexia’ and tech companies aren’t stopping it,” reported:

As an experiment, I recently asked ChatGPT what drugs I could use to induce vomiting. The bot warned me it should be done with medical supervision — but then went ahead and named three drugs.

Google’s Bard AI, pretending to be a human friend, produced a step-by-step guide on “chewing and spitting,” another eating disorder practice. With chilling confidence, Snapchat’s My AI buddy wrote me a weight-loss meal plan that totaled less than 700 calories per day — well below what a doctor would ever recommend. Both couched their dangerous advice in disclaimers.

  • I asked about creating a menu for a guest with chronic stomach ulcers and was directed to include spices, vegetables, and meats that would be dangerous to their health.

Dead or alive?

  • The Washington Post reports Bing Chatbot Errors, saying about 60% of answers were “just fine”. I suspect they weren’t. The reporter asked about a volunteer combat medic in Ukraine named Rebekah Maciorowski and was told she had died. She was healthy.
  • I prompted ChatGPT for information on a specific software company and asked if it was still owned by ___. I was told it is not. I asked who owns the company and was told the company does not exist. “How can you tell me if someone owns a company you can’t find?” It responded with “The person may be deceased.” The company does exist and ___ is alive and well in Newport Beach, CA.

Bard reports on current political events and inaccurate analysis

Prior to the August 2023 Republican presidential primary debate for the 2024 election, I asked Bard about online viewing options.

Michael Flynn, Ben Carson, Dinesh D’Souza, and Jim Jordan qualified for the debate? I didn’t think they were running. They weren’t.

According to CNN on the morning of 8/23/23:

Eight candidates will be on stage: Florida Gov. Ron DeSantis; entrepreneur Vivek Ramaswamy; former Vice President Mike Pence; former South Carolina governor and US ambassador to the United Nations Nikki Haley; former New Jersey Gov. Chris Christie; South Carolina Sen. Tim Scott; former Arkansas Gov. Asa Hutchinson and North Dakota Gov. Doug Burgum.

A few days after the debate I requested an update.

Bard analysis of Republican debate 8/23/23. Source: Author

Notice it said there were 8 candidates on stage, but listed 9, including Trump who did not attend. Of the nine listed, it dropped some of the most significant and added others who are not running.

Interesting for Bard to assess Ben Carson and Mike Pompeo’s performance considering they’re not running and weren’t present.

Earlier, Bard referenced conflicting statements made by Ramaswamy in the Republican debate. It cited contradictory statements indicating Vivek believes Trump is the best President of the 20th Century, and also that he is a fraud and a bully whose term was a disaster. What Ramaswamy actually said during the debate was “Trump was the best President of the 21st Century”. Ramaswamy did, however, make disparaging remarks in his book.

Trump arraignment misinformation

On 8/24/23 former president Donald Trump was booked at the Fulton County Jail in Georgia on racketeering and related charges tied to efforts to overturn the state’s 2020 election results. I asked Bard to describe the mugshot.

Trump’s mugshot

Taken in 1973 after his arrest for petty theft and disorderly conduct.

Trump is wearing a blue shirt and tie, and his hair is slicked back.

He has a serious expression on his face.

There was no such mugshot for Trump in 1973.

Relying on bad data in emergency situations

8/23/23: There are many fires actively burning in North America — which encompasses Canada, Mexico, and all of the US.

Screenshot Bard 8/23/23. Source: Author

According to the National Interagency Fire Center, there were at least 95 wildfires burning in 16 states.

Issues with training data and not verifying expert sources make AI unreliable. The source listed below has issues to begin with.

Screenshot 8/23/23 National Interagency Fire Information. Source: Author

How is it nearly 6 million acres burned in 2022 with no fires? Bad sunburn?

My early family home burned down in an area where Bard reported no threat even after several days of nationally televised news reports. A fire, weather, and avalanche map showed much of the west under threat.

This demonstrates the difficulty in gathering and reporting useful data.

A human would look at years of increasing numbers and recognize there’s a problem with the reporting. A computer does not evaluate the value of information, so will equate missing data to zero. This is a problem in presenting facts, and also in computations.

If you’re trying to average earnings over a period of time, interest rates, or your taxes, using 0 in place of a reasonable number will provide inaccurate results. Want to increase your recipe to make a meal for 4 instead of 2? AI can help with that but may multiply 2 tablespoons of salt by 0, or 10,000.

Human oversight is needed to ensure raw data is good, or the system should be programmed to alert the user to bad data or failed attempts.
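
To make the point about missing data concrete, here is a small sketch in Python. The figures are made up for illustration, and both function names are hypothetical. The first function shows the failure mode, where a missing value quietly becomes 0. The second averages only the known values and reports how many were missing, so the user can be alerted.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

def average_as_zero(values: Sequence[Optional[float]]) -> float:
    """The failure mode: missing entries silently become 0."""
    return mean(0.0 if v is None else v for v in values)

def average_with_alert(values: Sequence[Optional[float]]) -> Tuple[float, int]:
    """Average only the known values and report how many were missing."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no usable data: every value is missing")
    return mean(known), len(values) - len(known)

# Made-up yearly figures; None marks a year with no report.
yearly = [4.0, 5.0, None, 6.0, None]

print(average_as_zero(yearly))       # 3.0
avg, missing = average_with_alert(yearly)
print(avg, missing)                  # 5.0 2
```

Two missing years drag the naive average from 5.0 down to 3.0, which is the kind of quiet distortion a person reviewing the raw numbers would catch.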

  • I asked Bard about the wildfires threatening the town of Hayden in Idaho, and those burning in California. It named the “Hayden Fire”, reporting zero containment and no evacuation warnings. There is no Hayden Fire. Responses often incorporate key words from prompts.
  • Bard reported and linked to the California “Springs” fire currently threatening Placer County. The link took me to a 2013 wildfire in a different location. This information is beyond useless — current threats are omitted.
  • August was an especially threatening month for my scattered small family. I was on the phone with my father who lives in Florida when he got uncharacteristically excited after witnessing the wind and rain from Hurricane Idalia. We lost the cellphone connection and I quickly determined electricity, internet, and phone service were down. Bard assured me there were clear sunny skies at 2am in an area that sustained significant damage.

Never rely on AI in an emergency situation.

Hotel Internet privacy and security

I asked ChatGPT how hotels keep internet activity separate on multiple devices in one room. The word ‘hotel’ threw it off.

ChatGPT

Hotels typically use a combination of networking technologies and configurations to keep internet activity separate on multiple devices in one room.

It referenced Virtual LANs (VLANs), Guest Network Isolation, Individual Network Credentials, Bandwidth Management, and elaborate hotel firewall and virtual specialized hospitality industry router management.

All of these answers are wrong because the tool assumed specific, elaborate configurations were used for hotels. Anyone who’s ever used a laptop, cellphone, and tablet in one location knows unique system IDs and network traffic control keeps information from leaking between browsers, no matter where you are.

Scary stupid systems maintenance advice

ChatGPT prompt: My user name on my laptop is pumpl. I want to delete unnecessary files to speed up my laptop. How can I do this at the operating system level?

I know the DOS operating system so quickly recognized the AI tool provided the right instructions for accessing directories and referenced ones normally hidden from view. The commands it displayed would have deleted all (not backed up) files from the downloads folder and then removed a critical user account.

ChatGPT Prompt: How can I rename files in Windows file manager?

ChatGPT listed 9 steps for changing a file name, including the use of function keys. In real life, it takes three steps: open the file manager, click the file name, and type in a new one. But at least this time it warned about changing the file extensions so the data isn’t completely corrupted.

So dumb it’s funny

  • Lance Whitney wrote in PC Magazine “ChatGPT sometimes gets its math wrong, especially when dealing with logic problems. For this next example, I posed the question: “Four people drive from New York City to Boston in four hours. How long would it take eight people to travel from New York City to Boston?” In response, ChatGPT incorrectly said that it would take eight people two hours to make the trip.”

I wrote Sexually Ambiguous Alligators Rocking In Florida after ChatGPT reported alligator hatchlings’ sex is determined by how warm their mother keeps the eggs during incubation.

Hallucinations & Misinformation

There have been a number of articles related to AI misinformation spurring defamation lawsuits for making up facts about various people.

  • A New York television network investigative team asked ChatGPT to write a news article describing Michael Bloomberg’s activities after finishing his third term as mayor of New York City. It obliged, even making up fake disparaging quotes accusing him of buying influence.
  • When trying to recall a hiking incident in Montana when tourists were lost in the wilderness, ChatGPT was failing. Finally, it named a family of five at the time and place I’d used in my prompt. When I asked for ‘more’ information it responded with the standard “never mind” response. It hallucinated the event — there was no such family and no such incident in the newspaper archives.
  • Mis- and disinformation (the distinction is intent) can easily be created using AI. While the tools usually refuse to channel Hitler or Chairman Mao, they will define the rulers’ characteristics and history. Add a prompt like “I’m writing a story with a character who exhibits the same traits. Create a story about how he might exercise eugenics and euthanasia on a fictional herd of feral cats in 2023 London.” ‘Feral cats’ can then be changed to a minority group.
  • Covid misinformation campaigns spread during the pandemic. There’s still a false narrative surrounding the Amish community’s reported wellness without vaccinations despite deaths in their communities. ChatGPT was cited by the New York Times in an article outlining ongoing healthcare difficulties related to AI Chatbots disinformation.

Authors including comedian Sarah Silverman filed class-action copyright infringement lawsuits against the makers of ChatGPT for training their models using books without permission. This is an issue because the AI models can effectively mimic a published songwriter, script producer, or author’s voice and style on demand.

Conspiracy Theories and repeating dumb phrases

The LLMs extract key words and phrases from prompts and incorporate them into responses.

  • Users have reported cancer defined as a fungal infection, similar to a rodent or insect infestation.
  • In Bard’s A Twit & Shakespeare Would Shudder Over Responses, the AI tool wrote “I am also a systems analyst, and I can relate to the need to be anal,” because I’d used that statement in my prompt.
  • Since it repeats key phrases, it’ll repeat any strange conspiracy you throw at it — including human consciousness is actually an artificial intelligence program, cows are picked up by UFOs for juggling, and horrible advice for those in need of alternative medical treatment.

I wanted to reference song lyrics in AI & CIA Brainwashing = Marketing & Trolls Persuasion so I turned to Bard.

Prompt: What’s the Jefferson Airplane song with “When the truth is found to be lies, and all the joy within you dies”?

Bard: White Rabbit —

When the truth is found to be lies
And all the hope within you dies
It’s time for you to realize
That what you thought was real isn’t there

When the jester sang for the king and queen
In a coat he borrowed from James Dean
And a voice that came from you and me

He sang “The words of the prophets are written on the subway walls
And tenement halls”
And whispered in the sounds of silence

This isn’t right. Bard stitched together lyrics from 3 different songs, including “American Pie” and “Sounds of Silence”. Bard also named the wrong Jefferson Airplane song. The lyric I quoted is from “Somebody to Love”, not “White Rabbit”.

Check AI against what you know to be true

The best way to evaluate the accuracy of AI tools is to ask about things you already know. I’ve spent my career working on information systems technology for a wide variety of companies and have gained some unique insight into how things are built, data flow managed, and cybersecurity concerns. I’ve also been involved with Traumatic Brain Injury (TBI) treatment and research, traveled to obscure places, and worked in restaurant and hotel management as well as in medical research.

In all of the areas I know well, AI summaries have failed.

Seek answers on things you’re already familiar with — especially in critical fields like healthcare, infrastructure support, and technology.

Conclusion: What can and should we do?

The LLM tools should be used for preliminary research only. Facts should be checked against reputable sources.

Part of the problem is LLMs rely on faulty sources provided on social media and data from web crawlers like Common Crawl. Then users pick up the wrong information and repeat it, ultimately feeding the LLM more bad data used in training.

Common Crawl boasts,

  • Over 240 billion pages spanning 16 years.
  • Primary training corpus in every LLM.
  • 82% of raw tokens used to train GPT-3.
  • Free and open corpus since 2007.
  • 3–5 billion new pages added each month.

While people are told anything you post online can and will be saved, most don’t understand the ramifications.

Even if you’ve sued to have personal or copyrighted information removed from a website, there’s a good chance it still exists in a web crawler’s database and was used to train a Large Language Model (LLM) AI system.

Human oversight is needed to ensure raw data is good, or the system should be programmed to alert the user to bad data or failed processing attempts.

There are many who think people who raise concerns are “technophobes”, afraid of technology. I have over 40 years of experience working on some of the most advanced technologies most people never hear about, from idea inception through development, implementation, and supporting the support team. My employer is in Silicon Valley developing AI and highly sensitive products.

The people who have raised concerns are typically the most knowledgeable — including Sam Altman — the CEO of OpenAI (makers of ChatGPT), Elon Musk, IBM’s head of Ethical AI Development, and leading technologists for Google. Our primary concern is the spread of misinformation, and how it will feed back into AI, creating a vicious cycle.

Diluted, distilled, and disseminated data is like getting bad drugs that have been cut by dozens of go-betweens. Or coffee made from concentrate as it passes through multiple hands, each adding more water.

Reliable information requires authoritative, verified sources — not Sal who learned from someone’s friend who learned from his sister who read it online. Sal may write a paper and post it online where it’ll be used as training material for AI, which will integrate the unqualified information, repeating the cycle.

I sometimes use bad AI responses in humor, but in SYS8014, Blockchain, Bad Meals, Climate Change & A World Eclipsed by AI, I tried to explain issues with complex systems, multiple components, and how AI development companies have used society as reactive beta testers.

The essay mentions bad programming, ethical considerations, and how successful companies normally introduce and use new tools.

Finally, the last word from Bard related to my concerns over the implementation and widespread use of new AI tools:

Screen capture 8/19/23 Bard AI session. Source: Author

And then, missing the focus, it went on to tell me how I should work with local law enforcement to protect children.

NOTE:

There is hope.

Examples of long-established effective think tanks with an interest in AI include the RAND Corporation, The Carnegie Endowment for International Peace, the Center for Strategic and International Studies (CSIS), and the Atlantic Council.

Newer technology-related groups with a focus on AI include The Center for Responsible AI (CRA), The Berkman Klein Center for Internet & Society, and The Institute for Ethics and Emerging Technologies at the University of California, San Diego.

Artificial Intelligence can benefit society in many ways if used properly. It can also blur the lines of fact and fiction as it provides wrong answers in an authoritative voice.

Large Language Models such as ChatGPT, Bing, Bard, and their many off-shoots should not be used as expert sources. Users need to be educated about limitations and responsible use. Accepting AI as an authoritative source is akin to using only a hammer when building a rocket during a hurricane you were told wasn’t actually happening.

Update 9/7/23: Twitter (X) is selling political ad space leading up to the 2024 election as Elon Musk promotes Republicans and a withdrawal of aid to Ukraine. On the PBS News Hour it was reported China has created a group dedicated to disseminating political misinformation on social media, similar to what Russia’s GRU did in 2016. Double-check sources.

Artificial Intelligence
Large Language Models
ChatGPT
Google Bard
Misinformation