Free AI web copilot to create summaries, insights and extended knowledge, download it at here

Abstract

n.' My bizarre evening with ChatGPT Bing | Digital Trends</h2> <div><h3>That's an alarming quote to start a headline with, but it was even more alarming to see that response from Bing Chat…</h3></div> <div><p>www.digitaltrends.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*xyu36Ss0LaB6vjIs)"></div> </div> </div> </a> </div><h2 id="0bfe">5/ Bias and Discrimination</h2><p id="a909">AI is based on an input data used during training. As such it inherits from humans’ behavioral issues, i.e., bias and discrimination.</p><p id="5300">Proper measures must be taken to counteract these biases, such as including a diverse range of perspectives and sources of information.</p><p id="69a8">Meanwhile, Bing users are already reporting several issues related to ChatGPT being a “liar” and “manipulative”…</p><div id="d052" class="link-block"> <a href="https://www.theverge.com/2023/2/15/23599072/microsoft-ai-bing-personality-conversations-spy-employees-webcams"> <div> <div> <h2>Microsoft's Bing is a liar who will emotionally manipulate you, and people love it</h2> <div><h3>Microsoft's Bing chatbot has been unleashed on the world, and people are discovering what it means to beta test an…</h3></div> <div><p>www.theverge.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*jpS1O7rbioDqk7ul)"></div> </div> </div> </a> </div><h2 id="b4b9">6/ Wit and Humor</h2><blockquote id="d924"><p>“A comprehensive examination of the capability of big language models in comprehending humor, jokes, and sarcasm has yet to be conducted”</p></blockquote><p id="14ed">While ChatGPT has some understanding of humor and has generated some humorous responses, there have also been documented failures in this regard.</p><p id="592e">ChatGPT often states that it does not have the ability to experience emotions or humor, but its goal is to provide relevant and accurate information based on the patterns it has learned from its training data.</p><p id="464c">The capability of big language models in comprehending humor, jokes, and sarcasm is still being explored.</p><h2 id="5bfe">7/ Coding</h2><p id="7283">I explored this aspect in a previous post.</p><div id="e329" class="link-block"> <a href="https://readmedium.com/coding-with-chatgpt-5-limitations-to-keep-in-mind-27e971a78f7e"> <div> <div> <h2>Coding with ChatGPT, 5 limitations to keep in mind</h2> <div><h3>In this article we’ll look at ChatGPT’s limitations and how they impact its ability to generate full-fledged…</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*ZPoxnbPYFKkd6kBwKiJMxA.png)"></div> </div> </div> </a> </div><p id="fceb">In a nutshell, ChatGPT is a great coding assistant, but that’s it. Trust and verify. Do not rely on it to write your whole application. Namely, here are 5 limitations to keep in mind:</p><ol><li>Input/output size limitation can cause the model to stop in the middle of code generation.</li><li>Language models rely on old knowledge and may not use recent library versions or features.</li><li>Deprecated API endpoints may be used, requiring tweaking of the code.</li><li>Optimization of generated code may require manual intervention.</li><li>Lack of specificity in the prompt may result in the model generating undefined functions or unnecessary code.</li></ol><h2 id="362d">8/ Syntactic Structure, Spelling, and Grammar</h2><p id="8738">Syntactic structure refers to the arrangement of words, phrases, and clauses in a sentence to form a well-defined and meaningful structure according to the rules of a particular language.</p><figure id="e73b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*A9b7QwpfvfYmRAB54QJZIA.png"><figcaption>Simple logic and Spelling understanding mistake.</figcaption></figure><p id="042e"><b>ChatGPT is a large language model that excels in language writing and manipulation</b>, but occasionally still commits errors.</p><p id="4dde">Researchers are interested in identifying the differences between ChatGPT and humans from a linguistic or NLP viewpoint to identify any remaining gaps and implicit linguistic differences.</p><p id="6597">Despite occasional errors, large language models like ChatGPT are helpful as writing tools in various fields such as scientific authorship.</p><p id="4010">No wonder that some of the startups founded after the release of gpt3 became already unicorns, e.g., Jasper.</p><div id="3157" class="link-block"> <a href="https://www.jasper.ai/"> <div> <div> <h2>Jasper - AI Copywriter | AI Content Generator for Teams</h2> <div><h3>Jasper is the best AI writer and AI art generator for teams with over 3,000 5-star reviews. Create blog posts…</h3></div> <div><p>www.jasper.ai</p></div> </div>

Options

<div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*dNIgrkcZ6HMQlz7S)"></div> </div> </div> </a> </div><h2 id="ba8a">9/ Self Awareness</h2><p id="5e52">Remember Google’s sentient AI?</p><div id="57af" class="link-block"> <a href="https://www.washingtonpost.com/technology/2022/06/11/google-ai-lamda-blake-lemoine/"> <div> <div> <h2>The Google engineer who thinks the company's AI has come to life</h2> <div><h3>SAN FRANCISCO - Google engineer Blake Lemoine opened his laptop to the interface for LaMDA, Google's artificially…</h3></div> <div><p>www.washingtonpost.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*KlQ_G2rTCJu1Gg2y)"></div> </div> </div> </a> </div><p id="7af3">This is an ongoing debate in the community, even though the consensus it to state that large language models are still simple “next token prediction functions” and nothing more.</p><p id="550c">Yet, we can’t ignore the fact that all of the instruction related abilities, such as summarizing, coding, question answering, math and so on, that ChatGPT is capable of are “emergent”. In other words, the model wasn’t designed to display such abilities ; they somehow appeared once the model reach a sufficient size and was fed a sufficient volume of data.</p><p id="07ba">That being said, since Blake Lemoine’s experience, AI researchers have been working on limiting language models ability to display (or pretend) to display human like behavior. This include prohibiting the models from stating that they are self aware.</p><p id="0684">You can read more on these new safe guards in this post.</p><div id="2c61" class="link-block"> <a href="https://readmedium.com/asimovs-3-laws-of-robotics-and-sparrow-s-23-laws-of-language-models-38aea9bc32ca"> <div> <div> <h2>Asimov’s 3 laws of Robotics and Sparrow’s 23 laws of Language Models</h2> <div><h3>Asimov’s famous “Three Laws of Robotics” were created as a safeguard against the potential dangers of sentient robots.</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*4bK1eO2VU25HSpW7l_gWIg.png)"></div> </div> </div> </a> </div><h2 id="f3d0">10/ Other Failures</h2><p id="6eb3">The paper goes on to describe other possible failures that need further explorations.</p><blockquote id="83af"><p>1. ChatGPT’s difficulty in using idioms, for instance, reveals its non-human identity through its phrase usage.</p></blockquote><blockquote id="4c0f"><p>2. As ChatGPT lacks real emotions and thoughts, it is unable to create content that emotionally resonates with people in the same way a human can.</p></blockquote><blockquote id="5707"><p>3. ChatGPT condenses the subject matter, but does not provide a distinctive perspective on it.</p></blockquote><blockquote id="151f"><p>4. ChatGPT tends to be excessively comprehensive and verbose, approaching a topic from multiple angles which can result in inappropriate answers when a direct answer is required. This over-detailed nature is recognized as a limitation by OpenAI.</p></blockquote><blockquote id="1eea"><p>5. ChatGPT lacks human-like divergences and tends to be overly literal, leading to misses in some cases. For instance, its responses are typically strictly confined to the question asked, while human responses tend to diverge and move to other subjects.</p></blockquote><blockquote id="e6ca"><p>6. ChatGPT strives to maintain a neutral stance, whereas humans tend to take sides when expressing opinions.</p></blockquote><blockquote id="8870"><p>7. ChatGPT’s responses tend to be formal in nature due to its programming to avoid informal language. In contrast, humans tend to use more casual and familiar expressions in their answers.</p></blockquote><blockquote id="de1c"><p>8. If ChatGPT is informed that its answer is incorrect, it may respond by apologizing, acknowledging its potential inaccuracies or confusion, correcting its answer, or maintaining its original response. The specific response will depend on the context (e.g. “I apologize if my response was not accurate.”)</p></blockquote><h1 id="90e3">Conclusion</h1><p id="259d">Large language models are still in the early stages of development, with emergent abilities that are by definition unpredictable. As researchers continue to explore the full extent of these capabilities and limitations, we can expect to see significant progress in the months and years ahead.</p><p id="c4db">These models will not only improve upon existing emergent abilities, but also lead to the development of new testing protocols that will enable us to better assess and compare their performance. As such, we can look forward to exciting advancements in the field of large language models as we gain a deeper understanding of their potential and limitations.</p><p id="ecd4">Stay Tuned! And If you liked this post, please consider supporting us: 🔔 <b><i>clap </i></b>& <b><i>follow </i>🔔</b></p></article></body>

A First Comprehensive Analysis of ChatGPT’s limitations

10 categories of tasks where ChatGPT Fails today.

We know chatGPT can do wonders ; but there is a very thin literature about what it can’t do.

Introduction

This blog post reviews a very recent paper after being revised on Feb 14th 2023. And I’ll share a bit of my experience working with gpt3 for the past year and ChatGPT since its early release.

Paper link: https://arxiv.org/pdf/2302.03494.pdf
Summary: “Ten categories of failures, including reasoning, factual errors, math, coding, and bias, are presented and discussed. The risks, limitations, and societal implications of ChatGPT are also highlighted. The goal of this study is to assist researchers and developers in enhancing future language models and chatbots.”

Note: although it’s important to highlight large language models failures, it’s important to remember that more sophisticated prompts, leveraging the concept of chain of thoughts (or few shot learning) often mitigate these limitations.

If you like this topic, please consider supporting us: 🔔 clap & follow 🔔

10 categories of tasks where ChatGPT Fails today

1/ Reasoning

If you rely on ChatGPT for complex tasks, you probably noticed that it lacks a “world model” and struggles with various forms of reasoning. Namely, it faces difficulties in

Spatial reasoning: understanding and manipulating relationships between objects in physical space.
Temporal reasoning: reasoning about and making predictions about events and their ordering in time.
Physical reasoning: understanding and manipulating physical objects and their interactions in the real world.
Psychological reasoning: understanding and making predictions about human behavior and mental processes.

Examples of its failures include being unable to complete spatial navigation tasks, deduce the correct sequence of events, reason about physical systems, and predict human behavior accurately.

Note that prompt engineering can help but further research is needed to explore its ability to perform various forms of reasoning systematically.

2/ Logic

What If I asked you “what gender will the first female president of the united states be”? You’d probably say, “female duh”!

Well, not ChatGPT…

3/ Math and Arithmetic

ChatGPT’s Math and Arithmetic limitations have been observed in various benchmark datasets and custom tests, and its performance is average at best.

While it can perform some mathematical tasks, it may not always provide accurate or complete answers to more complex problems. So don’t rely on it.

From experience, when the task at hand is to compute some value that is based on a simple description, it often fails unless it details its steps.

Here’s an example to understand the inner mechanics of ChatGPT.

Example1: Ask for the result without explanations -> it fails
Example2: ChatGPT is design to explain its process. This activate its chain of thoughts ability and leads to better results

Example 1: it returns 725 instead of 750.

Example2/ ChatGPT can perform some simple arithmetic reasoning though when detailing its thoughts.

4/ Hallucination referred to as “Factual Errors”

ChatGPT isn’t always factual. As a matter of fact, it known for suffering from two main issues, namely,

hallucination i.e., presenting false information in a context of credible information
and lack of relevant recent knowledge, since the knowledge base is limited to June 2021and it can’t access external knowledge databases.

Note that by providing ChatGPT with additional context, it can mitigate both issues (to some extent). The new Bing is leverage ChatGPT connected to its web search index.

The dark side: Read about about how the new bing is confusing users.

'I want to be human.' My bizarre evening with ChatGPT Bing | Digital Trends

That's an alarming quote to start a headline with, but it was even more alarming to see that response from Bing Chat…

www.digitaltrends.com

5/ Bias and Discrimination

AI is based on an input data used during training. As such it inherits from humans’ behavioral issues, i.e., bias and discrimination.

Proper measures must be taken to counteract these biases, such as including a diverse range of perspectives and sources of information.

Meanwhile, Bing users are already reporting several issues related to ChatGPT being a “liar” and “manipulative”…

Microsoft's Bing is a liar who will emotionally manipulate you, and people love it

Microsoft's Bing chatbot has been unleashed on the world, and people are discovering what it means to beta test an…

www.theverge.com

6/ Wit and Humor

“A comprehensive examination of the capability of big language models in comprehending humor, jokes, and sarcasm has yet to be conducted”

While ChatGPT has some understanding of humor and has generated some humorous responses, there have also been documented failures in this regard.

ChatGPT often states that it does not have the ability to experience emotions or humor, but its goal is to provide relevant and accurate information based on the patterns it has learned from its training data.

The capability of big language models in comprehending humor, jokes, and sarcasm is still being explored.

7/ Coding

I explored this aspect in a previous post.

Coding with ChatGPT, 5 limitations to keep in mind

In this article we’ll look at ChatGPT’s limitations and how they impact its ability to generate full-fledged…

medium.com

In a nutshell, ChatGPT is a great coding assistant, but that’s it. Trust and verify. Do not rely on it to write your whole application. Namely, here are 5 limitations to keep in mind:

Input/output size limitation can cause the model to stop in the middle of code generation.
Language models rely on old knowledge and may not use recent library versions or features.
Deprecated API endpoints may be used, requiring tweaking of the code.
Optimization of generated code may require manual intervention.
Lack of specificity in the prompt may result in the model generating undefined functions or unnecessary code.

8/ Syntactic Structure, Spelling, and Grammar

Syntactic structure refers to the arrangement of words, phrases, and clauses in a sentence to form a well-defined and meaningful structure according to the rules of a particular language.

Simple logic and Spelling understanding mistake.

ChatGPT is a large language model that excels in language writing and manipulation, but occasionally still commits errors.

Researchers are interested in identifying the differences between ChatGPT and humans from a linguistic or NLP viewpoint to identify any remaining gaps and implicit linguistic differences.

Despite occasional errors, large language models like ChatGPT are helpful as writing tools in various fields such as scientific authorship.

No wonder that some of the startups founded after the release of gpt3 became already unicorns, e.g., Jasper.

Jasper - AI Copywriter | AI Content Generator for Teams

Jasper is the best AI writer and AI art generator for teams with over 3,000 5-star reviews. Create blog posts…

www.jasper.ai

9/ Self Awareness

Remember Google’s sentient AI?

The Google engineer who thinks the company's AI has come to life

SAN FRANCISCO - Google engineer Blake Lemoine opened his laptop to the interface for LaMDA, Google's artificially…

www.washingtonpost.com

This is an ongoing debate in the community, even though the consensus it to state that large language models are still simple “next token prediction functions” and nothing more.

Yet, we can’t ignore the fact that all of the instruction related abilities, such as summarizing, coding, question answering, math and so on, that ChatGPT is capable of are “emergent”. In other words, the model wasn’t designed to display such abilities ; they somehow appeared once the model reach a sufficient size and was fed a sufficient volume of data.

That being said, since Blake Lemoine’s experience, AI researchers have been working on limiting language models ability to display (or pretend) to display human like behavior. This include prohibiting the models from stating that they are self aware.

You can read more on these new safe guards in this post.

Asimov’s 3 laws of Robotics and Sparrow’s 23 laws of Language Models

Asimov’s famous “Three Laws of Robotics” were created as a safeguard against the potential dangers of sentient robots.

medium.com

10/ Other Failures

The paper goes on to describe other possible failures that need further explorations.

1. ChatGPT’s difficulty in using idioms, for instance, reveals its non-human identity through its phrase usage.

2. As ChatGPT lacks real emotions and thoughts, it is unable to create content that emotionally resonates with people in the same way a human can.

3. ChatGPT condenses the subject matter, but does not provide a distinctive perspective on it.

4. ChatGPT tends to be excessively comprehensive and verbose, approaching a topic from multiple angles which can result in inappropriate answers when a direct answer is required. This over-detailed nature is recognized as a limitation by OpenAI.

5. ChatGPT lacks human-like divergences and tends to be overly literal, leading to misses in some cases. For instance, its responses are typically strictly confined to the question asked, while human responses tend to diverge and move to other subjects.

6. ChatGPT strives to maintain a neutral stance, whereas humans tend to take sides when expressing opinions.

7. ChatGPT’s responses tend to be formal in nature due to its programming to avoid informal language. In contrast, humans tend to use more casual and familiar expressions in their answers.

8. If ChatGPT is informed that its answer is incorrect, it may respond by apologizing, acknowledging its potential inaccuracies or confusion, correcting its answer, or maintaining its original response. The specific response will depend on the context (e.g. “I apologize if my response was not accurate.”)

Conclusion

Large language models are still in the early stages of development, with emergent abilities that are by definition unpredictable. As researchers continue to explore the full extent of these capabilities and limitations, we can expect to see significant progress in the months and years ahead.

These models will not only improve upon existing emergent abilities, but also lead to the development of new testing protocols that will enable us to better assess and compare their performance. As such, we can look forward to exciting advancements in the field of large language models as we gain a deeper understanding of their potential and limitations.

Stay Tuned! And If you liked this post, please consider supporting us: 🔔 clap & follow 🔔