Does ChatGPT Understand Or Not? I’ve Just Changed My Mind

What is understanding anyway?

No, this is not the announcement of a revelation I had with angels and skies opening with sunshine. Neither is it a confession in the style of Blake Lemoine when he proclaimed Google’s LaMDA was “sentient” (which resulted in Google firing him). To begin with, Lemoine is a self-described mystic priest (whatever that means; I’m not trying to be funny).

My previous notion about ChatGPT and similar conversational AI systems can be condensed as “text autocomplete on steroids.” The term implies that Generative AI chatbots are, at their core, predictors of the next word of a text, trained on zillions of human-produced texts so that, after this conditioning, they produce text that “sounds” like a human.

By the way, the best explanation I have seen of how autocompletion works in ChatGPT is the one in the short book by Stephen Wolfram, “What Is ChatGPT Doing … and Why Does It Work?” The book takes you step by step, gradually adding sophistication to the stochastic way text is generated, from random gibberish to plausible-sounding text.
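
To make the “autocomplete” idea concrete, here is a toy sketch of my own (not Wolfram’s code, and certainly not how ChatGPT is implemented). It assumes the simplest possible language model, a bigram model: count which word follows which in a tiny made-up corpus, then generate text by repeatedly sampling the next word in proportion to those counts. The function names and the corpus are invented for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, length=10, seed=None):
    """Sample each next word in proportion to how often it followed the current one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the current word was never followed by anything in training
        words, counts = zip(*candidates.items())
        out.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(out)


# Tiny made-up corpus; a real model is trained on vastly more text.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigrams(corpus)
print(generate(model, "the", length=8, seed=0))
```

Every word in the output is a statistically plausible successor of the previous one, and nothing more. Real chatbots replace the counting table with a large neural network and look at much more context than one word, but the generate-one-word-at-a-time loop is the same basic idea.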

Generative AI chatbots could “sound” like humans, but without the cognitive processes that happen inside our heads. Much talking, no understanding. That’s why they have also been called “stochastic parrots,” implying no comprehension whatsoever.

Gary Marcus, one of the leading proponents of this view, writes the following in a post published together with Sasha Luccioni:

“The AI systems that power these chatbots are simply systems (technically known as “language models” because they emulate (model) the statistical structure of language) that compute probabilities of word sequences, without any deep or human-like comprehension of what they say.”

We can see Marcus and Luccioni aim for “deep” or “human-like comprehension,” but I argue here that these requirements are not valid (what is “deep” comprehension, to begin with?).

In the post referenced above, Marcus and Luccioni ask ChatGPT itself whether or not it understands the questions it gets and the answers it gives, receiving two contradictory versions: in one of them, ChatGPT claims to have “some understanding of language” while in another, it humbly accepts its responses “are generated purely based on statistical patterns and associations learned from training data, without any subjective experiences or understanding of the world.”

Gary Marcus’ conclusion is:

“As the mutually contradictory examples above make clear, they don’t actually “know”; they are just generating different word strings on different occasions, with no guarantee of anything.”

The view of ChatGPT as a mechanical, non-understanding text generator couldn’t be stated more clearly.

For a second opinion, I put the same question to HuggingChat, a new LLaMA-based chatbot, and got the following answer:

“Yes, based on current research and understanding of ChatGPT and similar large language models, Gary Marcus’ claim is mostly correct. While these models are highly effective at generating fluent, contextually appropriate text, they lack true linguistic knowledge, awareness of semantics, and a deep understanding of the meaning of words and phrases. […]”

Then again, there is the nuance of “mostly correct.”

There you have it. No “deep or human-like comprehension” of what ChatGPT says. And I used to agree with this.

But I changed my mind. At least a bit. Let me explain.

The Epiphany

My not-so-mystical revelation happened when I saw this YouTube video featuring Sebastien Bubeck from Microsoft Research. I don’t know him personally, but my first impressions of him were not very good. As I wrote in a previous post, the “paper” he co-wrote reporting some experiments with GPT-4 “is not really a scientific paper because it wasn’t even reviewed.” Further, the language used in the paper (even the title, “Sparks of Artificial General Intelligence”) is just not acceptable in a scientific publication, both because it uses subjective terms like “sparks of…” and because it makes an unsupported claim about Artificial General Intelligence.

But then, in the video mentioned above, he made a very smart comment about comprehension in ChatGPT and GPT-4: he said that if the model didn’t understand the instructions the user gives in the prompt, it would be impossible for it to follow them.

Touché.

It’s almost impossible to dispute this logic: there is evidence that ChatGPT follows the instructions in the prompt most of the time (or refuses to follow them due to the “guardrails”). We have seen lots and lots of examples of ChatGPT following even outrageous instructions, like tossing merchandise on a store’s shelves or preparing poison from common ingredients.

But we have seen other examples showing a lack of deep understanding from ChatGPT: in my post about ChatGPT's sense of humor, I explored its capability (or lack thereof) to explain why a given joke is funny, which can be used to test its ability for commonsense reasoning. The results were mixed (sometimes it understood the joke, and other times it didn’t).

Bear with me here; we are almost done with the argument.

If ChatGPT can sometimes understand and sometimes can’t, then we can measure its level of understanding.

The keyword here is “measure.” ChatGPT is not simply a “stochastic parrot” uttering words as they come to its mouth (so to speak); there should be some measure of understanding that we can apply to it.

After digging into psychology papers, I came up with a distinction between “Behavioral Understanding” and “Experiential Understanding.”

Behavioral Understanding vs. Experiential Understanding

The distinction between “Behavioral Understanding” (BU) and “Experiential Understanding” (EU) is that BU refers to the actions of the subject, whereas EU is related to what happens inside the mind of the subject. Thus, BU is objective and can be measured in tests; EU is subjective and is measured in questionnaires, often assuming sincerity from the subject.

Your EU is tied to what you feel; feelings are something that we mammals have had since birth. It’s, well, an experience, and as such, it’s related to feelings.

The distinction between BU and EU is not mine; there has long been an underlying debate in Psychology between the behaviorist and cognitivist camps, the former focusing on observable behavior and the latter on the mind’s internal processes. While my wife was studying constructivism in education, I remember that behaviorism was seen as reductionist and mostly bogus. The polarization has even been territorial: behaviorism is popular in America, and cognitivism in Europe.

For the purposes of this post, I’m taking the “experiential” aspect of understanding (related to feelings) rather than the strictly cognitive aspect (related to thinking). The reason is that the processes inside a human brain aren’t identical, or even very similar, to the ones happening inside a deep neural network, which is called “neural” only by way of a loose conceptual analogy and doesn’t contain real neurons.

So EU is what we feel when understanding. You know, there is that sort of light bulb that suddenly lights up and gives us a convincing clue that we have, well, understood. We can’t expect EU to happen in a machine, because a machine doesn’t feel anything, even if it pretends to.

But beware: the feeling of understanding can be an illusion, even in humans. I’ve seen many of my students declare that a concept was crystal clear and then utterly fail to apply it to a specific problem. I tell my students not to rely too much on the feeling of understanding and to put it to the test in a specific situation.

Behavioral understanding, on the other hand, is not an illusion, at least when measured over a large number of cases. If you set up an experiment, for instance, to see whether ChatGPT can or cannot get what’s funny about given jokes, you start by collecting a number of them, then you ask ChatGPT to explain them, and then you quantify how many of them the machine got right (as judged by one or several humans). I have informally done it myself and reported it in one of my previous posts.
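
As a rough sketch of the bookkeeping such a joke experiment involves, assume each explanation is judged by several humans and counts as correct only when a strict majority accepts it. That majority rule is my assumption for illustration, not a standard, and the judgments below are made up; they are not data from my earlier post or from any real experiment.

```python
from typing import Dict, List


def majority_correct(votes: List[bool]) -> bool:
    """An explanation counts as correct when more than half the judges accept it."""
    return sum(votes) > len(votes) / 2


def fraction_understood(judgments: Dict[str, List[bool]]) -> float:
    """Fraction of jokes whose explanation a majority of judges accepted."""
    if not judgments:
        raise ValueError("no judgments collected")
    correct = sum(majority_correct(votes) for votes in judgments.values())
    return correct / len(judgments)


# Made-up judgments, one list of human verdicts per joke (illustration only).
judgments = {
    "joke_1": [True, True, False],
    "joke_2": [False, False, True],
    "joke_3": [True, True, True],
    "joke_4": [True, False],  # a tie does not count as correct
}
print(fraction_understood(judgments))
```

With these invented verdicts, two of the four jokes pass, so the score is 0.5. The number itself means little with four jokes; the point is that it is a number, and it becomes more trustworthy as the collection grows.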

The same goes for the task of following instructions: you put together a collection of instructions, have ChatGPT follow them, and then verify the answers. That’s it. Not very hard. And this is exactly what Sebastien Bubeck said was proof of understanding. He meant, of course, BU.
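
For instructions whose outcome can be checked mechanically, the procedure could be sketched as below. This is a minimal illustration under my own assumptions: `ask_model` is a hypothetical callable standing in for whatever chatbot is being tested (no real API is called here), `fake_model` is a canned stub, and the two instructions and their checkers are invented examples.

```python
from typing import Callable, List, Tuple

# Each test case pairs an instruction with a checker that decides pass or fail.
TestCase = Tuple[str, Callable[[str], bool]]


def behavioral_score(ask_model: Callable[[str], str], cases: List[TestCase]) -> float:
    """Return the fraction of instructions whose answers pass their checker."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = sum(1 for prompt, check in cases if check(ask_model(prompt)))
    return passed / len(cases)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a chatbot: returns canned answers."""
    canned = {
        "Reply with the single word YES.": "YES",
        "Write exactly three words.": "Here are three words",  # four words: a miss
    }
    return canned.get(prompt, "")


cases: List[TestCase] = [
    ("Reply with the single word YES.", lambda answer: answer.strip() == "YES"),
    ("Write exactly three words.", lambda answer: len(answer.split()) == 3),
]
print(behavioral_score(fake_model, cases))
```

The stub follows one of the two instructions, so its score is 0.5. Swapping `fake_model` for a function that queries a real chatbot, and the two toy cases for a large and varied collection, gives the kind of BU measurement I have in mind.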

In many reports (even ones that don’t qualify as formal scientific papers, like the one by Bubeck et al.) there is a quantitative measurement of the BU of ChatGPT and similar LLMs. That much I give to Bubeck and collaborators: ChatGPT undeniably has a behavioral understanding of the prompts and also of its answers. That’s where I stand right now.

By the way, I don’t consider it shameful in any way to adjust my opinions. Humility is a requirement for critical thinking and for scientific inquiry, and those who are not ready to revise their beliefs in view of the evidence are on the path to fundamentalism –if not already there.

Closing Thoughts

I think many of the deniers of ChatGPT comprehension actually deny “experiential understanding.” Marcus says, “There is no there there,” as if we were looking for a kind of human-like understanding, which is, of course, the EU.

What puzzles us humans about ChatGPT and similar Generative AI systems is that they are an alien form of intelligence, not like the human mind, as pointed out by Alberto Romero. Even more so when we tend to anthropomorphize the chatbots, attributing to them human-like qualities like intentions, consciousness, and feelings that are simply not there.

But behavioral understanding, as limited as it can sound from the point of view of having “real comprehension,” could also be extremely useful. In particular, all this discussion about understanding has not stopped the entire information industry from incorporating Generative AI into its products (Adobe Firefly, Microsoft’s new Office, Notion AI, Akkio chat-based data visualization, and many more) or countless new Generative AI-based startups from spawning. They are not following a fad; they saw the potential of Generative AI. The terms BU and EU are not popular, and these companies couldn’t care less about them.

Another question is whether the quest for AGI, the holy grail of AI, could be based on behavioral understanding. I think so, but AGI will be a matter for another post.
