Crucial key to understanding what’s going on in LLM / ChatGPT hallucination (& logical errors)
To the extent that inferences are deducible logically from input context data and background world knowledge, I argue that, mostly, LLMs like ChatGPT-4 are reasoning little differently to a smart person . .
. . so why the hallucinations? And errors?
What’s going on?
There are 2 types of incorrect responses:
- Responses that are grammatically correct but occasionally factually wrong about the world. We call these, quite aptly, ‘hallucination’.
- And then there are the plain logical and mathematical errors that ChatGPT makes, which are internal to the context provided by the user.
What is going on indeed?
If I’m right that premium LLMs have essentially learned logic, then how can we explain these two types of error?
The traditional answer is that these things are just (incredibly) advanced ‘next word’ predictors. If they don’t know the answer, and don’t know that they don’t know it (because nothing in their training marked it as untrue), they make it up because it’s grammatically allowed.
I largely agree with that . . except . .
. . this applies to the first type: hallucination of things people said and so on. And that’s largely solvable by providing factual details in the prompt context, e.g. via automated web search or ‘Retrieval-Augmented Generation’ (RAG) methods.
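To make the idea concrete, here is a minimal sketch of putting retrieved facts into the prompt. Everything in it is an assumption for illustration: `retrieve` is a toy word-overlap ranker standing in for a real web search or vector store, and `build_grounded_prompt` is a hypothetical helper, not any library's API. It only builds the prompt text; sending it to a model is left out.

```python
from typing import List


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank documents by words shared with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    # Keep only documents with at least one shared word, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_grounded_prompt(question: str, documents: List[str]) -> str:
    """Hypothetical helper: prepend retrieved facts so the model need not guess."""
    snippets = retrieve(question, documents)
    context = "\n".join(f"- {s}" for s in snippets) or "- (no relevant facts found)"
    return (
        "Answer using only the facts below. "
        "If they do not contain the answer, say you do not know.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    docs = [
        "Ada Lovelace wrote about the Analytical Engine.",
        "Bananas are yellow.",
    ]
    print(build_grounded_prompt("Who wrote about the Analytical Engine?", docs))
```

The explicit "say you do not know" instruction is the point: it gives the model a grammatical way out other than making something up.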
Intuition mode
But what about actual logical errors that are internal to the context provided to them by the user?
Getting math wrong, and occasionally logic? And forgetting part of the query’s requirements?
App developers and AI researchers have discovered they can partly eliminate this by prompting the LLM to work ‘step-by-step’, and even to report interim results.
I argue that GPT actually has a habit — indeed a base modus operandi — of responding ‘semi-intuitively’ by default. But if you ask it to do step-by-step logic and list the steps and outcomes, it is more reliable.
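As a rough illustration of the difference between the two modes, here is a sketch of an ‘off-the-cuff’ prompt next to a step-by-step one. The function names, the exact wording of the instructions and the `Answer:` convention are all my assumptions for the example, not a documented prompt format.

```python
from typing import Optional


def direct_prompt(question: str) -> str:
    """'Intuition mode': ask for the answer with no visible working."""
    return f"{question}\nReply with the answer only."


def step_by_step_prompt(question: str) -> str:
    """Ask for numbered steps with interim results (wording is an assumption)."""
    return (
        f"{question}\n"
        "Work step by step. Number each step, state its interim result, "
        "and give the final answer on a last line starting with 'Answer:'."
    )


def extract_answer(reply: str) -> Optional[str]:
    """Pull the final answer out of a step-by-step reply, if one is present."""
    for line in reversed(reply.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    print(step_by_step_prompt("What is 17 * 3 + 4?"))
    # A hand-written reply in the requested format, not real model output.
    sample_reply = "1. 17 * 3 = 51\n2. 51 + 4 = 55\nAnswer: 55"
    print(extract_answer(sample_reply))
```

Asking for interim results has a side benefit: each step is written out where you, or a second call, can check it.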
Why is it running by default in an off-the-cuff intuition mode?
I think it’s because it is indeed a ‘next word’ predictor. It’s responding as a (superhuman) person might respond . . off the top of their head.
Including at math and logic.
All in a single pass.
Anything that requires iteration has a decent chance of going wrong, because it does indeed work in one pass. That’s the nature of the beast.
That’s why asking it to work step-by-step gets much better results, as does building iterative apps on the core GPT APIs that make multiple calls, double-check and think iteratively.
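Here is a minimal sketch of what such an iterative app might look like: one call drafts a solution, a second call double-checks it, and any objection is fed back into the next attempt. The `ask` parameter is a placeholder for whatever function sends a prompt to your model and returns its text reply; I pass it in rather than name a real API, and the ‘OK’ verdict convention is likewise an assumption.

```python
from typing import Callable, Optional


def solve_with_checks(
    question: str,
    ask: Callable[[str], str],  # hypothetical: sends a prompt, returns the reply text
    max_rounds: int = 3,
) -> Optional[str]:
    """Draft, verify, retry. Returns an accepted draft, or None if none passed."""
    feedback = ""
    for _ in range(max_rounds):
        # Call 1: produce a worked solution, including any earlier objection.
        draft = ask(f"{question}\nWork step by step.{feedback}")
        # Call 2: a separate pass whose only job is to check that solution.
        verdict = ask(
            "Check the solution below for errors. "
            "Reply 'OK' if it is correct, otherwise describe the mistake.\n"
            f"Question: {question}\nSolution: {draft}"
        )
        if verdict.strip().upper().startswith("OK"):
            return draft
        feedback = f"\nA previous attempt was rejected because: {verdict}"
    return None


if __name__ == "__main__":
    # Scripted stand-in for a model, so the loop can be run without any API.
    scripted = iter(["draft 1", "The sum is wrong", "draft 2", "OK"])
    print(solve_with_checks("What is 2 + 2?", lambda prompt: next(scripted)))
```

The checker is still the same kind of model, so this raises the odds of a right answer rather than guaranteeing one. Returning `None` after `max_rounds` keeps the caller honest about that.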
So, I contend that LLMs are genuinely extremely good at logic — because they learned it — but we must use the APIs properly to build better iteration to attack challenging problems.
And indeed we are.
I think that’s what’s going on.
What we are mostly seeing from GPT, unless you guide it otherwise, is the intuition persona (which is impressive) rather than the hard-working analytical type.
Just like in real life — especially in work meetings — the intuitive quick answer guy impresses. But the poor plodding analytical type gets the team the right answer two days later.
Unless the team got waylaid by the first guy. That’s why I’m always ‘wait a minute!’-ing during meetings. Are we really going on the basis of that . . intuition?
Lol.
