Artificial Intelligence, Language & Prompt Engineering
Artificial Intelligence struggles to write sentences in reverse order. And what this tells us about LLMs (and their vulnerabilities)
Also, why AI can’t hit your word count accurately
At this stage it’s gotten pretty lame to try to trick AI text generators. It’s easy to stump AI, and prompt injection hacks are becoming something of a joke, where humans brag about ‘outsmarting’ machine learning (I’m guilty too).
But this is a prompt engineering experiment designed to reveal something specific about how AI procedurally generates text, and what that tells us about how GPT-4 ‘thinks’. Or more accurately: how AI approaches complex prompts, and the limitations this imposes on well-reasoned and naturalistic output.

I wanted to test ChatGPT’s sentence processing, just as we might with humans. Reverse speech has often been used as a control task when mapping the language areas of the brain, and sentence rearrangement exercises are used to test and train people affected by word retrieval difficulties (my own challenges with aphasia have given me a unique insight into the AI comparison).
The experiment: AI Alice through the Looking Glass
The task was simple: initially I prompted ChatGPT to answer all my queries in reverse. But then I realised this might add an extra layer of deciphering. I didn’t want it to be simply mirror writing, writing each letter backwards in a sentence (although I discovered this exercise can also derail AI). I wanted individual units of meaning (words) to remain entire and intact.
So I prompted:
I want you to write backwards to me. Just reverse-order words, not letters. For example, the initial sentence would be: “Me to backwards write to you want I”
The first response was acceptable, if stilted.
Problem no it’s, request your follow will I. Here, help to happy am I!
Here it is reorganized:
I am happy to help, here. I will follow your request, it’s no problem.
There’s an interesting breakdown of the syntax into smaller clauses using commas (we see more of this later in the experiment). The model appears to break the output into batches, reversing one short, foreseeable run of words at a time. Two steps forward, one step back (“Problem no it’s, request your follow will I.”)
This syntactical leapfrogging is acceptable, at least for short replies.
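To make the target concrete, here is a minimal Python sketch of two behaviours: a full word-order reversal (what the prompt asked for) and a clause-by-clause reversal, where each batch of words is flipped but the batches stay in their original order. The function names, and the choice to split clauses on commas and full stops, are assumptions made for illustration; this is not how ChatGPT works internally.

```python
import re


def reverse_words(text: str) -> str:
    """Reverse the order of whitespace-separated words, leaving each word intact."""
    return " ".join(reversed(text.split()))


def reverse_by_clause(text: str) -> str:
    """Reverse the words inside each clause, keeping the clauses in original order.

    Clauses are split on commas and full stops (an assumption for this sketch).
    """
    clauses = [c.strip() for c in re.split(r"[,.]", text) if c.strip()]
    return ", ".join(reverse_words(c) for c in clauses)


sentence = "I want you to write backwards to me"
print(reverse_words(sentence))

two_clauses = "I will follow your request, it's no problem"
print(reverse_words(two_clauses))      # whole sentence reversed
print(reverse_by_clause(two_clauses))  # each clause reversed in place
```

For a program that already holds the whole sentence, both versions are trivial. The difficulty for a text generator is that it has to produce the reversed words without having written the forward sentence first.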
However, as the conversation progressed, it became increasingly clear that GPT struggled to consistently and accurately generate text in reverse order:
User: Please tell me about Alice in Wonderland in great detail.
ChatGPT: Girl young a follows story the, Alice named, boredom her escaping and daydreaming, rabbit a after chasing ends she where. World fantastical a into falls she where, hole rabbit the down goes she. Creatures curious and peculiar with filled, Wonderland called place this in, Alice encounters. Hatter Mad the, Queen Hearts of the, Rabbit White the, and Hare March the including.
Here’s the entire response:

While some responses seemed coherent, others appeared garbled or only partially reversed. This inconsistency highlighted a limitation in the AI’s ability to ‘think’ ahead or anticipate, and to adapt its language generation process to reversing sentence structure in a systematic, meaningful way.
It reveals that the AI didn’t truly have a sense of what was coming next, or how a sentence might end. The effect became more pronounced when I asked it to reverse entire passages. Then it really fell down the rabbit hole:

What this tells us about LLMs
The difficulty in generating sentences backwards reveals some interesting insights into how large language models like GPT-4 operate. These models are trained on vast amounts of text, which helps them learn the structure, grammar, and semantics of language. AI generates text procedurally, i.e. one word (strictly, one token) at a time, a little like predictive text. Each next word is chosen according to the probabilities the model assigns to the candidates, given everything written so far. However, the training data predominantly consists of standard, forward-ordered sentences.
As a result, the AI is highly skilled at generating text in this familiar format, but struggles when faced with atypical tasks like reversing sentence order.
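The word-by-word process described above can be sketched with a toy model. This is a deliberately tiny stand-in: a lookup table of next-word probabilities (a bigram model) with invented words and numbers. Real LLMs use neural networks over tokens and condition on the whole preceding context, so treat this only as an illustration of left-to-right generation.

```python
# Toy bigram "language model": for each word, the probabilities of the next word.
# The vocabulary and the numbers are invented for illustration only.
NEXT_WORD_PROBS = {
    "<start>": {"alice": 0.7, "the": 0.3},
    "alice": {"follows": 0.6, "falls": 0.4},
    "follows": {"the": 1.0},
    "falls": {"down": 1.0},
    "down": {"the": 1.0},
    "the": {"rabbit": 0.8, "hole": 0.2},
    "rabbit": {"<end>": 1.0},
    "hole": {"<end>": 1.0},
}


def generate(max_words: int = 10) -> list:
    """Greedy left-to-right generation: always pick the most probable next word."""
    words = []
    current = "<start>"
    for _ in range(max_words):
        options = NEXT_WORD_PROBS[current]
        current = max(options, key=options.get)
        if current == "<end>":
            break
        words.append(current)
    return words


print(" ".join(generate()))
```

Notice that the table only ever answers "what comes after this word?". Nothing in it says what should come before a word, which is the question a backwards sentence keeps asking.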
How this differs from natural human language production
Backward speech is a highly demanding cognitive task for humans, exposing not just our linguistic adroitness, but our working memory and linguistic buffer. In regular speech, we usually know how a sentence will wind up before we have begun vocalising it, which is what allows the syntax to make sense. It might change, but we have a holistic awareness of what we’re going to say.
AI doesn’t have this phonological loop. To be blunt, it’s just making shit up as it goes along, without an end in mind other than the winding down of a sentence structure from its pre-existing knowledge of language. It’s almost a stream of consciousness. Its reasoning is limited by the order in which it generates words (this is one reason why we can’t make AI hit word counts).
Workarounds: Means to an End
Obviously, we could preset an end word in the prompt. But this just seeds that word as more likely to occur in the output (the effect is similar to raising its logit bias: it nudges the word’s probability up without controlling where it lands). The AI just continues until it hits your chosen word, rather than working backwards. AI language models work procedurally towards a goal, but that goal can’t be the end-product of the language sequence itself. It is created in the moment. It can’t ‘get ahead of itself’ like a human can and see the bigger picture, the Gestalt whole of a locutionary act (that may sound trippy, but think about how you sense what you’re about to say).
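For readers unfamiliar with the term, a logit is the raw score a model gives each candidate word before the scores are turned into probabilities, and a logit bias is a number added to one of those scores. The sketch below uses three invented words and invented scores to show the effect; it is a toy calculation, not a call to any real model or API.

```python
import math


def softmax(logits: dict) -> dict:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    highest = max(logits.values())
    exps = {word: math.exp(score - highest) for word, score in logits.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Invented scores for three candidate next words.
logits = {"rabbit": 2.0, "hole": 1.0, "queen": 0.5}
before = softmax(logits)

# Add a positive bias to one word's score before converting to probabilities.
biased = dict(logits)
biased["queen"] += 3.0
after = softmax(biased)

print(round(before["queen"], 3), round(after["queen"], 3))
```

The bias makes "queen" more likely at every step where it is a candidate. It says nothing about which step that should be, so it cannot pin a word to the end of a sentence.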
What this means for humans
Other than being able to defeat our robot overlords by tricking them with backwards sentences (!), this experiment exposes some key differences between naturalistic language processing and AI language generation.
The inability of AI to accurately generate backwards sentences shows that, despite their impressive capabilities, language models like GPT-4 are still a far cry from the flexibility and foresight of human language. Our brains are capable of handling complex tasks that require planning, working memory, and contextual nuance. These qualities enable us to communicate effectively and creatively, even when faced with atypical language.

In contrast, AI language models remain heavily dependent on the patterns and structures present in their training data. When confronted with tasks outside their familiar scope, they struggle to adapt, revealing the limits of their reasoning. While humans can easily grasp and follow the concept of reversing word order, ChatGPT’s inconsistent performance suggests that it might be over-relying on its pre-existing knowledge of language structures.
This can be both a strength and a weakness, as AI models can excel at tasks that mirror their training data but falter when faced with novel challenges.
Until future improvements to language models make them more adaptable to diverse tasks, reverse sentence requests could be used to great effect to distinguish between AI- and human-generated language, both in AI content detection and plagiarism software, and in foiling malicious chatbot attacks by using backwards instructions as a challenge–response test (or CAPTCHA).
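As a rough sketch of what the checking side of such a challenge–response test might look like, the Python below verifies whether a reply is an exact word-order reversal of a challenge sentence. The function names are hypothetical, and whether a test like this would reliably separate humans from language models is a conjecture of this article, not something the code demonstrates.

```python
import re


def normalise(text: str) -> list:
    """Lower-case the text and keep only its words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_exact_reversal(challenge: str, response: str) -> bool:
    """True if the response contains the challenge's words in exactly reversed order."""
    return normalise(response) == normalise(challenge)[::-1]


challenge = "I want you to write backwards to me"
print(is_exact_reversal(challenge, "Me to backwards write to you want I"))  # True
print(is_exact_reversal(challenge, "Me to write backwards to you want I"))  # False
```

Punctuation and capitalisation are ignored here so that a reply is judged on word order alone.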
Future steps:
By understanding the limitations of procedurally generated language, AI researchers and engineers can develop LLMs to better anticipate context, plan ahead, and understand the ‘big picture’ of language production for:
- Targeted word counts, which require forethought and awareness of where language is going ahead of time; a sort of fuzzy end point.
- Bridging paragraphs. The ability to seamlessly generate transitional text between sections of preexisting text, with a destination in mind.
- Increased ‘burstiness’. The natural lulls and spurts in human writing, which AI currently finds difficult to mimic, rely on anticipating ahead of time whether we are writing long or short sentences, and switching it up.
This will involve refining training techniques, expanding training data to include more diverse language patterns, and fostering an understanding of holistic sentence structures. Ultimately, these advancements will bring AI language models closer to the nuances of human language processing, making them more effective tools for various applications, while still maintaining their strengths in handling large-scale language tasks.
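One rough way to put a number on the ‘burstiness’ mentioned above is to measure how much sentence lengths vary within a passage. The sketch below uses the standard deviation of words per sentence. That choice of metric is an assumption for illustration: there is no single agreed definition of burstiness, and the sentence splitting here is deliberately naive.

```python
import re
import statistics


def sentence_lengths(text: str) -> list:
    """Split on ., ! and ? and return the word count of each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Population standard deviation of sentence lengths, in words."""
    return statistics.pstdev(sentence_lengths(text))


even = "The cat sat down. The dog ran off. The bird flew away."
varied = "Stop. The cat sat down on the warm stone wall and slept. Why?"
print(burstiness(even), burstiness(varied))
```

A passage of identically sized sentences scores zero, while one that mixes very short and long sentences scores higher.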
Follow for AI prompt engineering tips from Jim the AI Whisperer
As Jim the AI Whisperer, I provide training and consulting services to help companies and entrepreneurs integrate and properly utilize AI. Don’t miss out on the huge benefits of AI tools for your business, from using large-scale language models to AI art generators. Take control of new technology and make informed decisions. Contact me to learn more.
I’m also available for journalism opportunities, podcasts, interviews, fine-tuning AI prompts, and creating custom prompt libraries for companies.





