Detecting AI-Generated Text: A Hard Problem
Generative AI is taking over the content-creation space, which raises legitimate concerns. In particular, the tendency of these models to “hallucinate”, i.e. to produce factually incorrect content, makes reliable detection mechanisms all the more urgent.
Goal: we need a way to determine whether a piece of content was generated, even partially, by an AI. Yet detecting AI-generated text has proven to be very challenging!
Can we truly detect AI-generated text? Several initiatives have claimed they could, only to admit later that they couldn’t.
Why is this such a challenging problem?
GPT Zero: Trusted AI Detector?
Example — Human or AI?
Shortly after the release of ChatGPT, several initiatives emerged to build AI-generated text detectors. You might have heard of a tool named GPT Zero, which is presented as a silver bullet capable of detecting AI-generated content.
Yet… it’s far from reliable. For example, I generated the text below entirely with GPT-4!
The tool concludes that it is genuine, human-written content…

No harm done in this case… but what about the opposite mistake: concluding that content was generated by AI when in fact it wasn’t?
The Problem of Unreliable Detectors
Trusting such tools can lead to devastating consequences.
In particular, one major concern is the false positive rate: how often human-written content is mistakenly flagged as AI-generated.
Such errors can have serious consequences, especially for professionals whose reputations hinge on producing original content, or for students who can fail an exam if their professor concludes that they cheated!
Unreliable detectors cause more harm than good!
Even OpenAI Admits Its Defeat!
OpenAI released, then shut down, its text classifier!
OpenAI sunset its detector for AI-generated text. Why? “As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy.”

In their own words: “Our classifier is not fully reliable. In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives).”
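Those numbers are bad: the classifier misses roughly three out of four AI-written texts while wrongly flagging almost one in ten human-written ones. OpenAI acknowledged as much by shutting the model down!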
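To see why these rates are so damaging in practice, one can apply Bayes’ rule: the share of flagged texts that are actually AI-written (the precision) depends on how common AI-written texts are among the texts being checked. The sketch below plugs in OpenAI’s reported rates; the prevalence values are hypothetical, and rates measured on a “challenge set” may not carry over to other kinds of text.

```python
def detector_precision(tpr: float, fpr: float, prevalence: float) -> float:
    """P(text is AI-written | detector flags it), via Bayes' rule."""
    flagged_ai = tpr * prevalence            # AI texts correctly flagged
    flagged_human = fpr * (1.0 - prevalence)  # human texts wrongly flagged
    return flagged_ai / (flagged_ai + flagged_human)


TPR = 0.26  # OpenAI's reported true positive rate on its challenge set
FPR = 0.09  # OpenAI's reported false positive rate on its challenge set

# Hypothetical shares of AI-written texts among the texts being checked.
for prevalence in (0.5, 0.2, 0.1):
    p = detector_precision(TPR, FPR, prevalence)
    print(f"prevalence={prevalence:.0%}  precision={p:.1%}")
```

With these rates, if half of the checked texts were AI-written, about 74% of the flagged texts would actually be AI-written. If only 10% were, the precision drops to about 24%: roughly three out of four flagged texts would belong to human authors wrongly accused.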
Why is it a Hard Problem?
As early as April 2023, in an interview, I explained that detecting AI-generated text is, and will remain, a hard problem! But why?

There are several ways to approach this challenge. For simplicity, here are the two main intuitive ones:
- AI vs. Human Content Detection Model: This involves training a model to distinguish between AI-generated and human-generated texts. Essentially, AI and human-produced content are treated as two distinct classes. The model learns the patterns that set each class apart. However, with advanced models like GPT-4, the distinction becomes challenging since these models excel at mimicking human writing.
- Watermarking Generated Content: Here, a subtle ‘signature’ is added to AI-generated text. For instance, ChatGPT could be programmed to produce text with a specific token distribution that serves as a watermark. On the surface, this seems straightforward. But problems arise if users modify the generated text… In a nutshell, this is a fundamental (mathematical) problem that can be summarized as: “If you manipulate the model to introduce a certain detectable bias, how can you detect the probability of that (biased) sequence of words once the sequence has been altered?” Well… without additional assumptions, you can’t.
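The watermarking idea, and why edits defeat it, can be illustrated with a toy simulation loosely modeled on published “green list” schemes: a secret key and the previous token pseudo-randomly mark a fraction GAMMA of the vocabulary as “green”, the generator is nudged toward green tokens, and the detector counts green tokens and computes a z-score. Everything here is hypothetical: the “model” just samples random token IDs, the key and parameters are made up, and replacing random tokens is a crude stand-in for a human editing the text. It sketches the principle, not how ChatGPT or any real detector works.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50_000      # hypothetical vocabulary size
GAMMA = 0.5              # fraction of the vocabulary that is "green" at each step
SECRET_KEY = "demo-key"  # hypothetical key shared by generator and detector


def is_green(prev_token: int, token: int) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{SECRET_KEY}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def generate(length: int, rng: random.Random, watermark: bool, tries: int = 4) -> list[int]:
    """Stand-in 'language model' that samples token IDs uniformly at random.

    With watermark=True it redraws (up to `tries` draws in total) until it hits
    a green token, which biases the output toward the green list.
    """
    tokens = [rng.randrange(VOCAB_SIZE)]
    for _ in range(length - 1):
        candidate = rng.randrange(VOCAB_SIZE)
        if watermark:
            for _ in range(tries - 1):
                if is_green(tokens[-1], candidate):
                    break
                candidate = rng.randrange(VOCAB_SIZE)
        tokens.append(candidate)
    return tokens


def z_score(tokens: list[int]) -> float:
    """One-proportion z-test: how far the green count exceeds what chance predicts."""
    n = len(tokens) - 1
    greens = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def paraphrase(tokens: list[int], fraction: float, rng: random.Random) -> list[int]:
    """Crude stand-in for user edits: replace a random fraction of the tokens."""
    edited = list(tokens)
    for i in rng.sample(range(len(edited)), int(fraction * len(edited))):
        edited[i] = rng.randrange(VOCAB_SIZE)
    return edited


rng = random.Random(0)
human = generate(200, rng, watermark=False)
ai = generate(200, rng, watermark=True)
print(f"human text:        z = {z_score(human):5.1f}")
print(f"AI text:           z = {z_score(ai):5.1f}")
for frac in (0.25, 0.5, 0.75):
    print(f"AI, {frac:.0%} edited:   z = {z_score(paraphrase(ai, frac, rng)):5.1f}")
```

In expectation, the fully watermarked 200-token text scores a z of roughly 12, while unwatermarked text hovers around 0. Because each replaced token also breaks the green test of the token that follows it, the score collapses quickly as more of the text is edited.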
This doesn’t mean that detection is impossible in every scenario. The best detectors are usually quite reliable when the problem is truly binary, i.e. the content is either fully human-written or fully AI-written. Yet in the common case of AI-assisted writing, where a human edits or mixes in AI output, the detector can no longer be trusted…
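Simple statistics show why AI-assisted text is so much harder than the binary case. Suppose, as a simplifying assumption, that a detector counts tokens from a pseudo-random “green” fraction γ of the vocabulary, that a watermarked model emits green tokens with probability p_w, and that only a fraction f of a document’s n tokens come from the model (the rest being human-written). The expected green rate is then f·p_w + (1 − f)·γ, so the expected z-score is f·(p_w − γ)·√n / √(γ(1 − γ)): it shrinks linearly with f, and the text length needed to reach a given threshold grows as 1/f². The parameter values below (γ = 0.5, p_w = 0.9375, threshold 4) are hypothetical.

```python
import math

GAMMA = 0.5          # assumed green-list fraction
P_GREEN_AI = 0.9375  # assumed green rate of watermarked (AI-written) tokens
Z_THRESHOLD = 4.0    # hypothetical detection threshold


def expected_z(ai_fraction: float, n_tokens: int) -> float:
    """Expected z-score when only `ai_fraction` of the tokens carry the watermark."""
    excess = ai_fraction * (P_GREEN_AI - GAMMA)  # green rate above chance
    return excess * math.sqrt(n_tokens) / math.sqrt(GAMMA * (1 - GAMMA))


def tokens_needed(ai_fraction: float) -> int:
    """Smallest text length whose expected z-score reaches the threshold."""
    excess = ai_fraction * (P_GREEN_AI - GAMMA)
    return math.ceil((Z_THRESHOLD * math.sqrt(GAMMA * (1 - GAMMA)) / excess) ** 2)


for f in (1.0, 0.5, 0.2, 0.1):
    print(f"AI share {f:.0%}: E[z] over 300 tokens = {expected_z(f, 300):4.1f}, "
          f"tokens needed for z >= {Z_THRESHOLD:g}: {tokens_needed(f)}")
```

Under these assumptions, a fully AI-written text reaches the threshold in expectation after about 21 tokens, while a text that is only 10% AI-written needs about 2,090 tokens. Even then it would be flagged only roughly half the time, since the expected score merely equals the threshold.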
Conclusion
The rapid advances in AI-driven content generation have ushered in a new era of information dissemination, but not without challenges. Distinguishing human-written from AI-generated text is proving to be a formidable task, even for well-known detectors like GPT Zero. The risks of relying on imperfect tools are high, from tainting a professional’s credibility to wrongly accusing a student of cheating. As the line between human expression and AI-generated content keeps blurring, the distinction only gets harder. While detection can be fairly accurate when a text is either fully human-written or fully AI-written, the hybrid nature of AI-assisted text presents an altogether different challenge. The journey towards a robust, foolproof AI content detector is still ongoing; while we may see improvements, it is imperative to approach this technology with caution, awareness, and a keen sense of responsibility.
