The Missing Piece in AI Content: The Human Touch

A Linguistic Analysis: Learn the Key Factors that Distinguish Natural and Artificial Language Generation

We writers can tap into our readers' emotions through pathos, anecdotes, and many other devices. We can craft sentences and paragraphs that evoke feelings and resonate with the readers we connect with. AI language models like ChatGPT may still struggle to match that level of emotional resonance. As a consequence, their content tends to be less engaging, unless the reader is explicitly biased toward AI-generated content.

Artificial intelligence was built mainly to mimic our IQ, not our EQ.

I am writing this article to raise awareness and to share what I have learned so far about distinguishing AI content from human content without depending solely on AI detection tools, which sometimes mislead us.

Some linguistic factors can help us understand the difference between human intelligence and synthetic intelligence in written text.

Major Linguistic factors:

Perplexity: This is a measure of how well a language model can predict the next word in a sequence of words. Human-written text tends to have higher perplexity than machine-generated text, as it is more complex and unpredictable. Machine-generated text may have lower perplexity, as it is more likely to follow common patterns and rules.

Example:

Human-written: The quick brown fox jumps over the lazy dog.
Machine-generated: The fast brown animal goes over the slow canine.
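To make the definition concrete, here is a minimal Python sketch of the perplexity calculation. It assumes you already have the probability that some language model assigned to each token of a text. The function name `perplexity` and the sample probabilities are illustrative assumptions, not values from any real detector.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the model's probability for each token.

    Computed as exp of the average negative log-probability.
    `token_probs` is a hypothetical input: in practice these values
    would come from a language model scoring the text.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# A model that is equally unsure among 4 choices at every step
# has a perplexity of about 4.
uniform = [0.25, 0.25, 0.25, 0.25]
print(round(perplexity(uniform), 3))

# Illustrative (made-up) probabilities: a predictable text vs. a surprising one.
predictable = [0.9, 0.8, 0.85, 0.9]
surprising = [0.2, 0.05, 0.1, 0.3]
print(perplexity(predictable) < perplexity(surprising))
```

The more confidently the model predicts each token, the closer the perplexity gets to 1, which is why predictable text scores lower.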

In the article below, I take a deep dive into perplexity and demonstrate it statistically with a hands-on example.

How to Build an AI Content Detector from Scratch with Python

A Practical and Fun Project for AI Enthusiasts

levelup.gitconnected.com

Burstiness: This is another linguistic factor. It measures how varied and diverse the sentences are within a piece of text. Human-written text tends to have higher burstiness than machine-generated text, as it is more creative and spontaneous.

Machine-generated text tends to consist of regular, uniform sentences.

Example:

Human-written: I love to travel and explore new places across the planet. There are tons of things to experience and learn from this world. Every destination spot has its own charm and culture. I enjoy meeting new people and trying new cuisines whenever I go. Traveling exuberates my mind and soul all the time.
Machine-generated: I like to travel and see new places. There are many things to see and learn in this world. Every place has its own beauty and culture. I like to talk to new people and eat new foods. Traveling makes me happy and smart.
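One simple way to put a number on burstiness is to look at how much sentence lengths vary. The Python sketch below uses the coefficient of variation of sentence lengths (standard deviation divided by mean). This is only one possible proxy and an assumption on my part; detection tools may define burstiness differently. The function names are illustrative.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence lengths (assumed proxy).

    0.0 means every sentence has the same length; larger values
    mean the lengths vary more.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform_text = "I like tea. You like tea. We like tea."
varied_text = "Hi. This is a much longer sentence here."

print(burstiness(uniform_text))  # identical lengths, so no variation
print(burstiness(varied_text))   # one short and one long sentence
```

You can paste any two paragraphs into `burstiness` to compare them. On very short texts the score is noisy, so treat it as a rough signal rather than a verdict.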

Sometimes, out of habit, we lean too heavily on our own vocabulary and favorite phrases. This can introduce small inconsistencies in our writing. Yes, I know these are mistakes we often make, but that’s where the beauty of human writing lies.

Repetition: This is a measure of how often words or phrases are repeated within a piece of text. Machine-generated text may have more repetition than human-written text, as it may lack the ability to avoid or resolve redundancy.

Example:

Human-written: Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversations?”
Machine-generated: Alice was getting very bored of sitting by her sister on the bank, and of having nothing to do: she had looked into the book her sister was reading a few times, but it had no pictures or conversations in it, “and what is the point of a book,” Alice thought “without pictures or conversations?”

Note: The machine-generated paragraph has more repetition than the human-written paragraph, as it uses the same words or phrases multiple times (e.g., “very”, “sister”, “bank”, “nothing to do”, “book”, “pictures or conversations”). The human-written paragraph has less repetition, as it uses synonyms, modifiers, or other devices to avoid or reduce repetition (e.g., “beginning”, “tired”, “peeped”, “once or twice”, “use”).
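Repetition can also be estimated with a few lines of Python. The sketch below computes the share of word n-grams that occur more than once in a text. The metric and the name `repetition_ratio` are my own illustrative choices, assumed here for demonstration, and not the method of any particular detector.

```python
import re
from collections import Counter

def repetition_ratio(text, n=2):
    """Fraction of n-gram occurrences that belong to a repeated n-gram.

    0.0 means no n-gram appears twice; values closer to 1.0 mean
    most of the text is made of repeated n-grams.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

print(repetition_ratio("the cat sat on a mat"))   # no repeated bigrams
print(repetition_ratio("the cat the cat"))        # "the cat" occurs twice
print(repetition_ratio("a a b", n=1))             # single-word repetition
```

Setting `n=1` counts repeated single words, while larger values of `n` catch repeated phrases. Common function words inflate the single-word score, so bigrams or trigrams usually give a cleaner picture.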

Cohesion: The human brain connects ideas in a more logical way, tying sentences together so that the context stays in one place.

Example:
Human-written: A solar eclipse is a type of eclipse that occurs when the Moon passes between the Sun and Earth, and the Moon fully or partially blocks (“occults”) the Sun. This can happen only at new moon when the Sun and the Moon are in conjunction as seen from Earth in an alignment referred to as syzygy. In a total eclipse, the disk of the Sun is fully obscured by the Moon. In partial and annular eclipses, only part of the Sun is obscured.
Machine-generated: A solar eclipse is when the Moon comes between the Sun and Earth, and the Moon covers some or all of the Sun. This can only happen at new moon when the Sun and the Moon are close together from Earth’s point of view. In a total eclipse, the Sun is completely hidden by the Moon. In partial and ring-shaped eclipses, some of the Sun is still visible.

Analysis: The human-written paragraph has higher coherence than the machine-generated paragraph, as it uses more precise and consistent terms (e.g., “occults”, “conjunction”, “syzygy”, “disk”, “obscured”, “annular”). The machine-generated paragraph has lower coherence, as it uses vaguer and less relevant terms (e.g., “comes between”, “covers”, “close together”, “hidden”, “ring-shaped”, “still visible”).

Human-written text tends to have higher coherence than machine-generated text, as it is more likely to have a clear structure, logic, and flow.

Additional Factors:

Voice: AI models use the passive voice most of the time, whereas we humans prefer the active voice.

Length of texts:

We human beings can tune the length of a paragraph as we write, because our attention stays on the context of what we are trying to express.

AI models, however, tend to produce lengthy output no matter what kind of prompt you give them, because of how their parameters were tuned during training. They try to contextualize the whole thing up to the number of tokens (roughly, words) they were trained to produce, rather than to the length the writing task actually requires.

Conversational style:

We often use sarcasm and write in a conversational style, so that the reader feels as if someone is talking to him or her. AI is incapable of doing this.

Now, I think we have covered enough of the theory. Let’s jump into the practical part.

Developing a payload prompt to exploit GPT detection:

First, we will create a prompt that includes a couple of human traits to force the AI (GPT) to mimic human writing styles. It’s an experiment that attempts to blur the thin line between AI text and human text.

The prompt with all the conditions to evade AI detection

The generated text is below:

As I always say, "Health is wealth!" Think about it, folks - without our health, we're nothing more than a shell of our former selves. We can't enjoy life's adventures, indulge in delicious foods, or even cuddle with our loved ones. Our bodies are our temples, and we must treat them with the utmost care and respect. When we prioritize our health, we open ourselves up to a world of possibilities. We can hike mountains, dance the night away, and tackle challenges head-on. Heck, we can even eat that extra slice of pizza guilt-free! So, let's all take a moment to appreciate our bodies and promise to take care of them, shall we? After all, a healthy body and mind lead to a wealthy life, filled with experiences and memories that truly matter.

Detection results:

Now, let's see how it fares without trying to mimic a human style.

Generic prompt to generate text without condition

The generated text is below:

Health is indeed wealth! Taking care of our physical, mental, and emotional well-being is essential for leading a happy and fulfilling life. When we have good health, we are able to enjoy life's experiences and pursue our goals with energy and enthusiasm. Conversely, poor health can limit our ability to engage in activities we love and can even impact our relationships and overall quality of life. That's why it's important to prioritize self-care, eat nutritious foods, exercise regularly, get enough sleep, and manage stress. By taking proactive steps towards maintaining good health, we can build resilience, increase productivity, and foster a positive outlook on life. Remember, investing in our health is an investment in ourselves and our future!

Now we can see that the detection results get worse when the text does not mimic our style.

Hence, we can conclude that, up to a certain level, we can evade AI detection, or at least increase the perplexity and burstiness of the generated text, if we design the prompt in a custom way with specific conditions.

Note: We used the Hugging Face platform to run the meta-llama/Llama-2-70b-chat-hf model for text generation, and the AI detection tools https://contentatscale.ai/ and https://www.zerogpt.com/. The results may vary with the same prompt if you try different language models and different detection tools.

“Prompt engineering is not about hammering words together, but about sculpting them into art.” — By Author
