Ronni Souers



I Tested Originality.AI’s AI Detector

My honest review on the service

Picture made by the author

Companies that provide writing services, including content mills and marketing companies, are among the businesses now using AI-detection software to check whether the content their writers produce was generated by AI. Originality.AI offers one of the most popular AI detectors, which it promotes as the “most accurate AI & plagiarism detector for serious content publishers.”

After reading various stories of writers either getting fired after being falsely accused of using AI or having to jump through lots of extra hoops to get their writing to a stage where it passes AI-detection software, I decided to test Originality.AI’s software myself. (You can read more about my initial thoughts on the AI detection phenomenon here.)

I wanted to make a full review, so I decided to use the following criteria for evaluating this software: cost, ease of use, understandability, accuracy, and marketing ethics. Please keep in mind that, at the time of writing, Originality.AI’s software tests for three factors: AI generation, plagiarism, and readability. This review is going to focus only on the software’s AI-detection capabilities. Also, for the test, all the AI-generated articles came from ChatGPT (the free May 24 version), and I used Originality.AI 1.4.

Cost

At $0.01 per 100 words checked, Originality.AI seems cheap. You must pay for credits up front, so I paid $20 for 2000 credits.
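The pricing arithmetic can be sketched in a few lines of Python. This is a rough illustration only: it assumes one credit covers 100 words (which is what $20 for 2000 credits at $0.01 per 100 words implies) and that a partial block of 100 words is billed as a whole credit. The rounding rule and the function names are assumptions made for illustration, not something the pricing page is known to confirm.

```python
import math

PRICE_PER_CREDIT = 0.01   # dollars; $20 buys 2000 credits
WORDS_PER_CREDIT = 100    # one credit scans 100 words

def credits_needed(word_count: int) -> int:
    """Credits used to scan a text of `word_count` words."""
    # Assumption: a partial block of 100 words is billed as a whole credit.
    return math.ceil(word_count / WORDS_PER_CREDIT)

def scan_cost(word_count: int) -> float:
    """Estimated dollar cost of scanning a text of `word_count` words."""
    return credits_needed(word_count) * PRICE_PER_CREDIT

print(f"350-word sample: {credits_needed(350)} credits, ${scan_cost(350):.2f}")
print(f"2000 credits cover about {2000 * WORDS_PER_CREDIT:,} words")
```

Under these assumptions, a $20 purchase goes a long way for an individual writer, while a publisher scanning thousands of long posts would pay proportionally more.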

Content publishers, of course, might rack up a hefty bill if they are scanning hundreds or thousands of blog posts, but even for them, the cost probably isn’t that terrible. On the flip side, the low cost might incentivize more companies to use the software, which may, in turn, cause more writers to need to pay for the software.

Yes, you read that right: some writers are now using this software to determine whether their writing is “passable” before submitting it to their employers (either independently or at the request of employers). For the writers paying out of their own pockets, the cost hopefully won’t break the bank.

Still, the added cost to writers who now must pass AI-detection tests is one that many of them are not being compensated for. This includes the actual cost of purchasing AI detection, plus the time they must spend testing their own writing and potentially rewriting content that has been flagged as AI generated. It also includes the time wasted (and money lost) if the employer will no longer accept the piece because it’s been flagged as being “likely AI generated.” (Diona L. Reeves has written a fabulous article about how AI detection could diminish the writing pool here.)

So, there is more to the “cost” of AI-detection software than just the cost, if you know what I mean.

Ease of use

The software is pretty easy to use. You simply click “Start New Scan,” paste the content you want scanned into the text field, and wait for the results.

The tests are not instantaneous. I tested only short pieces of writing, and I don’t think any scan ever took longer than 30 seconds. Because of this, I can’t speak to how long lengthier pieces of content take to scan.

There is a feature that allows users to import content directly from the web simply by inputting the URL, which could make things easier. I found that this feature did not work well; in fact, there were times when it did not work at all. Sometimes, when the feature did work, the content imported into the text box contained weird formatting; in one case, when I imported a post from my blog, all the apostrophes and some of the spaces had been omitted (which was funny since the post was about how to use apostrophes).

All this being said: Originality.AI does seem to be on top of their game (their website has recently undergone a massive glow up), so I’m sure its ease of use, and its user experience in general, will improve over time.

Understandability

The scoring is confusing, even if you take the time to read how it works. First, let’s talk about what the website does tell us.

The website has an entire article devoted to explaining the AI score, and this article is hyperlinked under the scores of every content scan for easy accessibility.

The article explains, among other things, that the scores refer to probability. A score of 70% AI and 30% human means that there is a 70% probability that the text was created by AI and a 30% probability that the text was created by a human.

It also provides targets for people who are creating or using content. For example, businesses who don’t want to use any AI content need to aim for human scores averaging 90% or higher (and 65% should be the minimum acceptable human score). Businesses who are fine with some AI-generated content have different average and minimum target human scores to aim for (for more details, see the above-linked article).

Furthermore, a new highlighting feature means users can see the different sections of their writing and how the AI is scoring those sections. Different parts of the same document may contain different AI-likelihood scores.

The page containing users’ content scans also includes a highlighting key so users can make sense of the highlights:

Highlighting Color Key

RED = 90% confidence this sentence was generated by AI.

ORANGE = 70% confidence this sentence was generated by AI.

YELLOW = 50% confidence this sentence was generated by AI.

YELLOW GREEN = 70% confidence this sentence was human written.

GREEN = 90% confidence this sentence was human written.

When I tested my self-written SEO article “Top 3 Things to Do in Nashville” (shown below), my scan displayed a combination of red and orange highlighting. The first section of the article was flagged as “100% chance that this section was AI generated.” The second was 72% likely to be AI-generated, and the third was 96%. However, the overall score for the piece was 53% likely to be AI generated.

How does that work out? I really don’t know. I would assume the detector would take the averages of all the highlighted sections (dependent on the lengths of the sections). But it’s clearly doing something more complicated, and I couldn’t find an explanation for how that score was reached.
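A quick check shows why a simple length-weighted average cannot explain the overall score. The sketch below uses the three section scores reported for this article; the section word counts are invented for illustration (the real lengths are not given), and it assumes the three sections make up the whole piece.

```python
def weighted_average(scores, weights):
    """Length-weighted mean of per-section AI-likelihood scores (in percent)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

section_scores = [100, 72, 96]          # per-section AI scores reported above
hypothetical_lengths = [120, 110, 120]  # word counts are made up for illustration

avg = weighted_average(section_scores, hypothetical_lengths)
print(f"length-weighted average: {avg:.1f}%")

# Whatever the section lengths are, a weighted average always lies between
# the lowest and the highest section score.
assert min(section_scores) <= avg <= max(section_scores)
```

Because any weighted average of 100, 72, and 96 must fall between 72 and 100, an overall score of 53% has to come from some other calculation.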

For this reason, I would say the understandability is so-so.

Based on my content scans, I suspect that the detector treats inconsistency between sections as a signal, somehow determining that widely varying AI-likelihood percentages within one text make it less likely to be AI-generated. This is just conjecture; Originality.AI’s website does not appear to explain this.

(I should note that Originality.AI does have an article that explains how the detector actually works, but the article is quite dense.)

In a perfect world, every content publisher would pore over the details of the scoring to understand how it works before using it to manage their writers. But we don’t live in a perfect world, and I’m guessing that various content publishers who use this software don’t understand the scoring and haven’t even tried — which means that they could automatically assume that every sentence the software flags as being AI-generated is and/or must be rewritten. I really resent that.

Accuracy

Originality.AI has published a summary of their accuracy test on their website, which provides some insight into just how accurate their detector is. Detailed information about the study, including specifics about its materials, is missing, so we can draw only limited conclusions about how their results were reached.

I tested multiple writing samples using this software, putting the $20.00 I spent on my 2000 credits to good use. I would love to directly share the results of my tests with you (which is absolutely possible, as shareable links are provided — another positive UX feature), but Originality.AI’s terms and conditions prohibit me from doing so.

For this test, I really wanted to test out some of my search-engine-optimized writing, because SEO writing sounds inherently robotic. SEO best writing practices privilege easy-to-read writing. In addition to active, shorter sentences that employ a middle-grade lexicon, the rule of three, the heavy use of key phrases, and cliché wording are all commonplace in SEO writing. This is why ChatGPT is such an excellent SEO writer: it seems to be programmed to use all these practices in its responses.

I applied for an SEO writing job long before I created a ChatGPT account. The company asked for two 350-word SEO writing samples. Here is the first one I created:

As mentioned above, Originality.AI said that this article was 53% likely to be AI-generated. I swear on my life that I wrote this article myself.

The next SEO article I tested was another article I wrote for the same job application:

This article was flagged as — not kidding — 100% likely to be AI-generated. This is another article I promise is my original work. The feat took me longer than I would have liked. Google Ads’ Keyword Planner helped me determine the best key phrases to use in the article based on competition and search volume, and I wrote the article around those. (Now I feel like a robot.)

These were just two of the 30 pieces of content I scanned. Twenty-one of these were completely human generated. Six of them were AI generated with some or no human editing. The remaining three were what I consider to be “AI assisted.” The content tested comprised a mix of SEO blog writing (18), non-SEO blog writing (2), website copy (6), articles (2), and some fiction (2).

The results of my test of Originality.AI’s accuracy

Out of the 21 human-written pieces I tested, four were falsely accused of being AI generated (false positives), while the remaining seventeen received varying “likely to be human generated” scores (true negatives).

Out of the six minimally edited or non-edited ChatGPT generated pieces I tested, all were rightfully accused of being AI generated (true positives).

The three AI-assisted pieces I tested all scored higher on the “human generated” side.

I would consider two of these to be true negatives and one to be a false negative, and here’s why: For two of these documents, while I did receive some help from ChatGPT, I still did most of the work, so they really were more human generated than AI generated. I consider these true negatives.

For the last document, ChatGPT did most of the work (probably 60%), but I successfully fooled Originality.AI into thinking I had done more work (that content received a 53% human score, so I just barely fooled it). That was much more difficult than I expected it to be!
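The tallies above can be collected into a small confusion matrix. The sketch below is only a restatement of those counts, with the three AI-assisted pieces classified the way this review classifies them; the derived metrics are simple arithmetic on a 30-piece sample and should not be read as a rigorous measurement.

```python
# Counts taken from the test described above.
true_positives = 6        # unedited or lightly edited ChatGPT pieces flagged as AI
false_positives = 4       # human-written pieces flagged as AI
true_negatives = 17 + 2   # human-written pieces plus 2 mostly-human AI-assisted pieces
false_negatives = 1       # mostly-AI piece that received a 53% human score

total = true_positives + false_positives + true_negatives + false_negatives
assert total == 30  # matches the number of pieces scanned

accuracy = (true_positives + true_negatives) / total
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
# Note: this denominator includes the 2 AI-assisted true negatives,
# so it differs from a rate computed over the 21 fully human pieces only.
false_positive_rate = false_positives / (false_positives + true_negatives)

print(f"accuracy:            {accuracy:.1%}")
print(f"precision:           {precision:.1%}")
print(f"recall:              {recall:.1%}")
print(f"false positive rate: {false_positive_rate:.1%}")
```

Precision is the share of "AI" flags that were correct, and recall is the share of AI-heavy pieces that were caught, which separates the detector's two kinds of error.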

I tried the Jasper Whisperer’s voice prompting hack to generate a blog post and a fiction chapter based on the styles of human writers, which appeared to do a good job, yet Originality.AI’s detector correctly flagged each piece of produced content as being 100% AI generated.

I had read on Reddit that Originality.AI’s detector could easily be fooled by the simple inclusion of grammar errors. If this was ever true, the software developers have fixed this. My tests revealed that inserting various grammar errors (subtle or overt errors) does not fool the AI.

However, I wonder if whatever fix they applied has overcompensated by turning frequent grammar errors into a red flag indicating that the text is AI generated. Remember when I told you under “Ease of Use” that when I imported some of the writing from my blog, the import was missing apostrophes and spaces? Somehow, that import received a score of 37% likely to be AI-generated, compared to the same blog post copied and pasted directly into the text field (which scored 13%). Why did the exact same blog post with the omission of apostrophes and spaces get a higher AI-generation-likelihood score?

Here’s the thing: I think that this software is actually quite accurate when it comes to detecting AI in AI-generated texts. If I were to exclusively test AI-generated texts, I think it would detect AI in almost all of them, and any false negatives would have had to have undergone heavy editing. However, the potential for false positives — particularly in content that adheres closely to SEO best writing practices — reduces its accuracy rate.

Currently, Originality.AI says that its tests have revealed that its AI “incorrectly identifies human generated text as AI generated 1.56% of the time.” That percentage might not seem like a lot, but in reality, it is. That’s 15–16 texts out of every 1,000 texts that are being falsely accused of using AI.
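To see how a small-looking rate adds up, here is a rough sketch. It takes the published 1.56% figure at face value and, for the second function, assumes each submission is flagged independently of the others; both are simplifying assumptions, and the function names are made up for illustration.

```python
PUBLISHED_FALSE_POSITIVE_RATE = 0.0156  # figure quoted from Originality.AI

def expected_false_flags(human_texts: int, rate: float) -> float:
    """Expected number of human-written texts wrongly flagged as AI."""
    return human_texts * rate

def chance_of_at_least_one_flag(pieces: int, rate: float) -> float:
    """Probability that a writer who submits `pieces` human-written texts
    is wrongly flagged at least once (assumes independent scans)."""
    return 1 - (1 - rate) ** pieces

for n in (1_000, 10_000, 100_000):
    flagged = expected_false_flags(n, PUBLISHED_FALSE_POSITIVE_RATE)
    print(f"{n:>7,} human texts -> about {flagged:,.0f} false accusations")

for pieces in (10, 50, 100):
    p = chance_of_at_least_one_flag(pieces, PUBLISHED_FALSE_POSITIVE_RATE)
    print(f"{pieces:>3} submissions -> {p:.0%} chance of at least one false flag")
```

The second loop is the more relevant one for a working writer: even a low per-text rate compounds over many submissions.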

I also suspect that that percentage — 1.56 — is highly manipulatable depending on the type of writing tested. My fiction writing came back as 100% human generated (which it was), which makes sense, since I try to use my unique writing quirks for my fiction writing. My SEO writing, as shown above, was falsely accused of being AI-generated, which I think is due to its close adherence to SEO best writing practices (my voice is practically nonexistent in that piece).

I should emphasize: my test was not an extensive test. It was a small test, done somewhat haphazardly, and the writing samples were primarily my own or heavily edited by me. My own SEO writing samples are not representative of all SEO writing. Were I given funding to test Originality.AI or other AI detectors, I would design the test to be much larger and more comprehensive, I would aim for representative samples, and I would provide detailed information on the samples tested.

But even so, 19% of my human-generated samples were falsely accused of being AI generated, and 22% of my SEO samples were false positives. (See how easy it is to manipulate the percentage?)
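The two percentages come from the same four false positives divided by different denominators. The sketch below reproduces them; reading the 22% figure as four false positives among the 18 SEO pieces is an inference from the counts given earlier, and the fiction row reflects the two fiction pieces that came back as human generated.

```python
def false_positive_share(false_positives: int, samples: int) -> float:
    """False positives as a percentage of the chosen sample set."""
    return 100 * false_positives / samples

# (label, false positives, samples in that subset)
subsets = [
    ("all human-written pieces", 4, 21),
    ("SEO blog pieces",          4, 18),  # inferred: all 4 false positives were SEO
    ("fiction pieces",           0, 2),
]

for label, fp, n in subsets:
    print(f"{label:<26} {fp}/{n} = {false_positive_share(fp, n):.0f}%")
```

The same detector and the same scans produce very different headline rates depending on which kind of writing goes into the denominator.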

Furthermore, even amongst my writing samples that received higher human scores, there were still sentences or sections highlighted as being likely AI generated. In a case where an employer is using the detector to confirm writers aren’t using AI, the employer might still punish writers or require them to revise highlighted sentences/sections simply because they’ve been flagged even when the content, as a whole, has a higher human score.

Originality.AI does admit, on various pages of their website (including this one), that false positives are possible and that they are committed to helping writers who are falsely accused of using AI. Their FAQs section (on the bottom of their main page) says, “False positives do occur and can cause a lot of pain. Across hundreds of thousands of tests we currently see false positives occurring about 2% of the time that human work is submitted. 2% false positives, despite being the lowest in the industry based on our accuracy test, is too high. The AI researchers and machine learning engineers at Originality are working constantly to both improve detection accuracy and reduce false positives.” Good for them!

But the fact remains: a false positive can be devastating for a writer, damaging their confidence, threatening their employment, and requiring them to jump through extra hoops to prove the writing is original or to rewrite the content for a better human score.

Ethics in marketing

As mentioned above, Originality.AI’s website has recently undergone a glow-up, and their website copy is much more ethical than it used to be, which I applaud them for. In my first article about Originality.AI, I discussed how their website copy contained manipulative, fear-based, straw-man rhetoric that would make readers think Google treats all AI-generated content as spam (which Google currently doesn’t). That rhetoric has since been deleted. (It can still be seen through the Wayback Machine, though.)

Originality.AI’s marketing is all based on the necessity of their services: to market their services, they must make potential customers believe the services are necessary.

To this end, the company still asserts the idea that Google might punish AI-generated text, and they have published their own in-depth correlation study to support it. While they emphasize that correlation does not necessarily equal causation, their study examines the possible influence of AI-generated content on Google search rankings. They analyzed the top 20 webpages for 1,000 popular keywords and used their AI to identify AI-generated content. Their aim was to answer two questions: whether Google can detect AI content and whether Google penalizes AI content. Their key findings:

  1. Websites with higher human scores, as determined by their AI detector, have better Google search rankings. (Remember that the AI detector isn’t always accurate.)
  2. A 1% increase in the human content score corresponds to an improvement of 2.65 positions in Google ranking.
  3. The correlation between human content scores and Google ranking weakens above 75%. Scores above this threshold offer marginal benefits.

As is the case with their AI accuracy study, Originality.AI have only published a summary of this correlation study — not the full study. They don’t share the specific keywords or webpages they studied, and they don’t share other potentially important information. Such information might indicate limitations or flaws in the study.

For example, well-established websites often rank higher in search engines, which means well-established webpages hosted by those sites are likely older. Publicly available AI text generation is recent, so we might safely assume that the higher-ranking, well-established webpages don’t use AI because large language models like ChatGPT weren’t yet available for public use at the time those webpages were written. This idea could, of course, be disproven, but Originality.AI would need to release their full study to disprove it.

The company also carves out the need for their software based on the rhetoric of “reputation,” promoting the idea that content publishers could “risk their reputation” by publishing AI-generated content, which might sometimes be true, given that there are many who don’t like AI and want nothing to do with it.

They have also added text that specifically addresses the situation of the writer who’s been falsely accused of using AI by their software, and they provide guidance for agencies on how to deal with AI scores. The guidance says that agencies should have no hard rules about AI thresholds since the detectors can produce varying AI-likelihood scores, including false positives. They also say that agencies shouldn’t immediately fire writers who have been flagged for using AI and should instead take a holistic look at that writer’s work to determine if the use of AI is a problem across their work. Good guidance!

Still, as mentioned above, they do create AI-generation “targets” for companies to meet, and their in-depth correlation study will convince customers to try to meet those targets. Since the study says that higher human content scores (up to 75%) greatly increase Google rankings, some companies will always want to aim for that 75% human content score, for which they will need to use the AI detector.

The company is now creating more tools to help writers prove their content’s originality. The marketing for these products will make writers feel like Originality.AI is on their side. Which is great — except the demand for such products wouldn’t exist if AI detectors didn’t exist. Currently, many of these products are free for writers to use, but that may change in the future, especially if/when “proving human generation” becomes a widely accepted and mandated feat of content creation.

They kind of have the perfect business plan, wherein the widespread adoption of their flagship service generates demand for their other services.

Overall, Originality.AI’s software is cheap and easy to use, although its understandability could use some work. Its accuracy is better than I thought but still questionable regarding false positives. Originality.AI’s marketing has gotten much more ethical.

But is the software necessary?

It might be true that in the future, Google starts punishing AI-generated content. Google might already be doing this now, even though this goes against their current published stance on AI.

Personally, I don’t think this is the case. (I could be wrong, of course.) I think AI-assisted content is already becoming very mainstream in our productivity-driven society where most writers are already underpaid and need ways to improve their productivity.

I feel bad for writers who are falsely accused of using AI and are either fired or must spend more unpaid time “fixing” (humanizing) their writing. I resent the fact that some writers are beginning to pay for AI detection services to use on their own writing simply to make their writing “passable” or resorting to other measures to prove the originality of their writing.

I also don’t understand how writers are supposed to both adhere to SEO best writing practices and pass AI-detection tests, particularly when the use of SEO best writing practices makes the AI detector light up like a Christmas tree. I think writers using SEO software will have even more difficulty with this. Perhaps to combat this concern (or to capitalize off it), Originality.AI does have its own content optimizer (an SEO software “alternative”) that it claims will optimize content AND reduce the chances of a false positive occurring. Hmmm.

Ugh, becoming a writer has gotten so complicated. Design! Optimize! Humanize! Software, software, software!

Sure, writers claiming to have written content they didn’t write is unethical and sucky, as is writers using AI despite their employers’ wishes. But must we all suffer so that some of those writers get caught, even if that means some of us are falsely accused in the process? Wholesale AI-generated content is, in my opinion, easy to spot (but it will become less easy to spot as AI improves). If writing sounds overly formulaic and robotic, and that goes against a company’s/client’s stylistic guidelines, can’t the company just ask the writer to make it sound less robotic without the need for software?

Is the content creation industry just needlessly adding more hoops that we all must jump through (and pay for)?

And as AI text generators learn to evade AI detection, how complicated will this hoop-jumping get as writers battle with AI to sound “more human”?

*Note: I emailed Originality.AI’s support to ask for access to their full tests (including both the study that tests their AI detector’s accuracy and the correlation study), specifically the designs + methodologies. They told me: “We are working with some academics to complete follow up studies that will likely provide the information you are looking for. The majority of the human content we selected [for our accuracy study] was top performing pages in google where we could verify the publishing date predated AI.” It could be the case that, because their studies were done internally and never intended for publication, they don’t have the detailed information ready to hand over. I look forward to seeing the academic follow-up studies.
