Is It Possible to Detect AI Generated Text? Detection Tool Analysis.

AI content is raising the bar for all forms of content generation. Is it possible to reliably detect AI generated text? If not, then why, and what are the implications?

A chase between AI generation and AI detection algorithms / Generated with DALL-E

AI content is everywhere, often in places where we neither expect nor want it: school essays, product reviews, even scientific papers.

In many of these instances we want to see only original, true, human written content. If you have worked with AI content for a long time you might be able to recognize a familiar structure, or content pattern. Let's delve in and demystify! 👀😦

However, a naked eye test is unlikely to be reliable, and it is definitely not something that can be automated. But do we have any tools that are better?

In this article I want to explore a few key questions:

  1. Is it possible to reliably detect AI generated text? If not, then why?
  2. What are the current methods and tools for detecting AI content?
  3. Should we care that AI generation is advancing faster than detection?

Is it possible to reliably detect AI generated text?

OpenAI has admitted that it is impossible to reliably detect all AI-written text. However, that does not mean you cannot detect it with a reasonable success rate, nor does it mean that trying to detect it is pointless.

A good detection rate for me would be something above 80%. Sure, there will be complex content where the detector raises a white flag but to be useful it has to be able to correctly classify 80% of the text I give it.

Otherwise it has the potential to do more harm than good. False positives can be just as dangerous as false negatives. But why only 80%, what is so difficult about AI detection?
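Before answering, the 80% bar and the two kinds of error can be made concrete. The following is only a minimal sketch: the function name, the `DetectorReport` class and the sample labels are made up for illustration, and it does not describe how any of the tools below score text.

```python
from dataclasses import dataclass


@dataclass
class DetectorReport:
    accuracy: float
    false_positive_rate: float  # human text wrongly flagged as AI
    false_negative_rate: float  # AI text wrongly passed as human


def evaluate_detector(truth, predicted):
    """Compare detector verdicts with the known source of each text.

    Both arguments are lists of booleans where True means "AI written".
    """
    if not truth or len(truth) != len(predicted):
        raise ValueError("truth and predicted must be non-empty and equal in length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # AI text flagged as AI
    tn = sum(1 for t, p in pairs if not t and not p)  # human text passed as human
    fp = sum(1 for t, p in pairs if not t and p)      # human text flagged as AI
    fn = sum(1 for t, p in pairs if t and not p)      # AI text passed as human
    humans, ais = tn + fp, tp + fn
    return DetectorReport(
        accuracy=(tp + tn) / len(pairs),
        false_positive_rate=fp / humans if humans else 0.0,
        false_negative_rate=fn / ais if ais else 0.0,
    )


# Hypothetical sample: 5 AI texts followed by 5 human texts.
truth = [True] * 5 + [False] * 5
predicted = [True, True, True, True, False, False, False, False, False, True]
print(evaluate_detector(truth, predicted))
```

In this made-up sample the detector reaches exactly 80% accuracy, yet one in five human texts is still flagged as AI, which is why accuracy alone does not say whether a tool is safe to rely on.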

To answer this it is worth trying to understand the basics of how AI content detectors work.

AI content detectors are usually machine learning models that have been trained on huge datasets of content. Sounds familiar? Using techniques such as neural networks they can analyse complex patterns and learn from them, in this case learning to distinguish AI generated content from human generated content.

Hold on, wait. AI content generators like ChatGPT have been trained on human written content. And now we are trying to train AI detectors using AI written content which is supposed to be representing human written content. The feature image of this article was inspired by this AI digital chase.

So as AI generators get better at writing like humans, it will become harder and harder to correctly classify AI content. I think the challenge of AI detectors is quite clear now.

Detectors always have to try to keep up with advancements in AI generators. Training and tuning the models takes time, so AI detectors are always reacting to the newest advancements.

Besides the fact that AI detectors are bound to be always catching up, the AI generators have the home court advantage in making themselves difficult to follow. Take a look at the following study (arXiv:2305.10847 [cs.CL]).

The study proposes a new method to automatically generate prompts that guide LLMs to evade existing detection systems - the Substitution-based In-Context example Optimization method (SICO).

In a nutshell, SICO iteratively substitutes words and sentences within in-context examples, aiming at providing high-quality demonstrations for LLMs to generate text that cannot be detected.
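To give a feel for what "iteratively substituting words" against a detector means, here is a toy greedy loop. It is a sketch under loud assumptions, not the paper's algorithm: `toy_detector_score`, the telltale word list and the synonym table are all invented stand-ins, and the real method optimizes in-context examples for a prompt rather than editing the final text directly.

```python
def toy_detector_score(text):
    """Hypothetical stand-in for a detector: share of 'telltale' words."""
    telltale = {"delve", "moreover", "furthermore", "tapestry"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in telltale) / len(words)


def greedy_substitute(text, synonyms, score=toy_detector_score):
    """Try each candidate replacement and keep it only if the score drops."""
    words = text.split()
    best = score(" ".join(words))
    for i in range(len(words)):
        for candidate in synonyms.get(words[i].lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            trial_score = score(" ".join(trial))
            if trial_score < best:
                # Accept the substitution and move on to the next word.
                words, best = trial, trial_score
                break
    return " ".join(words), best


# Made-up synonym table, purely for illustration.
synonyms = {"moreover": ["also"], "delve": ["dig"]}
print(greedy_substitute("moreover we delve into the data", synonyms))
# → ('also we dig into the data', 0.0)
```

The point of the toy is the feedback loop: as long as the generator side can query some score, it can keep substituting until that score stops complaining.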

The focus of the study is on real life tasks - academic essay writing, open-ended question answering and business review generation. The reported results show the method making text from existing generation models much harder for detectors to catch.

Another study, Testing of Detection Tools for AI-Generated Text (arXiv:2306.15666 [cs.CL]), aims to answer not only whether the detection tools can reliably detect ChatGPT generated text, but also whether manual editing or machine paraphrasing affects the detection results.

An interesting finding is that the performance decreases significantly when texts are modified by paraphrasing (or translation). This includes text that is paraphrased using AI. The accuracy was only 26% when AI content was modified by AI.

There are other studies that come to the same conclusion - AI detectors are not reliable and can be fooled by using various techniques, for example, How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts (arXiv:2310.05095 [cs.CL]).

So as we can see, science says that you cannot reliably detect AI content. However, there are countless services selling this exact feature, often with claims such as:

with a 97% accuracy rate, detects whether your text is human or AI written!

Are they completely ripping you off, or have they really been able to get ahead of the curve?

To find out, I decided to run a little experiment and test the most popular AI detection platforms. Disclaimer - I do not get paid or in any way benefit from these services. This is just a personal experiment for my own (and hopefully your) curiosity.

Some notes before we get into comparison:

  1. The texts used in the analysis ranged from 300 to 1000 characters (depending on what the tools allowed as input). In theory, detectors work better with longer texts.
  2. Human written text was written personally by me. Should I worry, if I get classified as an AI?
  3. The AI generated texts were entirely generated using OpenAI's gpt-3.5-turbo-1106.

If you are already familiar with the tools you can skip ahead to the conclusion.

GPTZero

GPTZero | The Trusted AI Detector for ChatGPT, GPT-4, & More
Covered by >100 media outlets, GPTZero is the most advanced AI detector for ChatGPT, GPT-4, Bard. Check up to 50000 characters for AI plagiarism in seconds.

I had varying success with this tool. It seems to be very cautious about the results. First I fed it entirely AI generated text. It was correctly marked as likely AI written, although the confidence level was only 51%.

Then, I used snippets of an article fully written by me without any AI assistance.

GPTZero AI detection tool results
A snippet from https://agilemerchants.com/will-ai-images-distort-our-reality/

Then another part of the same human written article was marked likely to be written by AI.

GPTZero AI detection tool results
Flagged snippet from https://agilemerchants.com/will-ai-images-distort-our-reality/

With more text the result seems to be more accurate.

My conclusion is that if you have an article that mixes AI generated text with some human text sprinkled in, the AI detector will give a low probability and the text will be marked as written by a human. At the same time, fully AI generated articles are likely to be flagged as AI.

Originality.ai

Our Accurate AI Checker, Plagiarism Checker and Fact Checker Lets You Publish with Integrity
Originality AI Plagiarism and Fact Checker - Publish With Integrity
At Originality.ai we provide a complete toolset (AI checker, Plagiarism Checker, Fact Checker and Readability Checker) that helps Website Owners, Content Marketers, Writers, Publishers and any Copy Editor hit Publish with Integrity.

Using the same examples of text Originality.ai seems to give bolder predictions. My personally written text was marked as 80% human.

Human created text detected by Originality.ai

Then I used the AI article and got a 100% AI result.

AI created text detected by Originality.ai

When testing mixed content, I was not as impressed. Combining the two examples above, the 100% AI content and the human created content, I got a meek 87% original score, even though there was approximately the same amount of AI and human generated content.

So once again, if you are able to mix AI with your own touch, then Originality.ai will give you the benefit of the doubt.

Copyleaks

AI Content Detector | AI Detector | ChatGPT Detector - Copyleaks
The Copyleaks AI Content Detector helps you know if what you’re reading was written by a person or generated by AI, including ChatGPT.

My content had a hard time in this tool. Human generated text got AI content detected throughout the whole text.

The last 2 lines trigger AI content detector.

It also does not give you any additional information, or the confidence level of the result. I tried multiple articles of mine and most of them were marked correctly. However, the tool does not inspire much confidence in its results.

ZeroGPT

AI Detector - Accurate Chat GPT, GPT4 & AI Text Checker Tool
Detect chatGPT content for Free, simple way & High accuracy. OpenAI detection tool, ai essay detector for teacher. Plagiarism detector for AI generated text

While it did not look as funded or advanced as the other ones, straight away I liked the result page of this tool. For my personally written text, it showed exact parts it found suspicious.

Human text AI detection result in ZeroGPT results

It also showed the best results for the mixed inputs:

Mixed text AI detection result in ZeroGPT results

It was able to identify that most of the text was human written, while parts were added with AI.

The only thing I found annoying about this service was that it shows Google Ads; however, that makes sense given it has to finance itself. The bigger services can definitely learn something from ZeroGPT.

Stealthwriter.ai

Bypass AI Detection | Get 100% Human Score | Rewrite AI Text into Human Content
Our free tool bypasses AI detection, gets a perfect human score, and rewrites AI text into engaging human content. Create authentic and compelling content that stands out. Try it today!

Stealthwriter is a tool that allows you to detect AI generated content and also humanize the text to bypass other detection models. They have two models: Ninja (default) and Ghost (paid version only).

The tool was able to identify AI content with good quality.

Stealthwriter human written text detection result
Stealthwriter Example Detection

After working with all those other tools, I found I was missing the opportunity to see the flagged parts of the text.

In terms of mixed content, it was on the soft side. Given the same amount of AI and human generated text, it favors marking the text as human written.

Scribbr

Free AI Detector
Detect ChatGPT, GPT4, and Bard in seconds using Scribbr’s free AI Detector / ChatGPT Detector. Trusted by students, educators, and bloggers.

I am not sure if I tested the service at the wrong time, or the wrong way, or both. But all of my articles, including fully AI generated ones, were marked as human written. I double checked this multiple times, and indeed the responses from the checker API were 200, with a 0.0002 probability of fake content.

Undetectable AI

The Truly Undetectable AI Content Writing Tool
Use our free AI Content Detector to check if your AI-written text will be flagged and humanize it with a click of a button to bypass AI detection tools.

The interface and results were good. I like that it used multiple services to check the AI content.

Undetectable AI Detection Example

It also offers the option to humanize text which is interesting but not part of today's topic.

AI content was detected quite well; however, it did not show which part of the text triggered the flag. For mixed content this is not ideal, as you may need to change only a paragraph or two.

Content at Scale

AI Detector Tool Checks ChatGPT, GPT-4, Bard, Claude & More
Free AI Detector checks if your text is from ChatGPT, GPT-4, Bard, & Claude (with 98% accuracy). Use our pro version to convert to undetectable ai text.

Content at Scale also provides quite a nice editor to write and edit the content. It flags the parts that it thinks are AI generated.

content at scale AI text detector

At this point, I am starting to find it interesting that different tools flag different parts of the same text as AI written. There is no consistency across these tools, besides the fact that most of them are able to recognize fully AI written articles.

Same as with other tools, it is mostly confused when working with mixed inputs. However, unlike the previously listed tools, it does not assign a probability but deems that the article "Reads like AI!" even though only 2 sentences (out of 12) were marked as AI generated.

message from Content at Scale - Reads Like AI!
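For contrast, here is what a verdict tied directly to the share of flagged sentences would look like. This is a hedged sketch only: the function names and the 50% threshold are hypothetical, and it is not a claim about how Content at Scale or any other tool actually derives its label.

```python
def flagged_share(sentence_flags):
    """Return the percentage of sentences flagged as AI generated."""
    if not sentence_flags:
        raise ValueError("need at least one sentence")
    return 100 * sum(sentence_flags) / len(sentence_flags)


def proportional_verdict(sentence_flags, threshold_percent=50.0):
    """Label the whole text from the flagged share (threshold is an assumption)."""
    share = flagged_share(sentence_flags)
    label = "Reads like AI" if share >= threshold_percent else "Reads mostly human"
    return label, round(share, 1)


# The case from the test above: 2 flagged sentences out of 12.
flags = [True] * 2 + [False] * 10
print(proportional_verdict(flags))
# → ('Reads mostly human', 16.7)
```

With 2 of 12 sentences flagged, the flagged share is under 17%, so a single all-or-nothing label hides most of what the per-sentence results say.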

Writer.com

Writer
Accelerate growth across every team with the most secure AI platform. We customize generative AI to your workflows, not the other way around.

I really like the interface of this tool, and the fact that it allows adding a URL to analyse. However, the results were not quite as good as the others. It does not seem to be able to detect semi-advanced AI content. All of my AI articles passed as human generated content.

Writer.com AI detection result for an AI generated article

Sapling.ai

AI Detector | Free AI Content Detector | ChatGPT / GPT-4 / Claude | Sapling
AI detector for whether content -- such as a blog post or an essay -- is AI-generated or not. AI checker gives predictions for each part of the text as well as for the full text.

I like the multiple sections, breaking down into sentences and marking each separately. However the text that was marked as human in all other platforms, and which is 100% written by me, was marked as 68.7% fake:

Sapling AI Content Detector

Another thing that I find interesting is that if I add additional text before this snippet it goes to 0% and does not mark any of the suspicious sentences. The good thing is that it works well with larger texts.

I also have to note that Sapling did the best with mixed content. The same examples as above were marked as 45% fake, and the right sentences were highlighted almost completely correctly (besides a few false positives).

Crossplag

AI Content Detector
The AI Content Detector uses advanced machine learning algorithms to analyze and identify the content of a given text.

I had a bit of trouble creating an account (the email did not arrive at first), but eventually got in to test the platform.

In terms of accuracy, it was spot on for AI generated articles and fully human written articles.

AI Generated Content Analysis by Crossplag.com

When working with mixed content, same as most of its competitors, the tool favored the human written parts and gave the text the benefit of the doubt. If you have some human written text sprinkled in, you should be good.

OpenAI Classifier

New AI classifier for indicating AI-written text
We’re launching a classifier trained to distinguish between AI-written and human-written text.

OpenAI's tool was not reliable enough (according to OpenAI themselves) so it was scrapped earlier this year. This is their official announcement:

As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy. OpenAI is working to incorporate feedback and are currently researching more effective provenance techniques for text, and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated.

I feel it is important to include it because it was one of the more popular tools, and I am interested to see if it will return to the market in the future. If anyone has the capability to crack this, they do.

So what did I learn from all this?

I was both impressed and disappointed at the same time.

The scientific findings hold true - it is not possible to reliably recognize AI text, especially if it has been paraphrased. Simple rephrasing using AI or manual modifications can fool the detectors quite easily. In addition, as AI generation tools keep getting better it might become pointless to even try to detect it.

At the same time, most of the tools were able to indicate when AI content was used plainly without any modification. While they are not truly accurate or reliable, we can use these tools as a second opinion to help us make a decision about the source of the content.

There are fields where plagiarism and AI content are truly game changing. For example, the educational system needs to find a way to cope with the fact that AI is the future, and there will be even more and better versions of it. Currently, AI detection tools can only be used as a secondary measure.

As long as detection services are not accurate they will not be the answer. Think about it, false positives can be just as damaging in the educational environment (and others). If my paper is detected as AI written, even though it was not, it could have huge consequences.

Using AI content is not necessarily bad, nor does it break ethical norms when used responsibly. The bigger challenge I see is that some fields and industries might need an adjustment of values.

A text that can be generated with AI should not hold a big value in any field, and in that case, perhaps the field should start valuing something else, something that can not be generated. For instance, in writing essays or any other forms of writing - the essence should be about the uniqueness, the ideas, the personality and advancement of something.

AI content is raising the bar for all forms of content generation. It should no longer be valuable or accepted to regurgitate already existing information in a form that is acceptable to everyone (for example, an essay or academic paper). It is not productive, nor valuable, nor inspiring.

The text that has value will be unique and inspiring.

AI regurgitating information to society. / Generated by DALL-E

My inspiration for the article

Large Language Models can be Guided to Evade AI-Generated Text Detection
Large Language Models (LLMs) have demonstrated exceptional performance in a variety of tasks, including essay writing and question answering. However, it is crucial to address the potential misuse of these models, which can lead to detrimental outcomes such as plagiarism and spamming. Recently, seve…
Creating an AI detector, I think i have it
The whole concept has been discredited. The only thing left in the spiral of uselessness and false detection is for some preinstalled software on your laptop to pop up “your subscription to AI detection has expired, enter your credit card to ensure continued protection”
How to detect AI-generated text and photos | Zapier
Learn about the challenges of AI content detection and explore tools like Hugging Face, OpenAI’s AI Text Classifier, and GPTZero for spotting AI content online.
New AI classifier for indicating AI-written text
We’re launching a classifier trained to distinguish between AI-written and human-written text.