If you've spent any time reading about AI detection, you've probably bumped into two words over and over: perplexity and burstiness. They show up in product descriptions, blog posts, YouTube explainers, and Reddit threads. Most of the time, they're presented as the magic keys to understanding whether something was written by a person or a machine.
They're not magic. They're statistical measurements, and they're useful ones, but the way they get talked about online has created more confusion than clarity. Some people think boosting perplexity is all you need to "beat" AI detectors. Others treat burstiness like a style trick you can just sprinkle on top of AI text. Both assumptions are wrong, and both can actually make your writing worse.
Let's walk through what these metrics actually measure, how detectors use them, and where the common advice goes off the rails.
In natural language processing, perplexity measures how predictable a sequence of text is from the perspective of a language model. The model reads your text one word at a time (strictly, one token, which is a word or a piece of one). At each step, it calculates how likely that particular word was, given all the words that came before it. If the word was highly likely, the model isn't surprised. Low perplexity. If the word was unexpected, the model is caught off guard. High perplexity. Averaged over the whole passage, that surprise becomes the perplexity score.
Take two sentences. "For lunch today, I ate a bowl of soup." The model saw "soup" coming from a mile away. Low perplexity. Now try: "For lunch today, I ate a bowl of spiders." The model didn't predict "spiders" in that context. High perplexity.
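To make the arithmetic concrete, here is a minimal sketch of how a perplexity score falls out of per-word probabilities: it is the exponential of the average negative log-probability. The probabilities below are invented for illustration and do not come from any real model. The function name `perplexity` and the two lists are assumptions of this example.

```python
import math

def perplexity(token_probs):
    """Return exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical probabilities a model might assign to each word of
# "For lunch today, I ate a bowl of soup." (made-up numbers).
soup_probs = [0.2, 0.3, 0.5, 0.4, 0.6, 0.5, 0.3, 0.9, 0.4]

# Same sentence, but the final word is "spiders", which the model
# rates as very unlikely (again, a made-up number).
spiders_probs = soup_probs[:-1] + [0.0001]

soup_ppl = perplexity(soup_probs)
spiders_ppl = perplexity(spiders_probs)
print(f"soup: {soup_ppl:.2f}  spiders: {spiders_ppl:.2f}")
```

One unlikely word is enough to raise the score for a nine-word sentence, because the average is taken over so few words. In a real detector the probabilities would come from a language model rather than a hand-written list.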
AI-generated text tends to score low on perplexity because language models are trained to predict the next word, and they generate text by favoring the most probable candidates. That's their job. They're optimized for it. Humans, on the other hand, reach for unexpected metaphors, dialect-specific phrases, deliberate fragments, or words that carry a specific connotation only the surrounding context explains. This unpredictability isn't a flaw in human writing. It's how voice gets created.
GPTZero, one of the first widely used AI detectors, reportedly set its original threshold at a perplexity score of 85. Text scoring below that was flagged as likely AI-generated. The idea was simple enough: if a language model finds your text very easy to predict, maybe a language model wrote it.
But here's where things get complicated. Not all low-perplexity text is AI-generated. Legal writing, technical documentation, and academic papers often score low on perplexity because they use precise, conventional language. A well-written chemistry lab report doesn't surprise a language model much. That doesn't mean a robot wrote it.
If perplexity looks at word-level predictability, burstiness zooms out to look at the rhythm of your writing across the whole document. It measures how much variation exists in sentence length and structural complexity.
People write in bursts. You might write a long, winding sentence with multiple clauses and parenthetical asides, then follow it with something short. Like this. Then maybe another medium-length one that picks up the thread again. The variation creates a kind of pulse. Researchers call this high burstiness.
AI models tend to produce text with low burstiness. Sentences cluster around a similar length, usually 15 to 25 words. The structure stays consistent: subject, verb, object, maybe a prepositional phrase. It reads like a metronome. Steady, smooth, and oddly lifeless.
The reason for this uniformity is straightforward. Language models are trained to produce coherent output. Coherence, as the models understand it, means consistency. They don't get tired, they don't get excited, they don't suddenly decide to make a point with a two-word sentence. Every sentence gets roughly the same amount of computational attention, and the result is writing that feels technically correct but rhythmically flat.
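The rhythm described above can be approximated with a few lines of code. This is a rough sketch, not how any particular detector defines burstiness: it assumes burstiness means the coefficient of variation of sentence length (standard deviation divided by mean), and it splits sentences naively on `.`, `!`, and `?`. The function names are hypothetical.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (0.0 = perfectly uniform)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("I went to the store to buy some milk and bread for the week. "
          "Closed. So I walked home.")

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), burstiness(varied))
```

The uniform sample scores zero or close to it, while the sample that mixes a long sentence with a one-word sentence scores much higher. A fuller measure would also look at structural complexity, which word counts alone do not capture.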
Most modern AI detectors don't rely on perplexity or burstiness alone. They use both, along with additional signals like vocabulary repetition patterns, transition word frequency, and neural classifier outputs.
The logic works like this. Low perplexity across many sentences suggests the word choices are too predictable to be human. Low burstiness suggests the sentence rhythm is too uniform. When both signals fire together, the detector's confidence goes up. When they conflict, say, low perplexity but high burstiness, the result is less certain.
Some detectors add a third layer: neural classifiers. These are machine learning models trained on labeled datasets of known AI and human text. They learn to recognize patterns that go beyond what simple statistical metrics can capture.
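The two-signal logic described above, leaving out the classifier layer, can be sketched as a simple decision rule. This is an illustration under stated assumptions, not any detector's actual implementation: the perplexity cutoff of 85 is the GPTZero figure mentioned earlier, while the burstiness cutoff of 0.3, the function name, and the three labels are all invented for this example.

```python
def combine_signals(perplexity_score, burstiness_score,
                    ppl_threshold=85.0, burst_threshold=0.3):
    """Combine two signals into a coarse verdict.

    ppl_threshold: cutoff cited in the article for early GPTZero.
    burst_threshold: hypothetical value chosen only for illustration.
    """
    low_ppl = perplexity_score < ppl_threshold
    low_burst = burstiness_score < burst_threshold

    if low_ppl and low_burst:
        return "likely AI"      # both signals fire together
    if low_ppl or low_burst:
        return "uncertain"      # signals conflict
    return "likely human"       # neither signal fires

print(combine_signals(40.0, 0.1))   # predictable and uniform
print(combine_signals(40.0, 0.8))   # predictable but rhythmically varied
print(combine_signals(120.0, 0.8))  # unpredictable and varied
```

Real systems typically produce a graded confidence score rather than three fixed labels, and they weigh additional signals alongside these two.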
A lot of the advice you see online about "beating" AI detectors boils down to two tips: add unusual words to boost perplexity, and vary your sentence length to increase burstiness. This advice isn't technically wrong, but it's dangerously incomplete.
Swapping a few synonyms doesn't meaningfully change perplexity distributions. Detectors measure patterns across hundreds of word choices, not a handful of unusual words scattered through otherwise predictable text. Adding one short sentence doesn't produce authentic burstiness. Real burstiness is a pattern that emerges naturally from how a person thinks and writes, not something you can fake by inserting a five-word sentence every few paragraphs.
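A toy calculation shows why a handful of swaps barely moves a document-level score. The numbers are assumptions chosen for easy arithmetic, not measurements: a 300-word passage where the model gives every word a probability of 0.3, then the same passage with three words replaced by "unusual" ones rated at 0.001.

```python
import math

def perplexity(token_probs):
    """Return exp of the average negative log-probability per token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

# Toy assumption: 300 words, each assigned probability 0.3 by the model.
baseline = [0.3] * 300

# Swap in three unusual words that the model rates at 0.001 each.
tweaked = [0.3] * 297 + [0.001] * 3

base_ppl = perplexity(baseline)
tweaked_ppl = perplexity(tweaked)
relative_change = (tweaked_ppl - base_ppl) / base_ppl

print(f"baseline: {base_ppl:.2f}")
print(f"tweaked:  {tweaked_ppl:.2f}")
print(f"change:   {relative_change:.1%}")
```

Under these assumptions the overall score rises by only about 6 percent, even though each swapped word is 300 times less likely than the one it replaced. The other 297 word choices still dominate the average.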
The worst version of this advice tells people to deliberately introduce typos or grammatical errors. Yes, human writing contains more errors than AI writing. But sprinkling in mistakes doesn't make text more human. It makes it worse. Detectors are getting better at distinguishing between natural errors and deliberately inserted ones, and readers can tell the difference too.
The only reliable way to change these signals is to genuinely improve the writing. When you restructure an argument, add specific examples from your own experience, vary your rhetorical approach across paragraphs, and write with a genuine voice, the perplexity and burstiness patterns shift naturally.
Perplexity and burstiness are useful diagnostic tools. They can tell you whether a piece of text exhibits statistical patterns commonly associated with AI output. That's genuinely valuable information.
What they can't do is tell you who wrote something. A low perplexity score doesn't prove AI authorship. It proves predictability. A low burstiness score doesn't prove a machine was involved. It proves uniformity. These might correlate with AI generation, but correlation isn't proof.
This distinction matters enormously in practice. A non-native English speaker who writes carefully and formally might produce text with low perplexity and low burstiness. A technical writer following a style guide might do the same. Neither of these people used AI. They just write in ways that happen to match the statistical profile that detectors associate with machine output.
Perplexity and burstiness are real, measurable signals that detectors use for a reason. They capture genuine differences between how language models produce text and how humans tend to write. Understanding them helps you understand what detectors are actually doing under the hood.
But treating them as the whole story is a mistake. They're two signals among many, they're measured imperfectly, and they correlate with all kinds of writing that has nothing to do with AI. The best detectors combine them with other methods, and the best approach to AI detection combines tool output with human judgment.
If you're a writer worried about being flagged, don't try to game the metrics. Write with genuine voice, specific detail, and authentic variation. The signals will take care of themselves. If you're using detectors to evaluate writing, remember that a score is a starting point for investigation, not a conclusion.