The Em-Dash Myth: What Actually Gives Away AI Writing
The em-dash panic is mostly wrong. What actually moves an AI-detector score isn't punctuation.

A freshman composition professor pulled a student into office hours last semester to discuss an essay that “read like ChatGPT.” Her evidence was a page full of em-dashes. The student produced three years of Google Docs revision history showing he’d written every sentence himself. He happened to be a math major who loved Emily Dickinson, and Dickinson, as anyone who has read her knows, treated the em-dash like oxygen.
The meeting ended with an apology. It should not have started.
The em-dash has become the meme of AI detection, and it deserves the attention it gets about as much as avocado toast deserved the blame for the housing crisis. Punctuation is a weak signal. It’s also the one readers can see, which is why it dominates the discourse. What follows is what actually gives AI-generated writing away in 2026, and what doesn’t.
Large language models do use em-dashes more often than the average person. That much is true. A 2024 analysis from Andrew Gray at UCL showed that academic papers in the fields where ChatGPT adoption ran highest saw em-dash usage climb sharply after late 2022. So there’s something there.
But “more often than average” is not the same as “reliable signal.” A lot of human writers use em-dashes constantly. Journalists trained in AP style avoid them; long-form essayists and Substack natives lean on them hard. Students who read a lot of fiction tend to pick the habit up. Anyone who has ever written in a hurry reaches for an em-dash because it’s the punctuation mark that requires the least thinking.
Worse, the tell is easy to suppress. OpenAI told ChatGPT to use fewer em-dashes in GPT-5.1. You can also tell any model to avoid them with one line in the system prompt. A signal that can be killed by a sentence of instruction is not a signal worth building a case on.
If your detection theory fits on a bumper sticker, it’s wrong.
The detectors most schools use — Turnitin, GPTZero, Originality, Copyleaks — don’t care about punctuation. They care about the statistical shape of the text. Two numbers do most of the work:
Perplexity measures how surprised a language model is by the next word. Human writing is usually more surprising than LLM writing, because humans make odd word choices, misspell things, switch registers mid-sentence, and commit to weird metaphors.
Burstiness measures how much sentence length varies. Humans write short and long sentences next to each other. LLMs tend to settle into a metronomic middle range: 18 to 24 words, again and again.
You can see it for yourself. Paste three paragraphs of ChatGPT output into a word counter with a per-sentence view. The histogram will look like a textbook distribution. Paste three paragraphs of your own writing. It’ll look like a drunk EKG.
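If you’d rather not eyeball it, the check scripts in a dozen lines. Here’s a minimal sketch in Python, assuming a naive sentence split on end punctuation; it reports the coefficient of variation of sentence length, which is the core idea behind burstiness, though not GPTZero’s exact formula (that isn’t public).

```python
import re
from statistics import mean, stdev

def sentence_lengths(text: str) -> list[int]:
    # Naive split on sentence-ending punctuation; fine for a smell test.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s.strip()]

def burstiness(text: str) -> float:
    # Coefficient of variation: spread of sentence lengths relative to the mean.
    # Higher = more varied rhythm = more human-shaped.
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return stdev(lengths) / mean(lengths)

sample = "Short one. Then a much longer sentence that rambles on for a while before stopping. Done."
print(sentence_lengths(sample))      # [2, 13, 1]
print(round(burstiness(sample), 2))  # 1.25
```

Run one of your own paragraphs and a model’s answer to the same prompt through it side by side. The gap usually shows up on the first try.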
This is why em-dash removal doesn’t save a paragraph of AI output from detection. The dashes are cosmetic. The rhythm underneath them is the problem.
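Perplexity is just as easy to probe if you have a small open model on hand. Below is a sketch using GPT-2 via Hugging Face’s transformers library; GPT-2 is a stand-in scorer here, not what Turnitin or GPTZero actually run, but the principle is the same.

```python
# pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Passing labels=input_ids makes the model return the mean
    # cross-entropy loss over the sequence; exp(loss) is perplexity.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Lower score = more predictable = more LLM-shaped.
print(perplexity("It is important to note that this approach offers several key benefits."))
print(perplexity("Grief, it turns out, smells like my father's workbench."))
```

Expect the boilerplate sentence to score noticeably lower than the strange one. That gap is the whole game.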
These patterns show up in AI-written text, and punctuation tweaks don’t fix them:
The “not just X, but Y” construction. “This isn’t just a detector, it’s a conversation about trust in the classroom.” Once you notice this shape, you cannot unsee it. It’s the sound of a model reaching for emphasis it hasn’t earned. Humans occasionally write sentences in this form. LLMs write one per paragraph.
Hedging preambles. “It’s important to note that,” “generally speaking,” “in many cases,” “from a broader perspective.” These clauses add nothing. They exist because models are tuned to sound careful. A human editor would cut them in the second pass; LLMs leave them in.
Tidy tricolons. Three-beat parallel lists with matched rhythm, like “fast, reliable, and affordable,” appear far more often in AI output than in natural prose. Humans who write professionally sometimes deploy them on purpose. Models use them as filler.
The boilerplate closer. “As AI continues to evolve, one thing is clear.” If a piece ends with “one thing is clear,” the piece was probably written by a model, or by a human copying the model’s habits. Real conclusions take a position and shut up.
Over-qualified claims. “This approach can be a useful tool for many writers in certain situations.” The more qualifiers a sentence has, the more likely it was written by something optimizing for plausible-deniability rather than meaning.
None of these is a smoking gun on its own. Stacked together, they point the right way nearly every time.
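To make “stacked together” concrete, here’s a toy scorer. The phrase lists and the threshold are illustrative guesses for this article, not anything a commercial detector ships; real detectors lean on statistical features, not regexes.

```python
import re

# Illustrative patterns only; assumes straight apostrophes, so normalize
# curly quotes first if you run this on real web text.
PATTERNS = {
    "not_just_x_but_y":   re.compile(r"\bnot just\b.+?\bbut\b|\bisn't just\b.+?\bit's\b", re.I),
    "hedging_preamble":   re.compile(r"\b(it'?s important to note|generally speaking|in many cases|from a broader perspective)\b", re.I),
    "tidy_tricolon":      re.compile(r"\b\w+, \w+, and \w+\b"),
    "boilerplate_closer": re.compile(r"\bone thing is clear\b|\bas ai continues to evolve\b", re.I),
}

def pattern_hits(text: str) -> dict[str, int]:
    return {name: len(rx.findall(text)) for name, rx in PATTERNS.items()}

def looks_synthetic(text: str, threshold: int = 3) -> bool:
    # No single hit is a smoking gun; a stack of them is the signal.
    return sum(pattern_hits(text).values()) >= threshold

sample = ("It's important to note that this isn't just a tool, it's a paradigm. "
          "It is fast, reliable, and affordable. As AI continues to evolve, one thing is clear.")
print(pattern_hits(sample))
print(looks_synthetic(sample))  # True
```

Five hits across three sentences. That’s the stacking effect in miniature.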
The Wikipedia guide to “Signs of AI writing” is the most complete catalog of these patterns anyone has published. It was written by the editors who spend their weekends reverting AI-generated vandalism, which means it’s unusually grounded. Anyone curious about the linguistic fingerprints of LLM output should read it start to finish.
Two caveats worth noting.
First, the Wikipedia list weights em-dashes more heavily than detection testing supports. The guide itself notes this is a “most useful in combination” signal, but readers often take it as gospel. It isn’t.
Second, the list underweights what could be called cadence uniformity: the tendency of model output to land on a 20-word sentence length and refuse to deviate. GPTZero’s burstiness metric catches this, but the human eye catches it too, once it knows what to look for.
Take any paragraph. Count the number of sentences in it. Now look at the first word of each sentence. If more than half of them start with “The,” “This,” “It,” or “In,” the paragraph was probably written or heavily assisted by an LLM.
Now count sentence lengths. If three or more in a row land between 18 and 24 words, same conclusion.
This won’t catch every case. It will catch most of them.
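Both checks are mechanical enough to script. Here’s a rough Python version of the rule of thumb above, with the usual caveat that splitting sentences on punctuation is approximate:

```python
import re

OPENERS = {"the", "this", "it", "in"}

def split_sentences(text: str) -> list[str]:
    # Naive split on end punctuation; approximate but adequate here.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def opener_check(text: str) -> bool:
    # True if more than half the sentences open with The/This/It/In.
    sentences = split_sentences(text)
    hits = sum(1 for s in sentences
               if s.split()[0].strip('"\u201c').lower() in OPENERS)
    return hits > len(sentences) / 2

def cadence_check(text: str, lo: int = 18, hi: int = 24, run: int = 3) -> bool:
    # True if `run` consecutive sentences all land in the lo-to-hi word band.
    streak = 0
    for s in split_sentences(text):
        streak = streak + 1 if lo <= len(s.split()) <= hi else 0
        if streak >= run:
            return True
    return False
```

If both come back True, read the paragraph aloud before accusing anyone. The script is a screen, not a verdict.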
The em-dash discourse is a distraction that helps no one. If you’re a student whose professor is flagging your writing over punctuation, you’re being treated unfairly and you have grounds to push back. If you’re a writer using AI assistance, skipping em-dashes won’t save you from detection. Rewriting the rhythm underneath will.
Duey’s humanizer doesn’t target punctuation. It targets cadence, vocabulary distribution, and the structural patterns above, because those are what detectors actually measure. You can run a sample through it here and see the perplexity curve shift in the output.
A good rule: if you can’t tell whether a paragraph was written by a human or a model, the punctuation is probably not the answer. Read it aloud. Machines still lose that test.
Do em-dashes get papers flagged by Turnitin?
Not on their own. Turnitin’s AI detector weighs dozens of features, and punctuation is a small contributor. Stripping em-dashes from an AI-generated paper will change the score by a few points at most.
Is ChatGPT still using em-dashes in 2026?
Less than it used to. GPT-5.1 suppresses them in most contexts unless you ask for them. Claude and Gemini still lean on them. The trend is downward across all the major models.
What’s the single biggest AI tell in 2026?
Cadence uniformity. That’s 18-to-24 word sentences one after another, paragraph after paragraph. It’s what burstiness measures, and it’s the pattern that survives the longest across prompt rewrites.
Can I write like Emily Dickinson without being accused of being a bot?
Yes, but maintain your version history. A Google Doc with three weeks of revisions is the best defense against a false positive, regardless of how you punctuate.
Need to ship AI-assisted writing that reads like you wrote it? Try Duey’s humanizer. It rewrites the rhythm underneath, which is where detectors actually look.