Text Created by Generative AI Tools Uses a Lot of Stylistic 'Excess Words', a Study Finds


As AIs become better and better, it is getting increasingly difficult to tell apart work made by a computer from work made by a human.

Since OpenAI introduced ChatGPT and quickly wowed the world, people have begun to realize how this technology can help them with a wide range of tasks. Among the many things generative AI tools can do is write, including research papers and abstracts.

With the rapid increase in the use of large language models (LLMs), such as ChatGPT, in academic writing, readers may wonder which texts were written by whom.

Traditionally, attempts to differentiate LLM-generated text and human-written work in academic literature have relied on several methods.

One common approach involves using LLM detectors, which are trained to distinguish between human and AI-generated text based on known samples. A second analyzes word frequency distributions in texts, treating them as mixtures of human and AI-generated content. A third employs lists of marker words overused by LLMs, typically stylistic terms rather than content-specific vocabulary.

Figure from "Delving into ChatGPT usage in academic writing through excess vocabulary": frequencies of PubMed abstracts containing certain words. Black lines show counterfactual extrapolations from 2021–22 to 2023–24. The first six words are affected by ChatGPT; the last three relate to major events that influenced scientific writing and are shown for comparison.

With the widespread availability of LLMs, the trend has led to concerns about the authenticity and originality of scientific texts, with implications for research integrity and the evaluation of academic contributions.

As LLM-powered generative AIs have become increasingly good at what they do, a novel approach is needed.

Here, the researchers proposed a method that uses a data-driven strategy, avoiding some limitations found in previous methods.

Instead of relying on predefined datasets of human and LLM-generated texts, the method examines excess word usage to identify LLM involvement.

Inspired by studies of excess mortality during the COVID-19 pandemic, this technique tracks the frequency of certain words that show a significant increase post-ChatGPT release compared to their expected usage based on trends from earlier years. This method allows for a more unbiased and comprehensive analysis of LLM’s impact on scientific writing.

To do this, in a preprint, the researchers analyzed over 14 million PubMed abstracts published from 2010 to 2024.

They created a matrix of word occurrences across these abstracts and calculated the annual frequency of each word.

By comparing the observed frequencies in 2023 and 2024 to counterfactual projections based on trends from 2021 and 2022, they identified words with significant increases in usage.
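The core of this step is a counterfactual projection: extrapolate a word's pre-ChatGPT frequency trend forward and compare it with what was actually observed. The sketch below is a minimal, hypothetical illustration of that idea (the function name, data layout, and toy numbers are assumptions for clarity, not the study's actual code or data).

```python
# Hypothetical sketch of the counterfactual-projection step, assuming
# counts[word][year] holds how many abstracts in `year` contain `word`
# and totals[year] holds the total number of abstracts that year.

def expected_frequency(counts, totals, word, target_year):
    """Linearly extrapolate a word's frequency from 2021-22 into target_year."""
    f21 = counts[word][2021] / totals[2021]
    f22 = counts[word][2022] / totals[2022]
    slope = f22 - f21                       # year-on-year trend before ChatGPT
    return f22 + slope * (target_year - 2022)

# Toy data: a word whose usage was nearly flat before ChatGPT, then jumped.
counts = {"delves": {2021: 10, 2022: 12, 2024: 250}}
totals = {2021: 10_000, 2022: 10_000, 2024: 10_000}

expected = expected_frequency(counts, totals, "delves", 2024)
observed = counts["delves"][2024] / totals[2024]
print(expected, observed)  # observed far above the counterfactual trend
```

A word whose observed 2024 frequency greatly exceeds this extrapolation is a candidate "excess word"; the study's actual extrapolation may differ in detail, but the comparison logic is the same.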

These words, which they termed "excess words," were then used to gauge the influence of LLMs.

In their research, they found that LLM-powered generative AI tools use a lot of stylistic words, such as "delves," "showcasing," and "underscores."

The researchers found a significant increase in these excess words in 2024, coinciding with the widespread availability of ChatGPT, suggesting AI involvement in these works.

Figure from "Delving into ChatGPT usage in academic writing through excess vocabulary": (a) number of excess words per year, decomposed into excess content words and excess style words; (b) number of excess words per year, decomposed into nouns, verbs, adverbs, and adjectives.

To come to their conclusion, the researchers quantified this excess usage with two measures: the excess frequency gap (the difference between observed and expected frequencies), and the excess frequency ratio (the ratio of observed to expected frequencies).

To estimate the extent of LLM usage, the researchers used the frequency gap of excess words as a lower bound. For example, the word “potential” showed an excess frequency gap, indicating that at least 4% of 2024 abstracts included this word due to LLM influence. By analyzing abstracts containing words with excess usage, the authors obtained a lower bound of 10% for LLM-assisted papers in 2024.

This approach provided a robust lower bound, acknowledging that the actual figure could be higher due to some LLM-processed abstracts not containing any tracked excess words.
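The lower-bound logic can be illustrated with a small sketch: count each abstract once if it contains at least one tracked excess word. The word list and toy abstracts below are hypothetical examples, not the study's data.

```python
# Hedged sketch of the lower-bound estimate: the share of abstracts that
# contain at least one excess word. It is a lower bound because LLM-edited
# abstracts that avoid every tracked word are not counted.

excess_words = {"delves", "showcasing", "underscores", "potential"}

abstracts = [  # toy examples for illustration
    "this study delves into the potential of novel biomarkers",
    "we measured enzyme activity in liver tissue",
    "results underscores the showcasing of robust findings",
]

flagged = sum(
    1 for text in abstracts
    if excess_words & set(text.lower().split())  # any tracked word present?
)
lower_bound = flagged / len(abstracts)
print(lower_bound)  # flags 2 of the 3 toy abstracts
```

A real implementation would tokenize more carefully (punctuation, inflected forms), but the counting principle is the same as the study's.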

Figure from "Delving into ChatGPT usage in academic writing through excess vocabulary": (a) observed frequency and counterfactual expected frequency in 2024 of abstracts containing at least one of the excess style words; (b) the frequency gap.

The estimates varied across different fields, such as 20% in computational studies and 6% in prestigious journals like Nature, Science, and Cell. They also differed by country, with 16% in China compared to 3% in the UK, and by journal, with 24% in Sensors and 17% in Frontiers/MDPI. The highest estimate, 35%, was for computational papers from China.

This research reveals a major change in academic writing styles influenced by the emergence of LLMs like ChatGPT.

By creating a new method to identify excessive word usage, the study provides strong evidence that LLMs have significantly impacted scientific literature, with at least 10% of recent biomedical abstracts showing signs of AI assistance.

This highlights the transformative effect of LLMs on academic communication and raises important questions about research integrity and the future of scholarly writing.