Major Scientific Repository arXiv Cracks Down on AI-Generated Papers

If the platform finds 'incontrovertible evidence' that someone didn't check the output of a large language model, they could face a one-year ban.

Will McCurdy, Contributor

ArXiv.org, one of the world’s most popular repositories of free scientific research, is cracking down on AI-generated content and on authors who fail to check their submissions for hallucinations.

“[If] a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper,” says Thomas G. Dietterich, current chair of the Computer Science Section of arXiv. “The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue.”

Dietterich, a professor at Oregon State University, says examples could include “hallucinated references,” as well as what he called “meta-comments from the LLM.” For example, this could be a section that reads: “Here is a 200-word summary; would you like me to make any changes?”

“Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated,” he adds.

He later clarified that bans can be appealed, telling 404 Media: “Our internal process requires first a moderator to document the problem and then for the Section Chair to confirm before imposing the penalty.”

On X, Steinn Sigurðsson, an astrophysics professor at Penn State and scientific director at arXiv, noted that "you don't see the stuff we reject... some of it is really, really egregious," adding that "the decision to impose additional consequences is largely to throttle that stuff so n00bs and bad actors don't trash us trying repeatedly."

AI-generated content isn’t just a big problem in your social media newsfeed; it’s now a major issue in academia. In the run-up to the 2026 International Conference on Learning Representations (ICLR), one of the world’s most popular AI conferences, 21% of peer reviews were allegedly fully AI-generated, and more than half showed signs of AI use. The issue was less extreme for the papers themselves, but still serious: approximately 1%, or 199 manuscripts, were fully AI-generated, while 9% contained more than 50% AI-generated text.

Reactions on social media were broadly positive. Ethan Mollick, a Wharton professor studying AI, tweeted that the policy seems "incredibly reasonable...at least in the short term."

Ash Jogalekar, a senior program manager of agentic AI for science at Microsoft, agreed. "Expecting high standards from scientists and telling them that they can use AI tools but need to check and recheck the results before publishing is not just reasonable but is the way good science should always be done," he tweeted.

Lucas Beyer, a former OpenAI researcher now at Meta, praised it as "very good" and called for the restrictions to be "strongly enforced."

Enforcing these measures could be a big lift, as ArXiv.org handles a large volume of content. According to its own statistics, it reached 2 million submissions by the end of 2021, and roughly 24,000 articles were being submitted each month as of November 2024.
