
A 'Pile' of Pirated Work: Authors Sue Anthropic AI, Joining YouTuber Backlash

The suit alleges the Claude creator trains its models on copyrighted books from a controversial dataset called 'The Pile,' which also came under fire last month from YouTube creators.

By Emily Forlini, Senior Reporter


Three authors filed a class-action lawsuit against Anthropic on Monday, alleging the AI company trains its models on pirated versions of copyrighted books found in a public dataset called The Pile.

"Anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books," the suit says. "Rather than obtaining permission and paying a fair price for the creations it exploits, Anthropic pirated them."

Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson join three other authors who sued Nvidia in March for training its AI models on book data in The Pile. YouTubers also spoke out against this dataset last month after an investigation found it contains transcripts from 173,000 videos taken without creators' consent.

Anthropic uses this data to improve its Claude chatbot, which the company claims performs better than OpenAI's flagship model. Anthropic is projecting $850 million in revenue in 2024, according to The Information, and has around $6 billion in funding from Google and Amazon. The suit claims this success comes from unpaid creative work and that Anthropic has "not even attempted" to pay the authors.

"It is no exaggeration to say that Anthropic’s model seeks to profit from strip-mining the human expression and ingenuity behind each one of those works," the suit says.

PCMag reached out to Anthropic for comment and will update this story if we hear back.

Shawn Presser, one of The Pile's creators, assembled the trove of text in 2020. He compiled the text of nearly 200,000 books from the notorious piracy site Bibliotik and named the collection Books3. Although the original Books3 file was removed from The Pile in August 2023 due to copyright complaints, the suit alleges it remains available from other sources.

"It is apparent that Anthropic downloaded and reproduced copies of The Pile and Books3, knowing that these datasets were comprised of a trove of copyrighted content sourced from pirate websites like Bibiliotik," the suit says. The complaint quotes Anthropic discussing its use of data from The Pile on multiple occasions, as recently as July 2024.

In addition to hoovering up preexisting data from the web, AI companies are working to secure the rights to newly created content. OpenAI has struck licensing deals with major publishers and websites, such as The Wall Street Journal, Reddit, and, this week, Condé Nast. Unlike scraping, these deals rightly involve obtaining the publisher's consent and paying the publication for its work.
