Well, here’s one of the downsides of AI: a company taking illegal actions to train its AI offerings. If you’ve written a book that was used to train Anthropic’s AI program (you can find out whether this is true by going to the website below; click on the headline), you stand to gain up to $3,000 per book, provided that your book was copyrighted in the U.S.
Apparently Anthropic downloaded many books from pirate sites, knowing that this was illegal, and then used them to train its AI program. See the NYT article at bottom for details.
So yesterday a friend sent me this notice, which I had not gotten and had not heard about (click to read). I want to make authors aware of this as a way of stemming this piracy.
Here is an excerpt from the settlement website:
What is the Settlement About?
This Settlement resolves a class action lawsuit brought against Anthropic over the company’s use of allegedly pirated books to train its AI model.
The plaintiffs claim that Anthropic infringed protected copyrights by downloading books from Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi). Anthropic denies these claims. The Court didn’t decide who was right. Instead, both sides agreed to settle to avoid more litigation.
What is the current status of the Settlement?
The Settlement Administrator is notifying people about the Settlement. Class Members can search for their books on the Works List and file a claim.
On September 25, 2025, the Court granted initial approval of the Settlement. Next, the Court will hold a fairness hearing, resolve any appeals, and make a final decision.
What benefits does the Settlement Provide?
If approved, the Settlement provides a cash payment to Class Members who file a valid and timely claim. The Settlement Fund includes approximately $3,000 per work, before deducting costs, fees, and expenses, as described below.
The Settlement also requires Anthropic to destroy all books that it downloaded from the LibGen or PiLiMi datasets and any copies of those books, subject to Anthropic’s existing legal preservation obligation or obligation pursuant to court order under either U.S. or international law, and then provide written confirmation.
What fees and expenses will be paid from the Settlement Fund?
Under the Settlement, Anthropic has agreed to establish a $1.5 billion Settlement Fund. The Settlement Fund will be divided evenly based on the number of works for which valid claims are submitted.
The Settlement Fund will also be used to pay for notice and administrative costs related to the Settlement, attorneys’ fees and expenses, and any service awards for the Class.
Of course I checked to see if this was kosher, and many sources verified it. Here’s an article in the NYT. Click to read, or find the article archived here.
An excerpt from the NYT:
In a landmark settlement, Anthropic, a leading artificial intelligence company, has agreed to pay $1.5 billion to a group of authors and publishers after a judge ruled it had illegally downloaded and stored millions of copyrighted books.
The settlement is the largest payout in the history of U.S. copyright cases. Anthropic will pay $3,000 per work to 500,000 authors.
The agreement is a turning point in a continuing battle between A.I. companies and copyright holders that spans more than 40 lawsuits across the country. Experts say the agreement could pave the way for more tech companies to pay rights holders through court decisions and settlements or through licensing fees.
“This is massive,” said Chad Hummel, a trial lawyer with the law firm McKool Smith, who is not involved in the case. “This will cause generative A.I. companies to sit up and take notice.”
The agreement is reminiscent of the early 2000s, when courts ruled that file-sharing services like Napster and Grokster infringed on rights holders by allowing copyrighted songs, movies and other material to be shared free on the internet.
“This is the A.I. industry’s Napster moment,” said Cecilia Ziniti, an intellectual-property lawyer who is now chief executive of the artificial intelligence start-up GC AI.
The settlement came after a ruling in June by Judge William Alsup of the U.S. District Court for the Northern District of California. In a summary judgment, the judge sided with Anthropic, maker of the online chatbot Claude, in significant ways. Most notably, he ruled that when Anthropic acquired copyrighted books legally, the law allowed the company to train A.I. technologies using the books because this transformed them into something new.
. . . . Anthropic had illegally acquired millions of books through online libraries like Library Genesis and Pirate Library Mirror that many tech companies have used to supplement the huge amounts of digital text needed to train A.I. technologies. When Anthropic downloaded these libraries, the judge ruled, its executives knew they contained pirated books.
Anthropic could have purchased the books from many sellers, the judge said, but instead preferred to “steal” them to avoid what the company’s chief executive, Dario Amodei, called “legal/practice/business slog” in court documents. Companies and individuals who willfully infringe on copyright can face significantly higher damages — up to $150,000 per work — than those who are not aware they are breaking the law.
. . .After the judge ruled the authors had cause to take Anthropic to trial over the pirated books, the two sides decided to settle.
“This settlement sends a powerful message to A.I. companies and creators alike that taking copyrighted works from these pirate websites is wrong,” said Justin A. Nelson, a lawyer for the authors who brought the lawsuit against Anthropic.
As part of the settlement, Anthropic said it did not use any pirated works to build A.I. technologies that were publicly released. The settlement also gives any others the right to still sue Anthropic if they believe that the company’s technologies are reproducing their works without proper approval. Anthropic also agreed to delete the pirated works it downloaded and stored.
. . . Even if courts find that training A.I. systems with copyrighted material is fair use, many A.I. companies could be forced to pay rights holders over pirated works because online libraries like Library Genesis and Pirate Library Mirror are widely used, Mr. Hummel said.
It’s dead easy to check by putting in your name or the name of your book(s) at the first site above, and here’s part of what it spat out when I gave it my name.
So I filed a claim for each book. Although Anthropic claims that its piracy is “fair use”, that principle usually applies to using only small bits of works, not entire books—books acquired illegally—to help a company make a profit. Their lawyers should have told them to just buy the damn books!



This is a big deal. Large Language Models, to be affordable, have to be trained with publicly available information. Not all publicly available info is free to use, though, especially in for-profit scenarios. Having to take this into consideration could impact A.I. developers and thus customer costs or quality.
Congratulations, Jerry. We will be enjoying a check in our house, also. Of course, we don’t have your royalties!
Each author must face some interesting questions when asked by an A.I. developer, “May we use your book, and what would you charge?” The first question might be: do I want to support the development of AI through the use of my book(s), which ultimately robs humanity of opportunities to value individual creativity? (That’s not to disparage the huge positives that A.I. will also bring.) Another question might be: should I let the A.I. developer use my works, because my work (of course!) is excellent, and I would hate to have A.I. become a less excellent resource? (Then again, maybe you do want it to be a less excellent resource, because you want A.I. to fail, and would withhold your book for that reason.) It appears Jerry is happy to collect on a settlement, rather than opt out in order to be able to fight the A.I. developers… Jerry: any thoughts on these questions?
I’m pondering them now as I just found out about the issue this morning. But I have no qualms about the AI companies being forced to pay for breaching copyright. For years my publishers and agents have gone after people using my work illegally. Intellectual property is intellectual property, and I should be the one to decide how my work is used, and whether there is a charge. In fact, my own feelings about AI are mixed, and I’d have to see how it’s used before I’d decide whether it could use my work. But if a profit-making corporation uses it, I’d be more inclined to ask for a fee than if it were used for nonprofit purposes with educational benefits. I get virtually nothing, for example, for allowing WEIT to be translated into Arabic and Farsi, as it’s more important to me to promote acceptance of evolution in Muslim countries.
On the other hand, you could take the view that it is better to make it easy for AI to learn about evolution from you than from someone else.
Too bad LLM AI™ doesn’t learn about anything, excepting only how to produce plausible text following on from previous text. ELIZA on steroids.
https://link.springer.com/article/10.1007/s10676-024-09775-5
While I’m on that hobbyhorse (and, ahem, also producing text following on from previous text 🙂), LLM investors and developers are increasingly concerned about the plateau in the scaling curve of LLM performance v computing resources. (Ask an LLM for the details, it won’t mind since it doesn’t have one). Diminishing returns. AFAIK there is not yet a prediction market for when the LLM bubble’s Minsky Moment (q.v.) will occur, but stay tuned.
Conflict of Interest Declaration: My postgrad education half a century ago was in what is now called GOFAI (Good Old Fashioned AI). Today’s brute-force statistical curve-fitting punks should just get off my #*&%^@!! lawn. Good riddance.
Good for you. You put intellectual capital into those books and you have a legal right to damages for infringement. I’m glad you’re making a claim. Someone who wants to learn about evolution and the case against religion from you, they can pay for it, unless you decide to let them learn for free.
Or they can always join us here 🙂.
To emphasize for clarity (since this will be important), it’s not the training of an AI on a copyrighted book that was deemed illegal; it was the downloading of copyrighted books from a pirate archive that was illegal (literally, making a copy of a copyrighted book).
This does not stop AI companies training their AI on copyrighted material (without having to pay to do that) so long as they access that material in a legal manner (which might involve payment).
Yes, I thought I made that clear when I said they should have just BOUGHT the books!
Many thanks for the tip on that. I found one of my books in the list, too. Will be interested to see if I get a check out of it.
I put in Richard Dawkins and got 26 hits!
I’ve written to some of my friends who have published books about this. I suspect Richard knows, but he was included in my heads-up.
I put in “J. K. Rowling” and got 21 hits. So by definition Claude Opus is transphobic. Don’t blame me, I don’t make up the rules.
To be part of the settlement, the pirated books have to be copyrighted in the US, but I suspect all the Rowlings books are.
And I expect there’s a Berne Convention angle for overseas authors in general. Copyrights, unlike patents or trademarks, do not have to be officially registered anywhere. “Copyright subsists.” (IANAL, especially an IP one.)
“A fine is a fee”, and even large fees are but little hurdles for AI companies that still swim in speculative investment money.
Well, they apparently didn’t use any of my books. I should feel relieved, but somehow I’m vaguely insulted.
Haha same here!
I as well! But in somewhat of a sibling rivalry, one of my baby brother’s books, “Pursuit of Genius,” on the founding of the Institute for Advanced Study, was listed. Though rather than sue, he might be amused and flattered that such an esoteric volume was used and perhaps even pirated.
There are ongoing lawsuits brought by visual artists and musicians as well. I hope the artists win. It sounds like theft.
Also by the Times
https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
Congratulations, PCC(e). I hope you squeeze them dry. Alas, my own contributions to the book world—“Gringolandia: A Guide for Puzzled Mexicans”, and an ill-fated cell biology textbook of historically low sales—were somehow overlooked by Anthropic. But both books are still listed on AbeBooks, and the former even on Amazon. [The textbook achieved “negative sales” in Canada, perhaps a record of sorts.]
I will get what they give me; I do not levy the settlements.
Many thanks for the heads-up on this settlement, Jerry. The list includes more than a dozen of my pirated books! Am I going to get rich? Somehow I doubt it, but I am flattered, I guess.
I heard about this a while back from my agent, who’s been great about keeping people up to date. Since it sounds like a few other readers might be interested, here’s what she wrote a day or so ago:
The database to check whether your book is included in the Anthropic Copyright Settlement: https://secure.anthropiccopyrightsettlement.com/lookup
The settlement class was formed based on books that:
Were included in the versions of the LibGen or PiLiMi datasets downloaded by Anthropic;
Have an ISBN or ASIN;
Were registered with the U.S. Copyright Office within five years of publication; and
Were registered before being downloaded by Anthropic, or within three months of publication.
If your book appears in this database, it is included in the settlement. If you do not find your book by searching your name, title, or ISBN (and I suggest you do all three, as the metadata used to put this database together may have some errors) it likely did not meet these four requirements and is not included in the settlement.
Important note: Books that were included in the dataset but not registered for copyright are excluded. If your publisher was contractually obligated to register the copyright and did not (you can check registration here: https://publicrecords.copyright.gov), please make and keep a list of those titles. The process for addressing this is still in flux and will vary by publisher. The Authors Guild and AALA are actively working on this.
The key deadlines are:
Opt-out: January 7, 2026
File a claim: March 23, 2026
She suggests holding off on filing a claim until things are a bit clearer.
Have a look at David Gerard’s https://pivot-to-ai.com blog. His tack is that AI as currently implemented using LLMs has many flaws. One of them is the ease with which anyone can extract raw training data by using a correct prompt. So using pirated copies of books is not the only issue, because the LLM itself becomes a potential source of copies of the books it was trained on, regardless of how they were obtained.
There are many more issues which overall make it very irresponsible to mass-deploy LLMs in situations where quality, privacy, security, copyright protection etc. are important. There is no room here for more, but Gerard’s blog provides new examples daily, many of them hilarious.
You may find John Scalzi’s comments on his experience with Anthropic interesting.
https://whatever.scalzi.com/2025/10/02/authors-time-to-get-that-bag/
He had 17 of his works pirated.
As for the whole issue of how to create true AI, it occurs to me that the idea of feeding it immense quantities of data has an inherent weakness. Unless there is some means of ranking the worth of each item, you could just be creating a Garbage In, Garbage Out problem.
I have 3 or more books on the list, depending on how you count multiple editions, but I own the copyright to only one. Will the publishers get the money for the others?