Novelist Christopher Farnsworth filed a class-action lawsuit against Meta Platforms on Tuesday, accusing the tech giant of using pirated copies of his books to train its Llama artificial intelligence model without authorization.
The lawsuit, filed in the U.S. District Court for the Northern District of California, alleges that Meta infringed the copyrights of Farnsworth and potentially thousands of other authors by reproducing their works without permission in order to develop its large language models (LLMs).
Farnsworth, a Los Angeles-based author of eight novels and three novellas, claims Meta used a dataset called "Books3," which contains nearly 200,000 pirated books, including his novels "Blood Oath" and "The President's Vampire," to train its Llama AI models.
"Meta stole hundreds of thousands of pirated copyrighted books to build a commercial product called a Large Language Model," the complaint states. "Meta ignored this basic principle, and the federal law embodying it, by stealing and exploiting copyright-protected books for profit."
The lawsuit seeks unspecified monetary damages and an injunction to halt Meta's alleged infringement. It also requests class-action status to represent other affected authors.
Meta has previously acknowledged using the Books3 dataset, part of a larger collection called "The Pile," in training its Llama 1 model. In a February 2023 research paper, Meta researchers cited Books3 as one of two book corpora used, describing The Pile as "a publicly available dataset for training large language models."
However, the creators of The Pile had openly acknowledged that Books3 contained copyrighted material, and the lawsuit alleges Meta was aware of the dataset's pirated nature but used it anyway.
This case joins a growing number of legal challenges against tech companies over the use of copyrighted materials in AI training. Other authors, including Ta-Nehisi Coates and Sarah Silverman, have filed similar lawsuits against Meta.
The dispute highlights the tension between rapid AI development and copyright protection. AI companies argue their use of copyrighted material for training falls under fair use, while creators contend it threatens their livelihoods.