Legal documents claim Meta CEO Mark Zuckerberg authorised using unlicensed e-books and articles to train the company's Llama AI models, raising significant questions about intellectual property practices in AI development.

The unredacted court documents, filed in the U.S. District Court for the Northern District of California, detail Meta's internal decisions on acquiring and managing AI training data. They come from Kadrey v. Meta, a copyright lawsuit whose plaintiffs include Sarah Silverman and Ta-Nehisi Coates.

According to the filings, Meta's leadership explicitly approved training its Llama models on LibGen, a dataset described internally as "pirated." The approval came despite concerns from Meta's AI executive team about potential legal and regulatory consequences. Internal communications show that after "escalation to MZ," the AI team was cleared to use LibGen, a platform that has previously faced copyright infringement lawsuits and millions in fines.

Meta's data acquisition included torrenting LibGen's content. Some research engineers raised concerns about this approach, because BitTorrent clients typically upload pieces of a file to other users while they are still downloading it.

The decision to use unlicensed content apparently stemmed from time-to-market pressures. According to previous reporting from The New York Times, Meta's executives determined that negotiating proper licenses would take too long and decided to rely on fair use as a legal defence.

The case currently concerns only Meta's earliest Llama models, not its more recent releases. Meta's fair use defence may still succeed: similar AI-related copyright claims against Meta were dismissed in 2023.