In a striking development, Meta CEO Mark Zuckerberg is accused of authorizing the use of pirated content to train the company’s Llama AI models. The lawsuit, Kadrey v. Meta, alleges that Meta used a dataset known as LibGen, which contains unauthorized copies of books and articles, to develop its AI models.
This lawsuit adds to the growing list of legal challenges tech giants face over the use of copyrighted materials in AI training.
Allegations of copyright infringement
The plaintiffs, including notable authors such as Sarah Silverman and Ta-Nehisi Coates, argue that Meta intentionally used copyrighted works without permission. Newly unsealed documents suggest that Zuckerberg himself approved the decision despite internal concerns. LibGen, a notorious shadow library that distributes pirated books and academic materials, has been embroiled in multiple lawsuits and has repeatedly been ordered to shut down over copyright violations.
Internal communications at Meta, revealed in the filing, describe LibGen as a “dataset we know to be pirated.” Despite these acknowledgments, the approval reportedly came directly from Zuckerberg, raising questions about Meta’s internal policies and legal strategy. The lawsuit highlights a memo that mentions the decision to use LibGen was made after it was “escalated to MZ” (Mark Zuckerberg), signaling top-level involvement.
Concealment and torrenting accusations
The lawsuit further alleges that Meta attempted to conceal its use of the pirated dataset by removing copyright information from the content. A Meta engineer reportedly created a script to strip attribution details from e-books and scientific articles, a move the plaintiffs claim was intended to mask the infringement. Additionally, Meta is accused of obtaining LibGen via torrenting, a file-sharing method that not only downloads content but also redistributes it to other users, which the plaintiffs argue compounded the infringement.
Meta’s head of generative AI, Ahmad Al-Dahle, allegedly downplayed the legal risks of torrenting, despite internal reservations. The plaintiffs argue that these actions amount to deliberate copyright violations, further undermining Meta’s claims of fair use.
Legal implications and public perception
While Meta has defended its actions under the fair use doctrine, which permits limited use of copyrighted materials for transformative purposes, the case raises serious questions about the company’s practices. The court has yet to rule, and previous decisions in similar claims against AI developers have produced mixed outcomes.
Judge Vince Chhabria, overseeing the case, has already criticized Meta’s attempt to redact portions of the lawsuit, suggesting the company sought to avoid negative publicity rather than protect sensitive business information. This critique adds another layer of scrutiny to Meta’s handling of the situation.
As the case progresses, it highlights the ongoing legal and ethical challenges surrounding AI development and the use of copyrighted materials. The outcome could have significant implications for the tech industry’s approach to training AI models and respecting intellectual property rights.