Comedian Sarah Silverman and two authors are suing Meta and ChatGPT-maker OpenAI, alleging the companies’ AI language models were trained on copyrighted materials from their books without their knowledge or consent.
The two lawsuits against OpenAI and Facebook-parent Meta were filed in a San Francisco federal court on Friday, and both seek class action status. Silverman, the author of “The Bedwetter,” is joined in filing the lawsuits by fellow authors Christopher Golden and Richard Kadrey.
A new crop of AI tools has gained tremendous attention in recent months for their ability to generate written work and images in response to user prompts. The large language models underpinning these tools are trained on vast troves of online data. But this practice has raised concerns that these models may be sweeping up copyrighted works without permission – and that these works could ultimately be used to train tools that upend the livelihoods of creatives.
The complaint against OpenAI claims that “when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works.” The authors “did not consent to the use of their copyrighted books as training material for ChatGPT,” according to the complaint.
The complaint against Meta similarly claims that the company used the authors’ copyrighted books to train LLaMA, the set of large language models released by Meta in February. The suit claims that much of the material used to train Meta’s language models “comes from copyrighted works—including books written by Plaintiffs—that were copied by Meta without consent, without credit, and without compensation.”
The suit against Meta also alleges that the company accessed the copyrighted books via an online “shadow library” website that includes a large quantity of copyrighted material.
Meta declined to comment on the lawsuit. OpenAI did not immediately respond to a request for comment.
The legal action from Silverman isn’t the first to focus on how large language models are trained. A separate lawsuit filed against OpenAI last month alleged the company misappropriated vast swaths of people’s personal data from the internet to train its AI tools. (OpenAI did not respond to a request for comment on the suit.)
In May, OpenAI CEO Sam Altman appeared to acknowledge more needed to be done to address concerns from creators about how AI systems use their works.
“We’re trying to work on new models where if an AI system is using your content, or if it’s using your style, you get paid for that,” he said at an event.