Bestselling novelist David Baldacci on how AI companies deliberately stole every book and academic paper published in the last 70 years:
Baldacci is a named plaintiff in a lawsuit against OpenAI and Microsoft in the Southern District of New York, alongside John Grisham, Scott Turow, George R.R. Martin, Jodi Picoult, and Jonathan Franzen.
They're also representing roughly 60,000 unnamed plaintiffs.
He explains how AI companies arrived at novels as the key ingredient for building superintelligence:
"The AI community searched the world. How do you create superintelligence? They tried everything to try to figure out, how do you do this? They fed dictionaries into it. They did lots of stuff. They finally found the only way to create super intelligence that they needed was to feed novels into the large language models. Novels worked, finished products of storytelling with characters and dialogue and research and events and interactions. That was their Holy Grail moment."
Baldacci points out the obvious path the AI companies could have taken: negotiating with the five major publishers, each of which represents around 100,000 writers.
Instead, they chose theft.
@davidbaldacci continues:
"They decided we're just going to steal them. I'm not saying anything out of school. They've admitted this. They got most of the books from a Russian pirate website where they would go and download the books from there. And they didn't even want their software programs to know they were stealing the books. So they had the software program that would scrape off the copyright page, scrape off the ISBN number on the back, and just download the book itself."
The scale is staggering.
Over the last eight or nine years, every book and every academic paper published in the last 70 years worldwide has been ingested into the large language models at Anthropic, OpenAI, and Meta.
Baldacci testified about this on Capitol Hill in July.
He describes the personal toll of being a named plaintiff:
"I've had to give them all of my materials, all of my financial information I've had to give them, let them come in and do a complete scrape on all of my emails, all of my communications. I sat through a nine-hour deposition like I've done something wrong. They said, yeah, we've taken your books, we haven't paid you a dime and we didn't ask your permission, but we should be entitled to do it because AI is so cool. That's basically their legal argument."
The parallel case against Anthropic in California has already settled for $1.5 billion, to be paid out over two years to 50,000 writers. The OpenAI and Microsoft case is now past discovery and heading toward a settlement conference.