"Imagine if someone photocopied every book in the public library, burned the library down, and then opened a subscription service for the copies. That's the metered intelligence business model."
Sorry, but this is retarded. That is not the business model at all.
The AI companies aren't selling the data they used to train their models. Training their models didn't require the destruction of the data. For the most part those data are just as available as they always were. More or less anything you want to read is available for free, if you know where to look. There are 64 million books on Anna's Archive, along with 95 million scientific papers.
The problem is that you are not going to read 64 million books and 95 million scientific papers. An especially dedicated reader might read one book a week, on average. A real speedreader might get through a book a day. At that rate the speedreader would need roughly 175,000 years just for the books in Anna's Archive, before even starting on the papers. Which would be a painful experience, because 99.99% of books are shit.
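A quick back-of-envelope check of the reading-time arithmetic, as a Python sketch. It counts only the 64 million books (not the papers) and assumes a 365.25-day year; the helper name `years_to_read` is just illustrative.

```python
BOOKS = 64_000_000        # book count on Anna's Archive cited above
DAYS_PER_YEAR = 365.25    # assumption: average year length including leap years
WEEKS_PER_YEAR = 52

def years_to_read(books, books_per_year):
    """Years needed to finish `books` at a steady reading rate."""
    return books / books_per_year

speedreader = years_to_read(BOOKS, DAYS_PER_YEAR)  # one book a day
dedicated = years_to_read(BOOKS, WEEKS_PER_YEAR)   # one book a week

print(f"{speedreader:,.0f}")  # prints: 175,222
print(f"{dedicated:,.0f}")    # prints: 1,230,769
```

Even the one-book-a-day reader is off by five orders of magnitude from a human lifetime, and the weekly reader needs over a million years.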
For several centuries now, it hasn't been possible for any human to read everything that's ever been written. There's just too much of it. And the problem is only getting worse: there's more stuff to read every year.
Unlike a human, an LLM actually can read everything that's been written, in the sense that the corpus can be compressed into a single set of model weights. Better yet, that model can then be interrogated using natural language. It's a library containing all human knowledge, which you can talk to like a person, and which will formulate its answers to your queries based upon its entire knowledge base. The sheer scope of knowledge that's been produced already makes this technology invaluable, probably even essential. The business model isn't access to data, it's access to an interrogable model. To use the utility metaphor, they aren't claiming to have invented water; they're claiming to have built an aqueduct.
OK, so, should the AI companies have paid copyright-holders for the data?
This is the same copyright-trolling that's been used to artificially hobble the Internet since Napster was dismantled by the music labels for undermining their distribution and curation oligopoly. Remember Google Books? That was supposed to be a free Library of Alexandria, until the publishing houses shut it down because making the long tail of their out-of-print back catalogues freely available was going to cost them money somehow. The result: Google Books is useless. Then there's the scientific publishing industry putting peer-reviewed papers behind $30 paywalls, despite having paid for neither the research nor the peer review. Insofar as the data aren't easily available, it isn't the fault of the AI companies. It's due to artificial restrictions demanded by copyright-holders. The OP's 'burning down the library to open a subscription service' characterization isn't what the AI companies are doing; it's what the copyright corporations have already been doing for decades.
AI companies aren't selling or providing copyrighted works. Their models are generally prevented from reproducing published works verbatim, precisely to prevent them from being used for wholesale copyright circumvention (which is retarded, because you can still find those works for free, but anyhow). Tracking down every single rights-holder for every single byte would be a Herculean task, and for a lot of it (e.g. Reddit posts) there's no one to pay (they already got paid in Reddit gold). Are the companies supposed to obtain positive consent from every individual redditor who ever got an updoot for every single post they scraped? Posts that were made on the open Internet, for anyone to read? When all they are essentially doing is performing a mathematical operation to compress those data into an unindexed model?
That would simply be impossible, the same way it was impossible to legally fill a 100 GB iPod using $1 songs purchased from the iTunes Music Store (unless you were willing to spend something like $30,000 on songs, which no one was, and which Apple understood quite well). It would stop the development of this technology dead in its tracks.
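A rough sanity check of the iPod arithmetic, as a Python sketch. The 128 kbps bitrate and 4-minute average track length are assumptions (not figures from the post), 100 GB is treated as decimal gigabytes, and the per-track price is the iTunes Store's 99-cent launch price. Under those assumptions the bill lands in the mid-$20,000s, the same ballpark as the figure above.

```python
CAPACITY_BYTES = 100 * 10**9   # "100 GB" iPod, decimal gigabytes
BITRATE_BPS = 128_000          # assumption: 128 kbps encoding
SONG_SECONDS = 4 * 60          # assumption: 4-minute average track
PRICE_PER_SONG = 0.99          # 99-cent per-track price

song_bytes = BITRATE_BPS * SONG_SECONDS / 8    # bits -> bytes, ~3.84 MB per song
songs = int(CAPACITY_BYTES // song_bytes)      # whole songs that fit on the device
cost = songs * PRICE_PER_SONG                  # total spend to fill it legally

print(songs, f"${cost:,.0f}")  # prints: 26041 $25,781
```

Shorter tracks or lower bitrates push the song count, and the bill, higher still.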
Which is the real point here.
Let me trace the timeline here because nobody's connecting it.
Step 1: Scrape the entire internet. Every book, every article, every conversation, every piece of art, every forum post. Do it without asking. Do it without paying.
Step 2: Train a model on all of it. Call it "artificial intelligence."
Step 3: Go to BlackRock's Infrastructure Summit and announce: "We see a future where intelligence is a utility, like electricity or water, and people buy it from us on a meter."
Step 3 is where you sell people's own knowledge back to them. On a meter.
They took the collective output of human thought, compressed it into a model, and now they want to charge you by the token to access a version of what you and everyone you know already created.
One Reddit user put it perfectly: "They stole all this data from us, the people, our life's work, creativity, art, by devouring the internet and blowing through all copyright laws. Now they want to sell it back to us in the form of a utility."
Imagine if someone photocopied every book in the public library, burned the library down, and then opened a subscription service for the copies.
That's the metered intelligence business model.
And they're pitching it to infrastructure investors as though they invented water.