⚰️ The open web is dying. Not slowly.
NYT in court. Cloudflare blocking AI crawlers by default. Reddit closed to every crawler except Google's and OpenAI's.
Cloudflare's CEO said bot traffic will exceed human traffic by 2027. If your AI strategy depends on scraping the public internet, you have 18 months before your pipeline breaks.
Three strategies survive:
1/ Licensing deals. $200M checks to News Corp and Reddit. Only frontier labs can play.
2/ Synthetic data. Works for math and code. Collapses on everything else.
3/ Consented crowdsourced collection. Real people, real conditions, paid directly, auditable.
The first two have hard ceilings. The third compounds.
Every contributor expands the languages, locations, and modalities you can serve. It works the way search worked in 2001 and mobile in 2009.
This is the bet at @hubxyz. 150 countries already collecting what frontier labs can no longer scrape. Audio, image, video, text. With consent. With provenance.
The scrapers will keep fighting the lawsuits. The licensing deals will keep getting more expensive.
And quietly, the companies that invested in real collection infrastructure will become the only pipe that still works.