People are completely missing the point of this feature. Most by accident, and probably some on purpose.
They stated very clearly that this obeys all blocking and content rules for your site. It’s literally the opposite of bypassing those controls.
The purpose of the system is to get rid of the half-assed AI crawlers all over the Internet that are doing a crappy job of pulling your content.
They are loud, rude, and wasteful. And they’re filling the Internet and all our logs with massive amounts of background noise.
Cloudflare knows they are going to crawl no matter what.
This feature simply gives them a legit way to do it efficiently, so they don’t clog up the entire Internet doing it in a way that’s 1000x less efficient.
And it still uses all your rules for what can and cannot be crawled. All of your other Cloudflare controls around crawling are still enforced too.
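For anyone unsure what "obeying your rules" looks like in practice, here is a minimal sketch of the kind of check a well-behaved crawler runs before fetching a page, using robots.txt as one common form those rules take. This is not Cloudflare's implementation. The robots.txt contents, the `ExampleBot` name, and the `allowed` helper are all made up for illustration; only Python's standard `urllib.robotparser` is real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("OtherBot", "https://example.com/blog/post"),
        ("OtherBot", "https://example.com/private/notes"),
        ("ExampleBot", "https://example.com/blog/post"),
    ]:
        print(agent, url, allowed(agent, url))
```

A crawler that respects the site's rules skips any URL where this kind of check comes back false. The half-assed ones never bother to ask.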
If this is adopted to any significant degree, Cloudflare will be absolute heroes for reducing terabytes of crawler-slop background noise.
Introducing the new /crawl endpoint - one API call and an entire site crawled.
No scripts. No browser management. Just the content in HTML, Markdown, or JSON.