"Updated" guidance pages. Corporate narratives that shifted overnight. It catches them in 4K with diffs and hashes. No SaaS. No login. Local Python. Runs in seconds. @diffbot @mtaibbi @shellenberger @zerohedge @bariweiss
1
29
May 23
Agentic Commerce is real! My agent has access to all of this without a single API key. - Firecrawl - Parallel - Tavily - Exa - Apify - Diffbot - Browserbase - Stagehand - Apollo - Clado - Minerva - People Data Labs - Clearbit - Hunter - Whitepages - CrustData - LinkedIn - SociaVault - Coresignal - TokPortal - Reddit - TikTok - Douyin - Instagram - Facebook - YouTube - Google Maps - Google Places - Google Search - Google News - Google Shopping - Google Images - Google Lens - Serper - Google for Jobs - OpenAI - Sora - GPT Image - Gemini - Google Veo - Veo - fal.ai - FLUX - Black Forest Labs - Replicate - Recraft - Stable Diffusion - Grok - Imagine - Seedance - Wan - Kling - Nano Banana - Kinovi - Meshy - Bland - LoopLookup - Chatterbox - Chatterbox Turbo - Chatterbox Multilingual - F5-TTS - VoxCPM2 - Last.fm - Indeed - Glassdoor - ZipRecruiter - Bayt - BDJobs - Naukri - Jobs2Careers - WhatJobs - Adzuna - The Muse - Amadeus - FlightAware - Ticketmaster - Bitrefill - Florist One - Loop & Tie - Printful - Channel3 - Imgflip - Memelord - Porkbun - RentCast - FaceCheck - CoinGecko - DefiLlama - Alchemy - Etherscan - Bubblemaps - Whale Alert - Hyperliquid - Polymarket - OKX - Coinbase - x402scan - Supabase - Stablebase - Pipedream - Honcho
25
7
133
13,891
Meet the man who trained an AI to read the entire web the way a human does - Mike Tung of Diffbot. Our cover story on @mikektung yespress.io/mike-tung?utm_so… via Yespress
2
Replying to @hotapple
with mod_rewrite in your .htaccess:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Bytespider|Amazonbot|FacebookBot|meta-externalagent|cohere-ai|Diffbot|Applebot-Extended|YouBot|AI2Bot|Scrapy|Timpibot|ImagesiftBot|DuckAssistBot|PanguBot|Webzio-Extended) [NC]
RewriteRule .* - [F,L]
(I don't have this problem, so it's untested, but it should do the job)
1
43
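The mod_rewrite rule in the reply above returns 403 Forbidden for any request whose User-Agent contains one of the listed crawler names, ignoring case. A minimal Python sketch of the same matching logic can help check which user-agent strings the pattern would catch before deploying it. The function name `is_blocked` and the sample user-agent strings are illustrative assumptions, not part of the original post.

```python
import re

# Crawler names copied from the RewriteCond pattern in the post.
AI_BOT_NAMES = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-Web",
    "anthropic-ai", "CCBot", "Google-Extended", "PerplexityBot", "Bytespider",
    "Amazonbot", "FacebookBot", "meta-externalagent", "cohere-ai", "Diffbot",
    "Applebot-Extended", "YouBot", "AI2Bot", "Scrapy", "Timpibot",
    "ImagesiftBot", "DuckAssistBot", "PanguBot", "Webzio-Extended",
]

# Unanchored, case-insensitive alternation, mirroring RewriteCond with [NC].
AI_BOT_PATTERN = re.compile(
    "|".join(re.escape(name) for name in AI_BOT_NAMES),
    re.IGNORECASE,
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the user-agent contains any listed crawler name.

    Hypothetical helper: it approximates the .htaccess rule for testing
    and is not a replacement for the server configuration.
    """
    return AI_BOT_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Sample user-agent strings are made up for illustration.
    for ua in ("Mozilla/5.0 (compatible; Diffbot/0.1)",
               "Mozilla/5.0 (Windows NT 10.0; rv:125.0) Firefox/125.0"):
        print(ua, "->", "403" if is_blocked(ua) else "pass")
```

Because the match is a substring test, a browser user-agent that happens to contain one of these names would also be refused. Plain `Applebot` is not matched, only `Applebot-Extended`.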
Apr 11
Ask your AI system which clients have exposure to a supplier two steps removed in a pending regulatory action. Vector search returns documents that resemble the query. It cannot follow the actual relationships between clients, suppliers, regulators, and filings to the answer. For a growing class of enterprise queries, that's the whole problem.
Microsoft Research named the mechanism: vector retrieval fails on global questions because answering them requires assembling a view across structure, not ranking chunks by similarity.
Seven companies are now betting against that failure mode. Two populations, same architectural conclusion:
RAG-accuracy side: WhyHowAI, Nand AI, AIntropy (built specifically around the failure of chunk retrieval on complex enterprise corpora).
Graph database side: Neo4j, ArangoDB, Diffbot, Memgraph (different origin, same destination).
Cross-population convergence on the same architectural layer is a structural signal. Not one vendor's marketing cycle finding its audience.
But the honest version is narrower than the vendors are advertising. ICLR 2026 benchmark work: graphs beat vectors on multi-hop, global, and schema-intensive queries. Graphs lose on single-hop factual and time-sensitive queries. Most vendors aren't drawing that line.
ArangoDB AutoRAG routes automatically between graph/hybrid/vector based on query type. WhyHowAI builds task-scoped graphs, not monolithic ones. Both are designed around the conditionality. Most in this layer aren't. That gap is the investment question.
yaowang567.substack.com/p/th…

16
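The opening question in the post above (which clients are exposed to a supplier two steps removed in a pending regulatory action) is a multi-hop query. A rough sketch of answering it by following edges instead of ranking text by similarity might look like the following. The graph, the entity names, and the `exposed_clients` helper are all hypothetical and do not represent any vendor's implementation.

```python
from collections import deque

# Hypothetical supply graph: each key buys from the listed suppliers.
SUPPLIES_FROM = {
    "ClientA": ["Supplier1"],
    "ClientB": ["Supplier2"],
    "Supplier1": ["Supplier3"],
    "Supplier2": ["Supplier4"],
}

# Hypothetical set of entities named in a pending regulatory action.
UNDER_ACTION = {"Supplier3"}


def exposed_clients(clients, max_hops=2):
    """Return clients that reach a flagged supplier within max_hops edges.

    Breadth-first search from each client along SUPPLIES_FROM edges.
    """
    exposed = []
    for client in clients:
        queue = deque([(client, 0)])
        seen = {client}
        while queue:
            node, hops = queue.popleft()
            if hops > 0 and node in UNDER_ACTION:
                exposed.append(client)
                break
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for supplier in SUPPLIES_FROM.get(node, []):
                if supplier not in seen:
                    seen.add(supplier)
                    queue.append((supplier, hops + 1))
    return exposed


if __name__ == "__main__":
    print(exposed_clients(["ClientA", "ClientB"]))
```

Here the answer depends on the chain ClientA to Supplier1 to Supplier3. No single document needs to mention ClientA and Supplier3 together, which is the case where similarity ranking alone has nothing to retrieve.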
4/4 3. Web Navigation / CUA Lead QA automation, test infrastructure, web scraping or browser tooling engineer who transitioned into ML. Deep expertise in Playwright, Selenium, Puppeteer, Cypress, headless browsers, DOM, accessibility trees or end-to-end web automation. Ex-BrowserStack, LambdaTest, Apify, Diffbot, UiPath or any computer-use/web-agent team. If you're in SF and operating at this level, DM me with your background. No forms needed. #AI #WorldModels #Agents #RL #MachineLearning #SanFrancisco #Hiring 🔥
1
1
170
Explicitly Welcomes AI Crawlers URL: nwo.capital/robots.txt Welcomes: • GPTBot (OpenAI) • Claude-Web (Anthropic) • PerplexityBot • YouBot (You.com) • Google-Extended (Bard/Gemini) • Bingbot / BingPreview • ChatGPT-User • cohere-ai, Diffbot, Bytespider Includes: • Sitemap reference • Links to all agent discovery endpoints • Notes explaining the platform is agent-first

1
51
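A robots.txt that welcomes specific crawlers, as described in the post above, can be checked with Python's standard `urllib.robotparser`. The sketch below parses a made-up robots.txt, not the actual contents of nwo.capital/robots.txt, and reports which user agents may fetch a given URL. The sample rules, the `example.com` URL, and the `allowed_agents` helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two named crawlers are allowed everywhere,
# everyone else is kept out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: Diffbot
Allow: /

User-agent: *
Disallow: /private/
"""


def allowed_agents(robots_txt, agents, url):
    """Map each user agent to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    result = allowed_agents(
        SAMPLE_ROBOTS_TXT,
        ["GPTBot", "Diffbot", "SomeOtherBot"],
        "https://example.com/private/page",
    )
    for agent, ok in result.items():
        print(f"{agent}: {'allowed' if ok else 'disallowed'}")
```

To test a live site instead, `RobotFileParser.set_url()` followed by `read()` fetches the file over the network. Note that robots.txt is advisory: it states a preference and does not enforce access.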
Explicitly Welcomes AI Crawlers URL: nwo.capital/robots.txt Welcomes: • GPTBot (OpenAI) • Claude-Web (Anthropic) • PerplexityBot • YouBot • Google-Extended (Bard/Gemini) • Bingbot / BingPreview • ChatGPT-User • cohere-ai, Diffbot, Bytespider Includes: • Sitemap reference • Links to all agent discovery endpoints • Notes explaining the platform is agent-first

1
1
4
114
In 1975, scuba divers learned words underwater off the Scottish coast. On the beach, they forgot them. Back underwater, 50% better recall. The water wasn't a tag on the memory. The water WAS the memory. 70 years of AI has ignored this. We built Semantic AI. We need Episodic AI. Knowledge graphs store naked triples. LLMs average contradictory truths into mush. RAG is the Memento problem: Polaroid notes taped to the context window, forgotten the moment the prompt clears. The brain solved this 200M years ago. Context doesn't sit beside the fact. It warps the space the fact inhabits. Like gravity. I wrote about why context is the missing primitive in AI: open.substack.com/pub/gizmoh… @diffbot @CharanRanganath @danielchalef

1
49
Ever wondered what your white name should have been? Introducing: whatismywhitename.com Upload a picture of you, and let the puppy guess your name! Let's test out nominative determinism 🫡 (Immigrants who named themselves will correlate more highly. Give us feedback plz) Our thanks to: - @modal for their generous credits toward training this meme model - @diffbot for the clean, diverse dataset! - @leannch86920 for the training research! - Everyone NOT named David (biggest & noisiest dataset ever)

7
8
62
7,730
The web isn't a database. @diffbot makes it one. 10B entities and 1T facts extracted from 60B pages, rebuilt every 4-5 days. DuckDuckGo, Snapchat, and Dow Jones run on it. Massive powers the proxy infra behind their continuous crawl.
1
1
3
257
@VISEONIO is making more sense now. When you see organisations like @diffbot deploy graphRAG endpoints, each of us needs to port our Website semantic structured-data for AI discoverability, discussion, and agentic transactions. Or remain in the SEO world of Digital Obscurity
13
15 AI crawlers handled by default:
GPTBot · ClaudeBot · Google-Extended
FacebookBot · Bytespider · PerplexityBot
Amazonbot · Diffbot · AI2Bot · 6 more
Googlebot & Bingbot always pass free. Your SEO stays intact.
Cloudflare Workers free tier. 100k req/day. $0.
1
61