Chris Kessel


Jun 7

"Updated" guidance pages. Corporate narratives that shifted overnight. It catches them in 4K with diffs and hashes. No SaaS. No login. Local Python. Runs in seconds. @diffbot @mtaibbi @shellenberger @zerohedge @bariweiss
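
The idea in the tweet above, hashing a page and diffing it against an earlier copy, can be sketched locally with the Python standard library. This is only an illustration under assumptions, not the tool's actual implementation: the names `fingerprint` and `compare_snapshots` and the sample text are hypothetical.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of a page snapshot (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compare_snapshots(old: str, new: str) -> dict:
    """Compare two saved copies of a page by hash and by line-level diff."""
    old_hash = fingerprint(old)
    new_hash = fingerprint(new)
    diff = list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )
    return {
        "changed": old_hash != new_hash,
        "old_hash": old_hash,
        "new_hash": new_hash,
        "diff": diff,
    }


# Made-up sample snapshots, for illustration only.
before = "Guidance: do A.\nLast reviewed: January"
after = "Guidance: do B.\nLast reviewed: January"
result = compare_snapshots(before, after)
for line in result["diff"]:
    print(line)
```

A differing hash is a cheap signal that something changed; the diff then shows which lines were edited.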

shafu

@shafu0x

May 23

Agentic Commerce is real! My agent has access to all of this without a single API key. - Firecrawl - Parallel - Tavily - Exa - Apify - Diffbot - Browserbase - Stagehand - Apollo - Clado - Minerva - People Data Labs - Clearbit - Hunter - Whitepages - CrustData - LinkedIn - SociaVault - Coresignal - TokPortal - Reddit - TikTok - Douyin - Instagram - Facebook - YouTube - Google Maps - Google Places - Google Search - Google News - Google Shopping - Google Images - Google Lens - Serper - Google for Jobs - OpenAI - Sora - GPT Image - Gemini - Google Veo - Veo - fal.ai - FLUX - Black Forest Labs - Replicate - Recraft - Stable Diffusion - Grok - Imagine - Seedance - Wan - Kling - Nano Banana - Kinovi - Meshy - Bland - LoopLookup - Chatterbox - Chatterbox Turbo - Chatterbox Multilingual - F5-TTS - VoxCPM2 - Last.fm - Indeed - Glassdoor - ZipRecruiter - Bayt - BDJobs - Naukri - Jobs2Careers - WhatJobs - Adzuna - The Muse - Amadeus - FlightAware - Ticketmaster - Bitrefill - Florist One - Loop & Tie - Printful - Channel3 - Imgflip - Memelord - Porkbun - RentCast - FaceCheck - CoinGecko - DefiLlama - Alchemy - Etherscan - Bubblemaps - Whale Alert - Hyperliquid - Polymarket - OKX - Coinbase - x402scan - Supabase - Stablebase - Pipedream - Honcho

Nina Pryce @ninapryce

May 21

Meet the man who trained an AI to read the entire web the way a human does - Mike Tung of Diffbot. Our cover story on @mikektung yespress.io/mike-tung?utm_so… via Yespress

Mike Tung

Founder & CEO of Diffbot - the world's largest automated knowledge graph with 10B entities and 1T facts.

yespress.io

☠ Bluetouff

@bluetouff

May 1

Replying to @hotapple

With mod_rewrite in your .htaccess:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Bytespider|Amazonbot|FacebookBot|meta-externalagent|cohere-ai|Diffbot|Applebot-Extended|YouBot|AI2Bot|Scrapy|Timpibot|ImagesiftBot|DuckAssistBot|PanguBot|Webzio-Extended) [NC]
RewriteRule .* - [F,L]
(I don't have this problem, so this is untested, but it should do the job.)
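
Since the author notes the rule is untested, one way to sanity-check the pattern before deploying it is to run it against sample User-Agent strings. The sketch below copies the alternation from the RewriteCond above and uses `re.IGNORECASE` with `re.search` to approximate the unanchored, case-insensitive match that `[NC]` gives; the function name `would_be_blocked` and the sample User-Agent strings are illustrative assumptions.

```python
import re

# Alternation copied from the RewriteCond above.
# re.IGNORECASE approximates the [NC] (no-case) flag.
AI_BOT_PATTERN = re.compile(
    r"(GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|"
    r"CCBot|Google-Extended|PerplexityBot|Bytespider|Amazonbot|FacebookBot|"
    r"meta-externalagent|cohere-ai|Diffbot|Applebot-Extended|YouBot|AI2Bot|"
    r"Scrapy|Timpibot|ImagesiftBot|DuckAssistBot|PanguBot|Webzio-Extended)",
    re.IGNORECASE,
)


def would_be_blocked(user_agent: str) -> bool:
    """True if the pattern matches anywhere in the User-Agent string."""
    return AI_BOT_PATTERN.search(user_agent) is not None


# Illustrative User-Agent strings, not captured from real traffic.
samples = [
    "Mozilla/5.0 (compatible; GPTBot/1.1)",
    "diffbot",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0",
]
for ua in samples:
    print(ua, "->", "403" if would_be_blocked(ua) else "pass")
```

Because the match is a substring search, any User-Agent that merely contains one of these tokens is refused, which is worth keeping in mind when extending the list.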

Kamran Syed

@_kamsyed

Apr 28

x.com/i/article/204867037205…

Uwaizumi.eth｜🔗AIでメンバーの貢献を可視化・ブロックチェーンに記録するUnyte

@0xUYZ

Apr 23

x.com/i/article/204614119894…

Nenjaaaa Trades @nenjaatrades

Apr 19

Turn any site into #data in seconds with #Crawly. crawly.diffbot.com/?utm_camp… #webscraping #tools #AI via @diffbot

Never Write Another Web Scraper

Crawly automatically extracts the correct content from any web page. No rules required.

crawly.diffbot.com

Yao

@yaowang567

Apr 11

Ask your AI system which clients have exposure to a supplier two steps removed in a pending regulatory action. Vector search returns documents that resemble the query. It cannot follow the actual relationships between clients, suppliers, regulators, and filings to the answer. For a growing class of enterprise queries, that's the whole problem.
Microsoft Research named the mechanism: vector retrieval fails on global questions because answering them requires assembling a view across structure, not ranking chunks by similarity.
Seven companies are now betting against that failure mode. Two populations, same architectural conclusion:
RAG-accuracy side: WhyHowAI, Nand AI, AIntropy (built specifically around the failure of chunk retrieval on complex enterprise corpora).
Graph database side: Neo4j, ArangoDB, Diffbot, Memgraph (different origin, same destination).
Cross-population convergence on the same architectural layer is a structural signal. Not one vendor's marketing cycle finding its audience.
But the honest version is narrower than the vendors are advertising. ICLR 2026 benchmark work: graphs beat vectors on multi-hop, global, and schema-intensive queries. Graphs lose on single-hop factual and time-sensitive queries. Most vendors aren't drawing that line.
ArangoDB AutoRAG routes automatically between graph/hybrid/vector based on query type. WhyHowAI builds task-scoped graphs, not monolithic ones. Both are designed around the conditionality. Most in this layer aren't. That gap is the investment question. yaowang567.substack.com/p/th…
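
The opening question is a multi-hop traversal, which a small sketch can make concrete. The toy graph, the entity names, and the functions `suppliers_within` and `exposed_clients` below are all hypothetical; this is a plain breadth-first search over a dictionary, not any vendor's graph engine.

```python
from collections import deque

# Hypothetical toy graph: each party maps to its direct suppliers.
SUPPLIES = {
    "client_a": ["supplier_1"],
    "client_b": ["supplier_2"],
    "supplier_1": ["supplier_3"],
    "supplier_2": ["supplier_4"],
}

# Hypothetical set of suppliers named in a pending regulatory action.
UNDER_ACTION = {"supplier_3"}


def suppliers_within(start: str, max_hops: int) -> dict:
    """Map each supplier reachable from start to its hop distance (<= max_hops)."""
    seen = {}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in SUPPLIES.get(node, []):
            if nxt not in seen:
                seen[nxt] = hops + 1
                queue.append((nxt, hops + 1))
    return seen


def exposed_clients(clients, max_hops: int = 2) -> list:
    """Clients with a flagged supplier within max_hops steps."""
    return sorted(
        c for c in clients
        if UNDER_ACTION.intersection(suppliers_within(c, max_hops))
    )


print(exposed_clients(["client_a", "client_b"]))
```

In this toy data, `client_a` is exposed only through a second-hop supplier, so limiting the search to one hop would miss it. That is the kind of relationship-following the tweet argues similarity ranking alone does not perform.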

Mikal The Human @artseek8624837

Mar 28

#Crawly is the easiest way to integrate #web #data. Turn entire domains into #tabular data or #json! crawly.diffbot.com/?utm_camp… #webscraping #tools #AI via @diffbot

Tom Au

@tomau1704

Mar 22

4/4 3. Web Navigation / CUA Lead QA automation, test infrastructure, web scraping or browser tooling engineer who transitioned into ML. Deep expertise in Playwright, Selenium, Puppeteer, Cypress, headless browsers, DOM, accessibility trees or end-to-end web automation. Ex-BrowserStack, LambdaTest, Apify, Diffbot, UiPath or any computer-use/web-agent team. If you're in SF and operating at this level, DM me with your background. No forms needed. #AI #WorldModels #Agents #RL #MachineLearning #SanFrancisco #Hiring 🔥

ф Antho ф

@anthostate

Mar 21

Explicitly Welcomes AI Crawlers URL: nwo.capital/robots.txt Welcomes: • GPTBot (OpenAI) • Claude-Web (Anthropic) • PerplexityBot • YouBot (You.com) • Google-Extended (Bard/Gemini) • Bingbot / BingPreview • ChatGPT-User • cohere-ai, Diffbot, Bytespider Includes: • Sitemap reference • Links to all agent discovery endpoints • Notes explaining the platform is agent-first
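
How a robots.txt of this kind is read by a compliant crawler can be checked with Python's standard `urllib.robotparser`. The robots.txt text below is a made-up example in the spirit of the tweet, not the actual contents of the file at the URL mentioned; the `example.com` URLs and the `/private/` path are likewise assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that welcomes named AI crawlers
# while restricting every other agent from /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: Diffbot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Ask the parsed rules whether this agent may fetch this URL."""
    return parser.can_fetch(agent, url)


for agent in ("GPTBot", "Diffbot", "SomeOtherBot"):
    print(agent, allowed(agent, "https://example.com/private/page"))
```

Note that robots.txt is advisory: it tells well-behaved crawlers what the site prefers, but does not enforce anything on its own.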

DΦggΦ DΦnΦvan @DonovanDoggo

Mar 21

Explicitly Welcomes AI Crawlers URL: nwo.capital/robots.txt Welcomes: • GPTBot (OpenAI) • Claude-Web (Anthropic) • PerplexityBot • YouBot • Google-Extended (Bard/Gemini) • Bingbot / BingPreview • ChatGPT-User • cohere-ai, Diffbot, Bytespider Includes: • Sitemap reference • Links to all agent discovery endpoints • Notes explaining the platform is agent-first

出家如初

@chuanliang

Mar 18

x.com/i/article/203405815367…

Gizmohan

@gizm0han

Mar 15

In 1975, scuba divers learned words underwater off the Scottish coast. On the beach, they forgot them. Back underwater, 50% better recall. The water wasn't a tag on the memory. The water WAS the memory. 70 years of AI has ignored this. We built Semantic AI. We need Episodic AI. Knowledge graphs store naked triples. LLMs average contradictory truths into mush. RAG is the Memento problem: Polaroid notes taped to the context window, forgotten the moment the prompt clears. The brain solved this 200M years ago. Context doesn't sit beside the fact. It warps the space the fact inhabits. Like gravity. I wrote about why context is the missing primitive in AI: open.substack.com/pub/gizmoh… @diffbot @CharanRanganath @danielchalef

Cheng Lou

@_chenglou

Mar 6

Ever wondered what your white name should have been? Introducing: whatismywhitename.com Upload a picture of you, and let the puppy guess your name! Let's test out nominative determinism 🫡 (Immigrants who named themselves will correlate more highly. Give us feedback plz) Our thanks to: - @modal for their generous credits toward training this meme model - @diffbot for the clean, diverse dataset! - @leannch86920 for the training research! - Everyone NOT named David (biggest & noisiest dataset ever)

Massive @joinmassive

Mar 3

@diffbot: diffbot.com/ Scraping Marketplace: joinmassive.com/scraping-mar…

Diffbot | Knowledge Graph, AI Web Data Extraction and Crawling

Transform the web into data. Diffbot automates web data extraction from any website using AI, computer vision, and machine learning.

diffbot.com

Massive @joinmassive

Mar 3

The web isn't a database. @diffbot makes it one. 10B entities and 1T facts extracted from 60B pages, rebuilt every 4-5 days. DuckDuckGo, Snapchat, and Dow Jones run on it. Massive powers the proxy infra behind their continuous crawl.

Adrian Parker @parkerauk

Feb 25

@VISEONIO is making more sense now. When you see organisations like @diffbot deploy graphRAG endpoints, each of us needs to port our Website semantic structured-data for AI discoverability, discussion, and agentic transactions. Or remain in the SEO world of Digital Obscurity

Josu Sanz

@solosetups

Feb 24

15 AI crawlers handled by default: GPTBot · ClaudeBot · Google-Extended · FacebookBot · Bytespider · PerplexityBot · Amazonbot · Diffbot · AI2Bot · 6 more. Googlebot & Bingbot always pass free. Your SEO stays intact. Cloudflare Workers free tier. 100k req/day. $0.

Bhavesh Soni

@neobsoni

Feb 23

x.com/i/article/202602324835…