🕷️ Scrapling tip: Fetcher.configure() sets parser flags GLOBALLY; selector_config={...} is the per-REQUEST override. Flip keep_comments/keep_cdata/huge_tree/adaptive for ONE fetch without touching the rest of your crawl ⚡
20
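The global-versus-per-request layering described in the tip above can be sketched in plain Python. This is only an illustration of the override pattern, not Scrapling's implementation: `GLOBAL_PARSER_CONFIG`, its default values, and `effective_config` are hypothetical names made up for the example.

```python
from collections import ChainMap

# Hypothetical global defaults, standing in for whatever a global
# configure() call would have set. The values are assumptions.
GLOBAL_PARSER_CONFIG = {
    "keep_comments": False,
    "keep_cdata": False,
    "huge_tree": True,
    "adaptive": False,
}


def effective_config(selector_config=None):
    """Return the parser flags for one request.

    Per-request values shadow the global ones; the global dict is
    never mutated, so other requests keep seeing the defaults.
    """
    # ChainMap looks keys up left to right, so the override wins.
    return dict(ChainMap(selector_config or {}, GLOBAL_PARSER_CONFIG))


# One fetch keeps comments; every other fetch is unaffected.
one_off = effective_config({"keep_comments": True})
everything_else = effective_config()
```

The point of the design is that the override is scoped to a single call: nothing has to be reset afterwards.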
Replying to @nakadai_mon
I still feel like a baby when it comes to AI, because I thought I had worked out fabrication and lying. Bro, it's telling me Scrapling and FlareSolverr are the same thing on different ports. I'm like, bullshit. Scrapling is a skill!
1
6
Web scraping just leveled up. Scrapling avoids Cloudflare blocks, is 774 times faster than BeautifulSoup, and needs no proxy configuration. 52.2k stars on GitHub. It's not just another scraper. It's an adaptive framework that learns the structure of each website and adjusts automatically when it changes. No manual maintenance. No getting blocked. ✅ Bypasses Cloudflare and the most aggressive anti-bots ✅ 774x faster than BeautifulSoup in real benchmarks ✅ No need for proxies or special configuration ✅ Adapts automatically when the site structure changes ✅ Compatible with AI agents as an MCP server ✅ Support for JavaScript, iframes, and dynamic content ✅ Stealth mode for sites with advanced detection ✅ 46 releases. Updated last week. ✅ BSD-3 license. What used to take you days to build and maintain now takes minutes. 52.2k stars. 5k forks. BSD-3. Repo here 👇
23
257
1,865
119,069
Replying to @hanifproduktif
How to do it: say "no hallucination" in the prompt. Or, when you start researching, use a more deterministic tech stack to scrape sources, like scrapling, tavily, or notebooklm. Sometimes, when it relies on web search alone, it still tends to hallucinate.
33
Who has used this with their Hermes or Openclaw agents? How does it compare to Scrapling, the Brave API, etc.? What specific use cases has it excelled at? I'm deciding whether to add it to the tool stack.
Starting today, you can try Firecrawl for free without an API key 🔥 Search, scrape, and interact with any web page, plus parse any PDF into clean markdown, with no setup at all! Start using our endpoints and only sign up when you scale. Live on our MCP, CLI, and API now!
3
254
🕷️ Scrapling tip: google_search=True is the DEFAULT on StealthyFetcher/DynamicFetcher, so every fetch ships with a Google referer header. This mimics organic search traffic, which many WAFs give more leeway to. Set google_search=False to drop it (or set your own Referer) ⚡
49
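What a default-on referer flag amounts to can be shown with the standard library alone. This is a hedged sketch of the idea, not Scrapling's code: `build_request` is a hypothetical helper, and the exact referer value a real fetcher sends is an assumption here.

```python
from urllib.request import Request


def build_request(url, google_search=True, extra_headers=None):
    """Build a request that carries a Google referer unless told otherwise.

    A caller-supplied Referer always wins over the default one.
    """
    headers = dict(extra_headers or {})
    has_referer = any(key.lower() == "referer" for key in headers)
    if google_search and not has_referer:
        # Assumed value, for illustration only.
        headers["Referer"] = "https://www.google.com/"
    return Request(url, headers=headers)


default_req = build_request("https://example.com/")
plain_req = build_request("https://example.com/", google_search=False)
custom_req = build_request(
    "https://example.com/",
    extra_headers={"Referer": "https://example.org/"},
)
```

No request is actually sent; the objects only show which headers each combination of options would carry.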
The DC Strategist retweeted
Someone just built a web scraper that survives when websites change. It’s called Scrapling. 47K stars on GitHub. An adaptive Python scraping framework that handles everything from one request to full-scale crawls. The crazy part: Its parser learns from website changes and automatically finds your elements again when the page updates. That means fewer broken selectors. Fewer dead scraping scripts. Fewer “why did this stop working?” mornings. What it gives you: - Adaptive parsing - Auto element relocation - Cloudflare Turnstile bypass - Browser-like fetchers - Concurrent crawls - Multi-session crawling - Pause and resume - Automatic proxy rotation - CLI support - MCP mode Most scrapers are fragile. You write selectors. The website changes a class name. Your pipeline dies. Then you spend half the day fixing something that should have kept working. Scrapling is built for the modern web. Pages change. Anti-bot systems evolve. Crawls get bigger. Agents need live data. So instead of stitching together Requests, BeautifulSoup, Playwright, proxy tools, retry logic, and anti-bot patches... Scrapling puts the whole scraping workflow into one Python library. This is not just a scraper. It’s a survival kit for web data. GitHub: github.com/D4Vinci/Scrapling
4
5
18
1,129
This is an integration guide for equipping Claude Code with a dedicated weapon called "Scrapling". Setup is over in an instant: you just talk to it. If you want to take your AI usage up a level, start here. x.com/ceo_comix/status/20345…

1
301
Jun 16
Replying to @tom_doerr
I've been testing the project for quite some time. It's difficult to use commercially: the scraping logic is somewhat brittle and relies on standard methods. Websites with heavy security do not work with Scrapling.
8
🕷️ Scrapling tip: http3=True flips Fetcher to HTTP/3 (QUIC over UDP): page = Fetcher.get(url, http3=True). Most scrapers still send HTTP/1.1; going HTTP/3 blends in with modern browser traffic. Gotcha: may conflict with impersonate= ⚡
1
50
We tested Puppeteer, Firecrawl, Scrapling, and a few fetch variants. Scrapling looked best on memory usage, so we picked it for the first run. Then the real problems showed up.
4
🕷️ Scrapling tip: spider dedup ignores headers by default, so the same URL with different Authorization / X-API-Key tokens silently collapses to one request. Set it on the spider: class MySpider(Spider): fp_include_headers = True. Header keys are case-normalized in the fingerprint ⚡
2
74
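Why two requests to the same URL collapse unless headers are part of the fingerprint can be sketched as follows. This is a generic illustration of request fingerprinting, not Scrapling's actual algorithm: the `fingerprint` function and its `include_headers` flag are hypothetical stand-ins for the behaviour the tip describes.

```python
import hashlib


def fingerprint(method, url, headers=None, include_headers=False):
    """Hash a request into a dedup key.

    With include_headers=False only the method and URL count, so
    requests that differ only in their tokens look identical.
    """
    parts = [method.upper(), url]
    if include_headers and headers:
        # Lowercase the keys and sort them so that header case and
        # ordering do not change the fingerprint.
        for key, value in sorted((k.lower(), v) for k, v in headers.items()):
            parts.append(f"{key}:{value}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


url = "https://api.example.com/items"
tenant_a = {"Authorization": "Bearer token-a"}
tenant_b = {"authorization": "Bearer token-b"}

# Headers ignored: both requests share one key and the second is dropped.
collapsed = fingerprint("GET", url, tenant_a) == fingerprint("GET", url, tenant_b)

# Headers included: the two tokens produce two distinct keys.
distinct = fingerprint("GET", url, tenant_a, include_headers=True) != fingerprint(
    "GET", url, tenant_b, include_headers=True
)
```

Normalizing key case matters because `Authorization` and `authorization` name the same header; without it, identical requests could be crawled twice.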
Weekend benchmark: scraping layer. ~17k domains. 2 to 5 min per scrape. Tested Puppeteer, Firecrawl, Scrapling. Judged on cost, CPU/RAM, failure rate. Picked Scrapling first. Memory usage was the deciding factor. Next: mass concurrency. If you scale this, what breaks first?
1
51
Thank you anon Scrapling!! :D


1
4
84
D4Vinci/Scrapling: the README explanations are thorough, so the flow is easy to follow even on a first read. ★56.1k, written in Python. The background is "🕷️ An adaptive Web Scraping framework that handles everything from a single request to a…". Following it through to the README makes it clear what to try first. The README puts "Based on the provided repository information, I have su…" up front, which makes the aim easy to grasp. It also covers the next key point, "🔋 What it solves: Scrapling is an effortless web sc…", which reduces uncertainty during implementation. Reading on to "🔧 How to get started: The project provides a Docker…" also brings out the key points for operation. ★56.1k #Python #GitHub #OSS Featured keywords: scrapling / Docker / Ready / automatically github.com/d4vinci/scrapling
1
1
36
④ 24-hour automatic monitoring of Mercari, stock prices, and online shops (scrapling). Notifies you automatically when a condition is met. Normally: "I want this item on Mercari if it shows up for ¥5,000 or less" → you check the site three times a day. With Hermes: ・Mercari / Yahoo! Auctions price monitoring (LINE notification at or below a set price) ・Stock and cryptocurrency prices recorded to a Google Sheet every hour ・New product alerts for Amazon / Rakuten / Apple install: hermes skills install github:amanning3390/hermeshub/skills/scrapling
1
177
Scrapling is an adaptive web scraping framework that handles everything from a single request to a full-scale crawl.
1
25
🚨 CloakBrowser and CAPTCHA Security Are Now Fully Open Source and Decoded! A New Standard Is Coming for Web Automation and Scraping 🤖 This one is for everyone who uses web scraping, automation, and headless browsers... The CloakBrowser team has open-sourced its powerful stealth browser and its integrated CAPTCHA solver! 1 - What Is CloakBrowser? CloakBrowser is a custom stealth browser built by modifying Chromium's source code (at the C level) with 58 patches. It does not rely on JS injection or simple fingerprint spoofing. From start to finish the browser behaves like "a real user's Chrome". 2 - Main Features: ✅ reCAPTCHA v3 score: 0.9 (human level, verified server-side) ✅ Passes Cloudflare Turnstile automatically (managed non-interactive) ✅ Passes 30 detection sites such as FingerprintJS, BrowserScan, ShieldSquare, and bot.incolumitas.com ✅ Drop-in replacement for Playwright: you only change the import and the rest of your code stays the same ✅ Humanize mode: realistic mouse movements (Bézier curves), typing speed, scroll patterns ✅ Proxy GeoIP support (timezone/locale set automatically from the proxy IP) ✅ Persistent profile support (cookies, localStorage, and extensions are preserved) ✅ Docker and CDP server support (manage multiple fingerprints with cloakserve) ✅ Widevine/DRM support (on Linux) 3 - How Does It Work? Patches are applied directly to the Chromium 146 source (145 on some platforms). Canvas, WebGL, audio, fonts, GPU, WebRTC, CDP signals, navigator.webdriver, and the TLS fingerprint (JA3/JA4) are all changed at the source level. As a result, detection tools see "a real browser".
4 - CAPTCHA Solver (Integration Layer) CloakBrowser's fallback system: Layer 1 (free): CloakBrowser itself completely prevents […] of CAPTCHAs. Layer 2: free automatic clicking for Cloudflare Turnstile. Layer 3: 2Captcha and CapSolver integration for the rest (supports 30 CAPTCHA types: reCAPTCHA v2/v3/Enterprise, hCaptcha, FunCaptcha, GeeTest, KeyCaptcha, Amazon WAF, DataDome, Akamai, Imperva, etc.). This structure keeps costs low while the success rate stays very high. 5 - Benchmark and Test Results ✅ reCAPTCHA v3: stock Playwright → 0.1 | CloakBrowser → 0.9 ✅ Cloudflare Turnstile: stock → FAIL | Cloak → PASS. It passes all 14 main detection tests. Successful CAPTCHA bypasses have been reported on real sites (Google, LinkedIn, Discord, etc.). 6 - Installation and Usage (Very Simple) Python with Playwright: pip install cloakbrowser, then in Python: from cloakbrowser import launch; browser = launch(headless=False, humanize=True, proxy="http://user:pass@ip:port", geoip=True); page = browser.new_page(); page.goto("https://protected-site.com") # ... actions. JavaScript/Node.js and Docker are also supported. You can move existing Playwright projects to CloakBrowser with a single line. 7 - Why Does It Matter So Much? Most stealth browsers on the market (Undetected-Chromedriver, puppeteer-extra, etc.) work at the JS level and are easily detected after Chromium updates. Because CloakBrowser makes its fixes at the source-code level, it is much more durable. 8 - Being open source means: ✅ You can use it for free ✅ You can add your own patches ✅ It keeps improving through community contributions ✅ It is a far more affordable alternative to commercial solutions (hundreds of dollars a month). The repos are actively developed and examples are plentiful (including LangChain, Crawl4AI, and Scrapling integrations). Binaries are SHA-256 verified and signed with Sigstore. Linux, macOS (including Apple Silicon), and Windows are supported.
A real game-changer for anyone who uses a browser for web scraping, automation, data collection, test automation, or AI agents. It makes a difference especially in high-volume projects with a high risk of detection. How do you think open-source stealth tools like this will change the web scraping world? Have you tried CloakBrowser? Which CAPTCHA types do you run into, and what is your success rate? Share your experience in the comments and let's discuss! 🤖
1
2
10
2,202
🕷️ Scrapling tip: SitemapSpider IGNORES hreflang URLs by default. Flip sitemap_alternate_links=True to also dispatch every <xhtml:link rel="alternate"> URL through your rules and crawl every locale of a multilingual site in one go ⚡
3
1
85
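To make the hreflang tip concrete, here is a standard-library sketch of what "also dispatch the alternate links" means for a sitemap entry. It is not Scrapling's parser: `sitemap_urls` and its `alternate_links` flag are hypothetical, and the sample sitemap is made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the sitemap protocol and by hreflang annotations.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/</loc>
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/"/>
  </url>
</urlset>"""


def sitemap_urls(xml_text, alternate_links=False):
    """Collect URLs from a sitemap, optionally including hreflang alternates."""
    root = ET.fromstring(xml_text)
    urls = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        if loc:
            urls.append(loc.strip())
        if not alternate_links:
            continue
        # Each <xhtml:link rel="alternate"> points at another locale.
        for link in entry.findall("xhtml:link", NS):
            href = link.get("href")
            if link.get("rel") == "alternate" and href and href not in urls:
                urls.append(href)
    return urls


default_urls = sitemap_urls(SITEMAP)
all_locales = sitemap_urls(SITEMAP, alternate_links=True)
```

With the flag off only the `<loc>` URL is queued; with it on, the German and French variants are queued as well.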