Every business has documents. Contracts. Insurance policies. Research papers. Medical records. Financial reports. Legal filings. Unstructured text holding critical information that nobody can query, search, or analyze at scale.
Google open sourced the tool that changes that. @GoogleAI: 26,300 GitHub stars, Apache 2.0, and the cleanest approach to structured extraction from unstructured text available today.
Here's what a few lines of Python actually give you:
→ Define what you want to extract using few-shot examples: no complex prompt engineering required
→ Precise source grounding: every extracted entity mapped back to its exact character offset in the original text
→ Interactive HTML visualization: highlight exactly what was extracted and where it came from
→ Reliable structured outputs: schema-enforced via Controlled Generation in supported models
→ Multi-pass extraction: higher recall on long documents
→ Parallel processing: handles large documents efficiently
→ Gemini, OpenAI, and local models via Ollama: your choice
→ CJK language support: Chinese, Japanese, Korean (v1.1.1)
→ Community provider plugins: extend to any model backend
→ MCP server available: plug directly into AI agent workflows
→ Works across: legal, medical, finance, research, compliance
→ 1,800 forks, Apache 2.0 licensed
Extract contract terms. Pull insurance clauses. Index research findings. Parse medical reports. Query what was unqueryable.
Discovered on OSSphere: ossphere.dev/google/langextr…
What's the most painful document parsing problem your team has ever faced? Drop it below 👇
#LangExtract #OpenSource #AI #NLP #BuildInPublic #Google #StructuredData
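To make "source grounding" concrete, here is a minimal TypeScript sketch of the idea: each extracted string is mapped back to character offsets in the original text. This is not LangExtract's API or its alignment algorithm; the `Extraction` shape and `groundExtractions` helper are hypothetical names for illustration, and the sketch assumes extractions appear verbatim in the source.

```typescript
// Hypothetical shapes for illustration only; not LangExtract's data model.
interface Extraction {
  extractionClass: string;
  extractionText: string;
}

interface GroundedExtraction extends Extraction {
  start: number; // inclusive character offset into the source text
  end: number; // exclusive character offset
}

// Map each extraction to its character span in the source.
// Repeated phrases are matched to successive occurrences, and anything
// that cannot be found verbatim is reported instead of given made-up offsets.
function groundExtractions(
  source: string,
  extractions: Extraction[],
): { grounded: GroundedExtraction[]; unmatched: Extraction[] } {
  const grounded: GroundedExtraction[] = [];
  const unmatched: Extraction[] = [];
  const cursors = new Map<string, number>(); // next search position per phrase

  for (const ex of extractions) {
    const text = ex.extractionText;
    const from = cursors.has(text) ? (cursors.get(text) as number) : 0;
    const start = text.length === 0 ? -1 : source.indexOf(text, from);
    if (start === -1) {
      unmatched.push(ex);
      continue;
    }
    const end = start + text.length;
    cursors.set(text, end);
    grounded.push({ extractionClass: ex.extractionClass, extractionText: text, start, end });
  }
  return { grounded, unmatched };
}
```

With offsets in hand, `source.slice(start, end)` always reproduces the extracted text, which is what makes highlighting in a viewer possible.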
2
33
Jun 12
What if SEO could implement itself—across your entire site—with just one prompt? This video breaks down how Airo AI Builder’s built‑in SEO Skill analyzes your codebase, detects what’s already in place, and automatically applies optimizations—no plugins, no manual work. From meta tags and structured data to sitemaps and social cards, everything is handled directly in your project while respecting your existing code patterns. ✅ Full-site SEO in one go ✅ Automated technical implementation ✅ Tailored to your business and codebase ✅ No technical expertise required 👉 Watch the full walkthrough: youtube.com/watch?v=G5yb1WSP… #SEO #AI #WebsiteOptimization #NoCode #DigitalMarketing #SmallBusiness #StructuredData #Automation #GoDaddy
4
1
2
1,475
I've moved all my local Obsidian content to the web on @cloudflare Workers, using the GitHub API, @tan_stack start, and Fumadocs. Here's a rant about the problems I've faced:
> tried fumadocs-ui with TanStack Start, DocsLayout expects Next.js router context, had to use undocumented fumadocs-ui/provider/tanstack wrapper
> used import.meta.glob for vault .md files, Vite refuses to resolve symlinks, set preserveSymlinks AND server.fs.allow for parent dir
> remark-math parses Full Loss: $$ as inline math start, $$Classification Loss: as closing delimiter, entire document between = one KaTeX block = red error cascade
> wrote preprocessor to split $$ onto own lines, now indented list items break because $$ with 4 spaces = code block not math fence per CommonMark
> closing $$ at 4-space indent doesn't close math block in micromark, requires ≤3 spaces, everything after becomes red KaTeX error
> rehype-katex omits throwOnError from Options type via Omit<>, can't disable red error rendering, only strict: 'ignore' available
> KaTeX chokes on unicode inside accidental math blocks: zero-width spaces (8203), smart quotes (8217), emojis ✅❌ all throw "No character metrics" warnings
> renderMarkdown runs twice (SSR + hydration), both execute full remark + rehype + KaTeX pipeline, client re-parses already-rendered HTML causing duplicate warnings
> switched to pre-rendering KaTeX before remark, used <span> wrapper, remark parses span contents as markdown, closes at first inner </span> not outer
> KaTeX outputs MathML <annotation> with raw LaTeX source, remark leaks annotation text into output showing {DETR} = \lambda{cls} as plain text
> set output: 'html' to strip MathML, switched to <div> wrapper, remark treats as opaque HTML block, now <div> at column 0 breaks nested list context
> deeply indented list items after <div> become code blocks, 4-tab indent (16 spaces) no longer recognized as list continuation after block element
> tried <span style="display:block"> to keep inline context, KaTeX's nested spans still get fragmented by remark's inline HTML parser
> tried custom <math-display> element with rehype plugin to replace post-parse, custom elements aren't in CommonMark type-6 block list but still break lists
> TanStack Start splat routes need params._splat, not documented, found by reading api.trpc.$.tsx scaffold code
> fumadocs-core PageTree.Item url field doesn't auto-encode spaces, manual `encodeURIComponent` needed for "Knowledge Index" → "%20" paths
> wikilink resolution: Obsidian uses filename lookup not path, built filename→key index, then realized anchors need separate slugify matching heading ID generation
> `[[Page#Section|Alias]]` wikilinks stripped to plain text initially, then added resolver but cross-vault links need vault context not available in renderer
> Shiki highlighting in Workers: had to use JS regex engine not WASM, async codeToHtml inside sync processSync = Promise wrapper hell, moved to loader
> `#Q question #A answer` flashcard syntax: regex eats newlines between tokens, had to use `[^\S\n]+` (horizontal whitespace only) to preserve structure
> same-line `#Q text #A text` works, multi-line breaks, had to split first then group consecutive `#Q`/`#A` lines into `<div class="obsidian-qa">` blocks
> hydration mismatch: fumadocs-ui RootProvider sets `className="dark"` on server, client detects different theme, added `suppressHydrationWarning` to `<html>` and `<body>`
> Grammarly extension injects `data-gr-ext-installed` on body during hydration, triggers mismatch warnings, same `suppressHydrationWarning` fix
> `flattenTree` from fumadocs-core expects `Node[]` not `PageTree.Root`, had to pass `tree.children` not `tree`
> Obsidian image embeds `![[image.png]]` need conversion to standard markdown, built imageByFilename Map with both encoded and decoded keys for lookup
> mermaid code blocks need client-side rendering, marked with .vault-mermaid class, lazy-loaded mermaid.js in useEffect after HTML set
> scroll-to-anchor breaks on SPA navigation, hash exists before element rendered, added retry loop with 5 attempts × 100ms delay
> anchor click handling: href="#section" needs preventDefault + smooth scroll + history.pushState, same-page vs cross-page detection via pathname comparison
> Obsidian callouts `> [!note]` not standard markdown, need custom remark plugin or regex preprocessing, skipped for now
> nested lists with mixed tabs/spaces: Obsidian uses 4-space equiv tabs, CommonMark interprets 4 spaces as code block in certain contexts
> math inside list items: `$$` must be ≤3 spaces from list item content column, Obsidian allows 0-indent which breaks list continuation
> `processSync` can't handle async Shiki highlighting, had to make renderMarkdown async, moved call to route loader for SSR
> KaTeX CSS `?url` import generates hashed asset path, works in dev but needed verification for Worker ASSETS binding in prod
> search API uses `createSearchAPI("advanced")` with structuredData, had to strip markdown syntax for indexable text, regex soup for fenced blocks/links/math
> slugifyAnchor normalization: Obsidian "IOU (Heading)" and "IOU(Heading)" both need same slug, added `\s+\(` → `(` replacement before kebab-case
> blank lines inside `$$` math blocks: Obsidian allows, standard CommonMark terminates block, wrote pass to normalize delimiter placement
> `\` at end of lines in LaTeX cases environment: `\ ` (backslash space) vs `\\` (line break), inconsistent source files cause KaTeX errors
> styling broken via proxy: assets at `/fumadocs/assets/*` correctly proxied but Worker can't find files because path mismatch with actual build output location
> pnpm v10 `ERR_PNPM_IGNORED_BUILDS` hard error: esbuild/sharp/workerd scripts blocked, moved `onlyBuiltDependencies` from `package.json#pnpm` to `pnpm-workspace.yaml`
> fumadocs-ui `.shiki:not(.not-fumadocs-codeblock *)` CSS matches ALL `.shiki` elements, applies padding/position to `.line` spans creating horizontal separator lines between code rows
> Shiki `bundledThemes` not obvious, had to enumerate keys to find available dark themes
> code span colors overridden: fumadocs-ui `code span { color: var(--shiki-light) }` rule wipes all inline token colors from Shiki output
> RootProvider `search` prop API undocumented for TanStack, dug through `.d.ts` files to find `DefaultSearchDialogProps` interface
> DocsLayout `tree` prop needs `PageTree.Root`, had to read bundled `definitions-Cob-Q8-8.d.ts` to understand Item/Folder/Separator structure
> fumadocs-core loader accepts `VirtualFile` objects, could construct manually but easier to build PageTree directly from file paths
> `createMarkdownRenderer` from fumadocs-core uses remark + rehype internally, but outputs React component not HTML string, needed different approach
> gray-matter and marked not needed: fumadocs-core has `content/md/frontmatter` and remark pipeline built-in, but as transitive deps not directly importable in pnpm
> `fumadocs-core/mdx-plugins/remark-gfm` re-exports remarkGfm, can use without adding direct dependency
> Cloudflare Worker can't use filesystem at runtime, all content must be bundled at build time via import.meta.glob eager loading
> glob pattern `'../../content/**/*.md'` from src/lib needed symlink in place AND Vite fs.allow config for parent directory
> `$vault.tsx` acts as layout needing `<Outlet />`, `$vault.index.tsx` is vault index, `$vault.$.tsx` is catch-all, file naming convention undocumented
> TanStack Router basepath `/fumadocs` handles routing but Vite base affects asset URLs differently, needed both configured correctly
> mermaid.initialize() must be called before mermaid.run(), but DOM not ready on hydration, race condition with useEffect timing
> structuredData from search index includes raw markdown, needed regex to strip fenced blocks, wikilinks, math delimiters for clean search text
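The slugifyAnchor item in the rant above can be sketched roughly like this. It is a guess at the shape of the fix, not the author's code: the post only says that whitespace before `(` is removed before kebab-casing, so the specific kebab-case rules here (drop punctuation, hyphenate whitespace, ASCII only) are assumptions.

```typescript
// Sketch of heading-to-anchor normalization; kebab-case rules are assumed.
function slugifyAnchor(heading: string): string {
  return heading
    .replace(/\s+\(/g, "(") // "IOU (Heading)" -> "IOU(Heading)"
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "") // drop punctuation, including the parentheses
    .trim()
    .replace(/\s+/g, "-"); // remaining whitespace runs become single hyphens
}
```

Without the first replacement, "IOU (Heading)" would slug to `iou-heading` while "IOU(Heading)" would slug to `iouheading`, so links written one way would miss headings written the other way.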

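The flashcard items in the same rant (split same-line `#Q ... #A ...` cards first, then group consecutive `#Q`/`#A` lines) might look something like the sketch below. Only the `obsidian-qa` class and the horizontal-whitespace character class come from the post; the function names, the inner `<p class="question">`/`<p class="answer">` markup, and the HTML escaping are assumptions.

```typescript
// Escape text before placing it inside generated HTML.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// [^\S\n] means "whitespace except newline", so a match never spans lines.
const SAME_LINE_CARD = /^(#Q[^\S\n]+.*?)[^\S\n]+(#A[^\S\n]+)/gm;
const CARD_LINE = /^#(Q|A)[^\S\n]+(.*)$/;

function renderFlashcards(markdown: string): string {
  // Step 1: put "#A" on its own line when it shares a line with "#Q".
  const lines = markdown.replace(SAME_LINE_CARD, "$1\n$2").split("\n");

  // Step 2: wrap each run of consecutive #Q/#A lines in one block.
  const out: string[] = [];
  let group: string[] = [];
  const flush = (): void => {
    if (group.length > 0) {
      out.push(`<div class="obsidian-qa">${group.join("")}</div>`);
      group = [];
    }
  };
  for (const line of lines) {
    const match = CARD_LINE.exec(line);
    if (match) {
      const kind = match[1] === "Q" ? "question" : "answer";
      group.push(`<p class="${kind}">${escapeHtml(match[2].trim())}</p>`);
    } else {
      flush();
      out.push(line);
    }
  }
  flush();
  return out.join("\n");
}
```

Splitting before grouping means the same-line and multi-line forms go through one code path, which is the design the post describes arriving at.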
2
1
7
602
SEOAI New feature: Product Feed
Product Feed lets you upload a product catalog, audit each SKU, enrich product data with AI, and export a cleaned feed for your sales channels.
❖ Supported inputs:
▶ CSV (generic, Shopify, WooCommerce)
▶ Google Merchant Center XML
▶ JSON
Formats are auto-detected on upload.
❖ What it does:
1. Normalizes your feed into a single internal schema
2. Audits each product for common data issues (missing/invalid GTIN, short titles, thin descriptions, missing price, category, or image)
3. Enriches weak titles and descriptions with AI, infers missing categories, and assigns an AI readability score (0–100) per product
4. Exports the updated feed in the format you need
❖ Export formats:
▶ Clean CSV
▶ schema.org Product JSON-LD
▶ Google Merchant XML
▶ Shopify CSV
▶ WooCommerce CSV
▶ Amazon flat-file
▶ Meta catalog CSV
❖ Availability: Included on Platinum plans. Available as a paid add-on on other tiers.
❖ Limits: Up to 1,000 products per upload. AI enrichment runs on the first 150 products per session.
❖ Enriched products can also be imported into SEOAI Product Pages (/p/your-brand/products/…) as individual AI-readable product pages under your brand profile.
Find it in the dashboard under Setup → Product Feed at seoai.space
#AIVisibility #Ecommerce #ProductData #GEO #SEOAI #StructuredData
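For a sense of what the audit step could involve, here is a small TypeScript sketch. It is not SEOAI's implementation: the `Product` shape, issue codes, and the title and description length thresholds are invented for illustration. The GTIN check itself follows the standard GS1 check-digit rule (weights alternate 3, 1, 3, ... from the digit next to the check digit).

```typescript
// Hypothetical normalized product shape; field names are assumptions.
interface Product {
  title: string;
  description: string;
  gtin?: string;
  price?: number;
  category?: string;
  imageUrl?: string;
}

// GS1 check-digit validation for GTIN-8, GTIN-12, GTIN-13, and GTIN-14.
function isValidGtin(gtin: string): boolean {
  if (!/^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(gtin)) return false;
  const digits = gtin.split("").map(Number);
  const check = digits.pop() as number;
  let sum = 0;
  // Walk right to left over the remaining digits with weights 3, 1, 3, ...
  for (let i = digits.length - 1, weight = 3; i >= 0; i--, weight = 4 - weight) {
    sum += digits[i] * weight;
  }
  return (10 - (sum % 10)) % 10 === check;
}

// Assumed thresholds; the post does not say what counts as "short" or "thin".
const MIN_TITLE_LENGTH = 20;
const MIN_DESCRIPTION_LENGTH = 80;

function auditProduct(product: Product): string[] {
  const issues: string[] = [];
  if (!product.gtin) issues.push("missing_gtin");
  else if (!isValidGtin(product.gtin)) issues.push("invalid_gtin");
  if (product.title.trim().length < MIN_TITLE_LENGTH) issues.push("short_title");
  if (product.description.trim().length < MIN_DESCRIPTION_LENGTH) issues.push("thin_description");
  if (product.price === undefined) issues.push("missing_price");
  if (!product.category) issues.push("missing_category");
  if (!product.imageUrl) issues.push("missing_image");
  return issues;
}
```

Returning issue codes rather than a pass/fail flag keeps the audit output usable by a later enrichment step, which only needs to touch the fields that were flagged.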
1
6
124
Jun 10
SEO is about keywords. AI visibility is about schema markup — structured data that tells AI crawlers exactly what your product is, what it costs, and who it's for. Most travel suppliers have none. tixxly.ai #TravelTech #StructuredData
11
0% Core Web Vitals
0% Structured Data
0% Server-Side Rendering
100% AI Search Optimization
1 vote • Final results
1
16
We are gathered here today to mourn the loss of our dear friend, Exact Match Keywords. 🪦🥀 The old SEO game of stuffing the same phrase 50 times into your site? Officially buried in 2024. In the new AI-powered search era, keywords have been survived by: → Complex Nuance — AI actually understanding intent behind real conversational questions → Entity Graphs — Search engines seeing your business as a real-world thing (not just text) → Structured Data — The technical backbone that tells AI exactly who you are, what you do, and where you serve → Actual Expertise — Content so good AI wants to cite you as a source Welcome to the AI web, folks. If your marketing strategy is still stuck in 2018… it might be time for an upgrade. 🤖📉 What’s one thing you’re changing in your strategy as AI eats search? Drop it below 👇 Link in the bio 👆 . . . #AISEO #StructuredData #DigitalMarketing #LocalSEO #SEOtips #Marketing
2
1
5
81
You exist in two spaces now. Both require your attention. Intelligence is not inside the machine. It never was. #NeuralAwareness #EntityBuilding #AIVisibility #StructuredData
2
🌏What Is Data? 🌟Data refers to any information, facts or figures that can be recorded, stored and analysed. In marketing analytics, data acts as the raw material for generating insights and supporting informed decision-making. ☑️Data can be classified by structure into structured, unstructured and semi-structured data, and by origin into declared, observed and inferred data. 🔥Machine learning is increasingly important because it can identify patterns in large datasets and infer insights, predictions and recommendations at scale. #Data #marketingdigital #structureddata #unstructureddata #marketing
2
21
𝐌𝐨𝐬𝐭 𝐒𝐄𝐎 𝐭𝐞𝐚𝐦𝐬 𝐬𝐩𝐞𝐧𝐝 𝐡𝐨𝐮𝐫𝐬 𝐝𝐞𝐜𝐢𝐝𝐢𝐧𝐠 𝐰𝐡𝐢𝐜𝐡 𝐬𝐜𝐡𝐞𝐦𝐚 𝐦𝐚𝐫𝐤𝐮𝐩 𝐭𝐨 𝐢𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭. But until now, there was no reliable way to see what the rest of the web was actually doing. That just changed. Google and Schema.org have introduced a new Public Usage Statistics Dataset, giving the industry access to real-world schema adoption trends across millions of websites. 𝐓𝐡𝐢𝐬 𝐦𝐞𝐚𝐧𝐬 𝐒𝐄𝐎 𝐩𝐫𝐨𝐟𝐞𝐬𝐬𝐢𝐨𝐧𝐚𝐥𝐬 𝐜𝐚𝐧 𝐧𝐨𝐰: • See which schema types are widely adopted • Identify emerging structured data opportunities • Make more informed implementation decisions • Support SEO, AEO, and GEO strategies with actual usage data The real value isn't in copying what everyone else is doing. It's in spotting trends early. As AI-powered search continues to evolve, structured data is becoming an increasingly important part of how search engines and AI systems understand content, entities, products, services, and organizations. The teams that pay attention to these signals today will be better positioned for tomorrow's search landscape. Are you already using structured data as part of your SEO strategy? #SEO #TechnicalSEO #SchemaMarkup #StructuredData #GoogleSearch #AEO #GEO #AISEO #SearchMarketing #DigitalStrategy #naxtre
19
Your brand can rank on Google and still be invisible to AI. That is the uncomfortable gap between traditional SEO and Generative Engine Optimization.
SEO helps you get found. GEO helps your content get understood, extracted, attributed, and cited by AI systems.
The fix is not "publish more." The fix is structure:
- clear definitions
- direct answers
- strong entity signals
- FAQ-ready extraction points
- server-rendered JSON-LD that is visible in raw HTML
That last detail matters. If schema only appears after JavaScript runs, some AI crawlers may never see the signal you thought you had.
This is why I treat GEO as a data architecture problem, not a marketing trick. The same discipline that makes a page AI-citable is the discipline that makes a business system scalable: clean data, clear relationships, and structured flows.
Question worth auditing: Can a machine understand what your brand is the authority on?
#GEO #AIStrategy #ContentArchitecture #StructuredData #JSONLD #BrandVisibility
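The "server-rendered JSON-LD" point can be illustrated with a short TypeScript sketch: build the script tag as a string on the server so it is already present in the raw HTML. The helper names are hypothetical and the template handling is deliberately naive; a real framework would have its own way to inject head content.

```typescript
type JsonLd = Record<string, unknown>;

// Serialize schema.org data into a script tag that exists in the raw HTML.
function renderJsonLdScript(data: JsonLd): string {
  const json = JSON.stringify({ "@context": "https://schema.org", ...data })
    // Escape "<" so a value containing "</script>" cannot close the tag early;
    // JSON parsers read \u003c back as "<".
    .replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Insert the tag before </head> in an HTML string produced on the server.
function injectIntoHead(html: string, scriptTag: string): string {
  // A replacer function avoids "$" sequences in the JSON being treated
  // as replacement patterns.
  return html.replace("</head>", () => `${scriptTag}</head>`);
}
```

A quick audit is then to fetch the page without executing JavaScript and check that the `application/ld+json` tag is in the response body.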
3
2
142
Search engines don’t just read websites anymore — they understand meaning. Learn how Structured Data (Schema.org) improves SEO, rich results, and Google visibility. Read more 👇 amtechco-llc.com/en/blog/wha… #SEO #SchemaOrg #StructuredData #WebDev
3
2
3
40
🚨 DATABRICKS CEO JUST SAID THE QUIET PART OUT LOUD: AI IS NOTHING WITHOUT STRUCTURED DATA ♾️
The AI hype machine keeps screaming about agents, automation, copilots, AGI, and replacing workers. But Databricks CEO Ali Ghodsi has pointed to the real foundation: Systems of record. Structured data. Enterprise-grade information. Governed infrastructure.
His point was simple: Why would companies move their system of record? It is hard to move it.
That is the truth most AI hype merchants avoid. AI is not magic. AI needs:
• clean data
• structured data
• trusted records
• secure access
• governed execution
• reliable compute
• auditability
• identity
• storage
• operational control
Without that, AI is just a clever interface sitting on top of weak infrastructure.
And this is where $ICP by @dfinity becomes impossible to ignore. Most AI systems today still depend on the old stack:
• centralized cloud
• centralized databases
• SaaS platforms
• API keys
• third-party infrastructure
• opaque backend systems
• off-chain storage
• external identity providers
• fragile data pipelines
That is not sovereign AI. That is rented AI on someone else’s infrastructure.
$ICP is different. ICP is building an onchain cloud where applications can run with:
• backend logic onchain
• frontend hosting onchain
• data storage in canisters
• Internet Identity
• tamper-resistant execution
• reverse gas model
• chain-key cryptography
• smart contracts that can serve web content directly
• AI and agent infrastructure moving toward verifiable compute
This is why the structured data argument matters. The future of AI will not only be about who has the biggest model. It will be about who controls:
• the data
• the execution layer
• the identity layer
• the application layer
• the governance layer
• the trust layer
Databricks is right to focus on data foundations. But the next question is bigger: Where should the future AI data layer actually live? On centralized cloud? Inside closed SaaS silos? Behind API keys that can be leaked? Across systems users cannot verify? Or on tamper-resistant, programmable, internet-scale infrastructure?
That is the opening for $ICP. AI needs structured data. Structured data needs trust. Trust needs verifiable infrastructure. And that is exactly the lane $ICP by @dfinity is building for.
Not hype. Not another token narrative. Actual infrastructure. The market is still sleeping on this. But the architecture is already pointing in one direction:
AI + structured data + verifiable compute = $ICP. ♾️
$ICP is not trying to be another blockchain. It is trying to become the compute layer for the next internet.
@dfinity $ICP ♾️
If my posts help you see the bigger picture, contributions are appreciated: ICP address: 1e672d038cebc619d93186418fa98f6499dbdb9cfdfac54f366c61a4a4ee4362
#ICP #DFINITY #AI #Databricks #StructuredData #OnchainCloud #Web3 #Blockchain #InternetComputer #DeAI #SovereignCloud
3
11
276
Most teams call themselves data-driven. In reality, they’re working off the 10-20% that fits into tables. The rest sits in emails, PDFs, logs, chats. Our new guide by Vladyslav Savchenko for #StarWind explains structured vs unstructured vs semi-structured data and where each matters. Read more here: starwind.com/s/18x #StarWind_handy #DataEngineering #Analytics #StructuredData #DataStrategy #UnstructuredData
7
10
262
AI search visibility is a service your clients will be asking for by name within the year. Our newest Rundown includes a technical roadmap for building this into your existing offer. 🔗 Link in replies! #SEO #AISearch #StructuredData #SearchMarketing #GenerativeAI #DigitalMarketing #SEOAgency
2
3
521
Your website might look great to humans. But can ChatGPT read it?
Most sites are JavaScript-heavy, image-based, zero structured data. AI systems can't parse them. They miss your services, location, reviews, and expertise entirely.
That's why we built SEOAI Pages. Every business gets a public profile at seoai.space/p/your-business/, built from the ground up for AI and traditional search engines.
Auto-generated with every page:
- Schema.org structured data (Organization, LocalBusiness, FAQ & more)
- robots.txt whitelisting 11 AI crawlers
- XML sitemap with IndexNow instant notification
- Per-business llms.txt: first platform to do this per profile
And what makes SEOAI truly different: Every page is blockchain-verified on Soneium L2. SHA-256 hash → recorded on-chain → tamper-proof → verifiable by anyone.
Your page. Your data. Your proof. Live in seconds.
🔗 seoai.space
#SEOAI #BlockchainVerification #Soneium #StructuredData #AIVisibility #llmstxt #Web3 #SEO
1
4
141