StarTree

@startreedata

Jun 3

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @wearemiq? MiQ is reinventing how programmatic advertising campaigns get built, using AI to unify audience discovery, segment selection, and campaign activation across fragmented DSP ecosystems. And it’s 𝗽𝗼𝘄𝗲𝗿𝗲𝗱 𝗯𝘆 𝗔𝗽𝗮𝗰𝗵𝗲 𝗣𝗶𝗻𝗼𝘁. Because in modern advertising, the bottleneck isn’t access to data. It’s the ability to search, compare, and activate audiences in real time across massive, fragmented systems. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 MiQ manages more than 100,000 audience segments across multiple providers and DSPs. But before Audiences: • Traders manually stitched together segment data across disconnected systems • Sales teams struggled to turn insights into activatable campaigns • Slow query performance created friction in high-speed workflows Traditional architectures couldn’t handle the concurrency and responsiveness required for real-time audience exploration. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 MiQ built a unified, AI-driven Audiences platform powered by Apache Pinot. • Streaming and indexed audience data now enables real-time search across massive segment inventories • Free-text audience discovery powered by vector indexing and LLM-generated metadata • Instant comparison of reach, CPM, and availability across DSPs • Direct activation workflows from discovery to execution This transforms campaign building from a fragmented manual process into an intelligent, interactive system. 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀 • Segment listing latency reduced from 8–10 seconds to ~2 seconds • Complex metric calculations accelerated by 40–80% • Query caching eliminated entirely • Multiple users can simultaneously explore audiences without performance degradation • AI-powered discovery improves campaign planning and audience selection 𝗧𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝗿 𝘀𝗵𝗶𝗳𝘁 Programmatic advertising is moving from static workflows to intelligent systems that reason across fragmented data in real time. Because AI isn’t just changing ad targeting. It’s changing how campaigns themselves are constructed. 
𝗖𝗵𝗲𝗰𝗸 𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗰𝗮𝘀𝗲 𝘀𝘁𝘂𝗱𝘆 𝗵𝗲𝗿𝗲 → stree.ai/4tvKdcW

StarTree

@startreedata

May 27

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @Webex? Webex is building real-time observability for one of the world’s largest collaboration platforms—where engineers can detect audio degradation, latency spikes, and platform anomalies as they happen. 𝗔𝗻𝗱 𝗶𝘁’𝘀 𝗽𝗼𝘄𝗲𝗿𝗲𝗱 𝗯𝘆 𝗔𝗽𝗮𝗰𝗵𝗲 𝗣𝗶𝗻𝗼𝘁. Because at Webex scale, observability can’t rely on static metrics or delayed rollups. You need runtime analytics across billions of events, under concurrency, with fresh data arriving continuously. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 As remote work exploded, Webex had to support: • 100 TB of telemetry data per day • Over 300,000 messages per second at peak • More than a billion events daily • Hundreds of dimensions across audio quality, regions, clients, and user behavior The existing #Elasticsearch-based architecture struggled under the load: • Slow queries • Timeouts under concurrency • Heavy infrastructure costs • Rollups that limited visibility into emerging problems And in #observability, pre-aggregated data misses the very anomalies you’re trying to detect. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 Webex rebuilt its observability platform around Apache Pinot. Streaming telemetry now powers: • Real-time runtime aggregations across raw event streams • Sub-second exploration of audio/video quality metrics • High-concurrency analytical queries across hundreds of dimensions • Live dashboards and alerting integrated with #Grafana and #Kibana This transforms observability from retrospective reporting into an interactive operational system. 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁 • 5× to 150× faster p99 query latency compared to Elasticsearch • Sub-second query performance in most workloads • Elasticsearch timed out in 67% of benchmark cases where Pinot succeeded • Cluster footprint reduced by 500 nodes • Data storage reduced from 800TB to 121TB of unique data 𝗧𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝗿 𝘀𝗵𝗶𝗳𝘁 Modern observability systems can’t depend on pre-computed summaries anymore. 
Because when infrastructure behavior changes in seconds, the analytics layer must detect and explain anomalies as they emerge—not after the incident is over. 𝗖𝗵𝗲𝗰𝗸 𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗰𝗮𝘀𝗲 𝘀𝘁𝘂𝗱𝘆 𝗵𝗲𝗿𝗲 → stree.ai/4uDn9KR
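
The point about pre-aggregated data missing anomalies can be illustrated with a small sketch. The latency samples and the 500 ms threshold below are hypothetical values chosen for illustration, not Webex figures; the idea is only that a per-minute average can look healthy while a raw-event scan still finds the spike.

```python
from statistics import mean

# Hypothetical per-second audio latency samples (ms) for one minute:
# 59 healthy seconds followed by a single one-second spike.
samples_ms = [40.0] * 59 + [2000.0]


def rollup_avg(values):
    """Pre-aggregated view: a single average for the whole minute."""
    return mean(values)


def raw_anomalies(values, threshold_ms):
    """Runtime view: scan raw events and return the offending second offsets."""
    return [i for i, v in enumerate(values) if v > threshold_ms]


minute_avg = rollup_avg(samples_ms)
spikes = raw_anomalies(samples_ms, threshold_ms=500.0)
print(f"minute average: {minute_avg:.1f} ms, spikes at seconds: {spikes}")
```

The rollup stays well under the alert threshold, while the raw scan pinpoints the second in which the spike occurred.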

StarTree

@startreedata

May 20

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @stripe? Stripe is building real-time financial infrastructure that doesn’t just process payments—it explains what’s happening across billions of transactions as events unfold. And it’s 𝗽𝗼𝘄𝗲𝗿𝗲𝗱 𝗯𝘆 𝗔𝗽𝗮𝗰𝗵𝗲 𝗣𝗶𝗻𝗼𝘁. Because at Stripe’s scale, analytics can’t be an afterthought. Customer dashboards, fraud monitoring, billing analytics, and operational alerts all depend on fresh data under massive concurrency. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 Stripe processes more than 250 million API requests per day, peaking at 13,000 requests per second. That created a new requirement: • Real-time dashboards for merchants and developers • Instant visibility into payment processor failures • Live financial reporting and risk monitoring • Sub-second analytics across petabytes of transaction data Traditional architectures struggled to balance freshness, latency, and scale simultaneously. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 Stripe standardized on Apache Pinot as its real-time analytical layer. Streaming data flows through Kafka and Flink into Pinot, where: Customer-facing dashboards update in near real time. Billing and API analytics stay interactive under heavy load. Internal teams monitor fraud, risk, and payment infrastructure live. Queries execute across massive transaction volumes with low tail latency. This transforms operational payment data into a system that can be interrogated continuously—not just reported on after the fact. 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁 • 10,000 queries per second • 70ms p99 query latency • 30-second p99 ingestion lag • 99.99% availability Over 1 petabyte of data managed across production Pinot clusters During Black Friday–Cyber Monday alone, Stripe used Pinot to track 300M transactions totaling more than $18.6B. 𝗧𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝗿 𝘀𝗵𝗶𝗳𝘁 Modern financial platforms aren’t just transaction systems anymore. They’re real-time analytical systems operating under extreme concurrency. Because when money moves globally in milliseconds, the analytics layer has to move just as fast. 
𝗖𝗵𝗲𝗰𝗸 𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗰𝗮𝘀𝗲 𝘀𝘁𝘂𝗱𝘆 𝗵𝗲𝗿𝗲 → stree.ai/4uB0UVs

StarTree

@startreedata

May 14

𝗦𝗲𝗲 𝘆𝗼𝘂 𝗶𝗻 𝗧𝗼𝗿𝗼𝗻𝘁𝗼 𝗳𝗼𝗿 @confluentinc's 𝗔𝗜 𝗗𝗮𝘆 𝗮𝘁 𝗗𝗮𝘁𝗮 𝗦𝘁𝗿𝗲𝗮𝗺𝗶𝗻𝗴 𝗪𝗼𝗿𝗹𝗱 𝗧𝗼𝘂𝗿. On Tuesday, May 26, meet the StarTree team at our booth or join Chad Meley for Real-Time Intelligence of Tokens at Scale. The session looks at what it takes to make streaming, high-cardinality token data queryable in seconds, so teams can inspect live workloads, understand model and user behavior, and troubleshoot AI systems with fresh data and low-latency queries. See you in Toronto. 𝗦𝗲𝗲 𝗲𝘃𝗲𝗻𝘁 𝗱𝗲𝘁𝗮𝗶𝗹𝘀 → stree.ai/4djPGxE #DSWT26 #DataStreamingWorldTour #ApachePinot #RealTimeAnalytics #DataEngineering #ApacheKafka

StarTree

@startreedata

May 13

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @togethercompute? Together AI is building observability for the AI era— where infrastructure teams can understand not just how many tokens were consumed, but why workloads behave the way they do in real time. And it’s 𝗽𝗼𝘄𝗲𝗿𝗲𝗱 𝗯𝘆 𝗔𝗽𝗮𝗰𝗵𝗲 𝗣𝗶𝗻𝗼𝘁. Because in LLM infrastructure, dashboards aren’t enough. 𝗬𝗼𝘂 𝗻𝗲𝗲𝗱 𝗵𝗶𝗴𝗵-𝗰𝗮𝗿𝗱𝗶𝗻𝗮𝗹𝗶𝘁𝘆 𝗮𝗻𝗮𝗹𝘆𝘀𝗶𝘀 𝗮𝗰𝗿𝗼𝘀𝘀 𝗯𝗶𝗹𝗹𝗶𝗼𝗻𝘀 𝗼𝗳 𝗲𝘃𝗲𝗻𝘁𝘀, 𝘂𝗻𝗱𝗲𝗿 𝗰𝗼𝗻𝗰𝘂𝗿𝗿𝗲𝗻𝗰𝘆, 𝘄𝗶𝘁𝗵 𝗳𝗿𝗲𝘀𝗵𝗻𝗲𝘀𝘀 𝗺𝗲𝗮𝘀𝘂𝗿𝗲𝗱 𝗶𝗻 𝘀𝗲𝗰𝗼𝗻𝗱𝘀—𝗻𝗼𝘁 𝗵𝗼𝘂𝗿𝘀. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 As token volumes surged into the billions per hour, Together AI hit a new problem: Traditional analytics systems weren’t designed for real-time LLM observability. Customers wanted live usage dashboards by prompt, model, and API key. Engineers needed to debug latency spikes and optimize GPU allocation in real time. Finance teams required precise token-level attribution for billing and cost management. 𝗕𝘂𝘁 𝗺𝗼𝘀𝘁 𝗼𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝘀𝘁𝗮𝗰𝗸𝘀 𝗳𝗼𝗿𝗰𝗲 𝗮 𝘁𝗿𝗮𝗱𝗲𝗼𝗳𝗳: 𝙀𝙞𝙩𝙝𝙚𝙧 𝙝𝙞𝙜𝙝 𝙛𝙧𝙚𝙨𝙝𝙣𝙚𝙨𝙨 𝙬𝙞𝙩𝙝 𝙡𝙤𝙬 𝙜𝙧𝙖𝙣𝙪𝙡𝙖𝙧𝙞𝙩𝙮—𝙤𝙧 𝙙𝙚𝙚𝙥 𝙖𝙣𝙖𝙡𝙮𝙨𝙞𝙨 𝙬𝙞𝙩𝙝 𝙨𝙡𝙤𝙬 𝙗𝙖𝙩𝙘𝙝 𝙥𝙞𝙥𝙚𝙡𝙞𝙣𝙚𝙨. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 Together AI centralized streaming LLM telemetry into a real-time analytical layer using StarTree, powered by Apache Pinot. Streaming data flows into Pinot, where billions of token events become queryable in seconds. Usage can be sliced by model, user, API key, region, and prompt. Queries reconstruct infrastructure behavior as events unfold. Text indexing enables prompt-level debugging and anomaly detection. This transforms LLM telemetry from static batch reporting into an operational system for AI infrastructure. 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁 • Sub-second query latency across billions of token events • 10-second freshness windows for near real-time visibility • High-cardinality analytics at production scale • 50% storage cost reduction with tiered storage optimization • Latency improvements from 10 seconds to 7 milliseconds using Star-Tree indexing 𝗧𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝗿 𝘀𝗵𝗶𝗳𝘁 LLM observability is becoming part of the product experience itself. 
Because when AI infrastructure becomes customer-facing, telemetry can’t arrive tomorrow. It has to explain what’s 𝙝𝙖𝙥𝙥𝙚𝙣𝙞𝙣𝙜 𝙣𝙤𝙬. 𝗖𝗵𝗲𝗰𝗸 𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗰𝗮𝘀𝗲 𝘀𝘁𝘂𝗱𝘆 𝗵𝗲𝗿𝗲 → stree.ai/4draymK #LLMobservability #RealTimeAnalytics #DataEngineering #ApachePinot

StarTree

@startreedata

May 12

𝗦𝗮𝘃𝗲 𝘆𝗼𝘂𝗿 𝘀𝗽𝗼𝘁 𝗳𝗼𝗿 𝗼𝘂𝗿 𝗠𝗮𝘆 𝟮𝟬 𝘁𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝗱𝗲𝗲𝗽 𝗱𝗶𝘃𝗲 𝗼𝗻 𝗜𝗰𝗲𝗯𝗲𝗿𝗴 𝗾𝘂𝗲𝗿𝘆 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲. For years, using Apache Iceberg meant accepting a tradeoff: open, flexible tables — but slower interactive query performance. We ran a benchmark across 𝗦𝘁𝗮𝗿𝗧𝗿𝗲𝗲, 𝗧𝗿𝗶𝗻𝗼, 𝗮𝗻𝗱 𝗖𝗹𝗶𝗰𝗸𝗛𝗼𝘂𝘀𝗲 on a 12.2B-row Iceberg dataset to test a different approach: bringing Apache Pinot-style indexing directly to Iceberg tables. The benchmark showed that when queries can skip to exactly the data they need instead of scanning pruned files, Iceberg can deliver sub-second performance. It also changes the cost equation, with less compute and I/O required per query. In the webinar, we’ll go deeper into the benchmark setup, query patterns, performance results, and what the findings mean for teams evaluating infrastructure efficiency and cost per query in Iceberg workloads. Join us on 𝗠𝗮𝘆 𝟮𝟬 𝗮𝘁 𝟭 𝗣𝗠 𝗘𝗗𝗧 𝗳𝗼𝗿 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝗶𝗻𝗴 𝗜𝗰𝗲𝗯𝗲𝗿𝗴 𝗤𝘂𝗲𝗿𝘆 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲: 𝗦𝘁𝗮𝗿𝗧𝗿𝗲𝗲, 𝗧𝗿𝗶𝗻𝗼, 𝗮𝗻𝗱 𝗖𝗹𝗶𝗰𝗸𝗛𝗼𝘂𝘀𝗲 𝗖𝗼𝗺𝗽𝗮𝗿𝗲𝗱. 𝗦𝗮𝘃𝗲 𝗬𝗼𝘂𝗿 𝗦𝗽𝗼𝘁 → stree.ai/4uClCnO

Iceberg Query Performance at Scale: StarTree vs. Trino vs. ClickHouse Benchmark

A technical discussion of iceberg query performance benchmark results across 12.2B rows of Parquet data on S3 — including sub-second latency, CPU efficiency, caching behavior, and up to 15x lower...

startree.ai

StarTree

@startreedata

Apr 29

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @AngelOne? Angel One is using real-time analytics to drive decisions inside the user journey itself—from personalized trading experiences to automated campaigns and self-healing onboarding flows. And it’s powered by #ApachePinot. Because in financial platforms, analytics isn’t just reporting—it directly impacts conversion, engagement, and revenue. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 Angel One operates across multiple business lines—equities, derivatives, loans, insurance—each with its own analytics needs. They needed a system that could: • Power user-facing experiences and internal dashboards simultaneously • Handle high ingestion rates and query concurrency • Support real-time decisions across different parts of the user lifecycle Traditional systems struggled to keep up with both scale and latency requirements. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 Angel One standardized on Apache Pinot as a real-time serving layer across these workflows. This enabled: • Personalized trading experiences, adapting UI based on real-time trends and behavior • Automated campaign systems (GRIP), where decisions are made live based on performance thresholds • Onboarding analytics (PRISM), tracking funnel drop-offs and triggering automated recovery workflows These are not offline reports—they are decision systems operating in real time. 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁 • ~100k transactions per second ingested • 2M queries per day • <100ms p99 query latency for user-facing workloads This allows Angel One to: • Personalize user experiences dynamically • Optimize campaigns continuously, not retrospectively • Detect and resolve onboarding issues automatically 𝗪𝗵𝗮𝘁 𝘁𝗵𝗶𝘀 𝗲𝗻𝗮𝗯𝗹𝗲𝘀 𝗻𝗲𝘅𝘁 With this foundation, Angel One is: • Expanding real-time analytics across all business verticals • Increasing automation in user lifecycle workflows • Continuing to contribute back to the Pinot ecosystem Because in modern financial platforms, it’s not enough to report on user behavior, you need to act on it in real time—while the user is still in the flow. 
𝗥𝗲𝗮𝗱 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗰𝗮𝘀𝗲 𝘀𝘁𝘂𝗱𝘆 𝗮𝗻𝗱 𝘀𝗲𝗲 𝘁𝗵𝗲 𝘃𝗶𝗱𝗲𝗼 𝗵𝗲𝗿𝗲 → 𝗵𝘁𝘁𝗽𝘀://𝘀𝘁𝗿𝗲𝗲.𝗮𝗶/𝟯𝗢𝗤𝗕𝗔𝘃𝗶

StarTree

@startreedata

Apr 22

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @Walmart? Walmart is building AI agents that don’t just answer “Where is my order?”— they can explain what’s happening, what went wrong, and what to do next in real time. And it’s powered by #ApachePinot. Because in last-mile delivery, visibility isn’t enough. You need analysis under concurrency, across systems, as events unfold. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 Walmart’s last-mile system spans 𝟮𝟬–𝟯𝟬 𝗺𝗶𝗰𝗿𝗼𝘀𝗲𝗿𝘃𝗶𝗰𝗲𝘀, each maintaining its own state of an order. This created a fragmented view: • No single system could explain the full lifecycle • Root cause analysis required stitching together events across services • Resolution depended on manual investigation When something broke, the question wasn’t just 𝘸𝘩𝘦𝘳𝘦 𝘪𝘴 𝘵𝘩𝘦 𝘰𝘳𝘥𝘦𝘳—it was: Which system failed, when, and why? 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 Walmart centralized this into a real-time analytical layer using Apache Pinot. Streaming data from Kafka and Cosmos flows into Pinot, where: • Order events across all services are unified • Queries reconstruct lifecycle state in real time • Systems can analyze transitions, delays, and anomalies as they occur This turns operational data into something you can interrogate, not just observe. 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁 • 50% reduction in issue resolution time • Immediate identification of failure points across services • Automated remediation workflows via Airflow • Real-time operational metrics driving faster decisions Pinot becomes the system that answers not just what happened in the past, but what’s happening now and why. 𝗖𝗵𝗲𝗰𝗸 𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗰𝗮𝘀𝗲 𝘀𝘁𝘂𝗱𝘆 𝗮𝗻𝗱 𝘃𝗶𝗱𝗲𝗼 𝗵𝗲𝗿𝗲 → 𝗵𝘁𝘁𝗽𝘀://𝘀𝘁𝗿𝗲𝗲.𝗮𝗶/𝟰𝗰𝗺𝗔𝟭𝗢𝗝
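
Reconstructing an order's lifecycle from events owned by different services might look roughly like the sketch below. The event fields (`order_id`, `service`, `ts`, `status`) and the sample values are hypothetical, and the in-memory grouping only stands in for what a query over unified events would return.

```python
from collections import defaultdict

# Hypothetical events emitted by separate services, arriving out of order.
events = [
    {"order_id": "o1", "service": "dispatch", "ts": 20, "status": "ASSIGNED"},
    {"order_id": "o1", "service": "checkout", "ts": 10, "status": "PLACED"},
    {"order_id": "o1", "service": "routing", "ts": 35, "status": "FAILED"},
    {"order_id": "o2", "service": "checkout", "ts": 12, "status": "PLACED"},
]


def lifecycle(all_events):
    """Group events by order and sort each group by timestamp."""
    by_order = defaultdict(list)
    for event in all_events:
        by_order[event["order_id"]].append(event)
    return {oid: sorted(evs, key=lambda e: e["ts"]) for oid, evs in by_order.items()}


def first_failure(timeline):
    """Return (service, ts) for the first failed step, or None if none failed."""
    for event in timeline:
        if event["status"] == "FAILED":
            return event["service"], event["ts"]
    return None


timelines = lifecycle(events)
print(first_failure(timelines["o1"]))
```

With every service's events in one ordered timeline, "which system failed, and when" becomes a lookup instead of a manual investigation.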

StarTree

@startreedata

Apr 16

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @SlackHQ? Slack is giving enterprise customers real-time visibility into data exfiltration: who is accessing messages and files, how much, and when. Not hours later. Not the next day. As it happens. And it’s powered by #ApachePinot. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 Slack’s customer-facing security analytics were historically batch-based: • Data flowed through Spark → S3 → Pinot • Visibility lagged by 24–48 hours • Customers couldn’t react to suspicious activity in time For security use cases, that gap is unacceptable. The core question wasn’t just 𝘸𝘩𝘢𝘵 𝘥𝘢𝘵𝘢 was accessed; it was: How much data is being exported right now, and is it anomalous? 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 Slack moved to a real-time analytics architecture using Kafka and Pinot. Instead of batch ingestion: • Events stream directly into #Kafka • Pinot consumes and indexes data in real time (<1s ingestion latency) • Queries compute metrics like distinct message/file access across apps These are not simple lookups; they are compute-intensive aggregations over large-scale, multi-value data. To support this, Slack leverages: • HyperLogLog (HLL) for approximate distinct counts • Range, sorted, and inverted indexes for efficient filtering and access 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁 • <1 second data latency from event to queryable state • <10 second query latency for complex aggregations • 100% accuracy alignment with downstream Iceberg tables • Real-time visibility into data access patterns across external apps Customers can now detect and respond to potential data exfiltration as it happens, not after the fact. 𝗪𝗵𝗮𝘁 𝘁𝗵𝗶𝘀 𝗲𝗻𝗮𝗯𝗹𝗲𝘀 𝗻𝗲𝘅𝘁 With this foundation, Slack is: • Expanding real-time, customer-facing analytics use cases • Integrating Kafka, Flink, and Pinot as a unified stack • Building systems that combine streaming computation with real-time serving Because in security systems, delayed insight isn’t just inconvenient; it's a risk. 𝗥𝗲𝗮𝗱 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗰𝗮𝘀𝗲 𝘀𝘁𝘂𝗱𝘆 𝗮𝗻𝗱 𝘀𝗲𝗲 𝘁𝗵𝗲 𝘃𝗶𝗱𝗲𝗼 𝗵𝗲𝗿𝗲 → stree.ai/4csWN6t
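
For readers unfamiliar with HyperLogLog, the approximate distinct counting mentioned above can be sketched in a few lines. This is a textbook-style toy in plain Python, not Pinot's or Slack's implementation; the register count (`p=12`) and the SHA-1 based hash are assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Tiny HyperLogLog sketch: fixed memory, approximate distinct counts."""

    def __init__(self, p: int = 12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits pick a register, the rest feed the rank.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # The union of two sketches (same p) is the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"msg-{i}")
    hll.add(f"msg-{i}")  # duplicates do not change the estimate
estimate = hll.count()
print(f"estimated distinct messages: {estimate:.0f}")
```

Because sketches merge by taking register-wise maxima, per-app or per-segment sketches can be combined at query time without revisiting the raw events, which is what makes the structure attractive for distinct counts over large streams.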

StarTree

@startreedata

Mar 20

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @awscloud? AWS is showing how streaming data, #vectorsearch, and #ApachePinot are powering a new generation of AI applications. Because in AI systems, context 𝘥𝘦𝘭𝘢𝘺𝘦𝘥 is 𝘷𝘢𝘭𝘶𝘦 𝘭𝘰𝘴𝘵. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 Modern AI applications depend on fast-changing signals: • Customer conversations • Product catalogs • Operational data • Market sentiment • Supply chain signals But most vector databases still update in batches. That means AI systems are often retrieving stale context, not what’s happening right now. And when context is stale, AI decisions lag behind reality. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 AWS demonstrated how real-time vector pipelines solve this problem. Streaming data flows through #Kafka or #Kinesis. Data is embedded with models like Amazon Titan. Those embeddings are ingested, indexed, and made available for vector search in Apache Pinot in real time. The result is AI that retrieves live context, not yesterday’s embeddings. 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁 Applications that understand what’s happening right now: • Live deep learning recommendation engines • Customer support copilots with fresh context • Real-time sentiment analysis from social platforms Ultimately, faster responses to customer sentiment, market changes, and operational events. Because in the AI-native era: It’s not just what you know. It’s how fast you know it, and act on it. 𝗙𝘂𝗹𝗹 𝘀𝘁𝗼𝗿𝘆 → 𝗵𝘁𝘁𝗽𝘀://𝘀𝘁𝗿𝗲𝗲.𝗮𝗶/𝟰𝟬𝗘𝗙𝗲𝗲𝟰
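
The difference between batch-updated and live vector retrieval can be sketched with a toy index. The class below is a brute-force cosine-similarity search in plain Python; it is an assumption-laden illustration (hand-written 2-D vectors, hypothetical document IDs), not Pinot's vector index or Amazon Titan embeddings.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("zero vector cannot be normalized")
    return [x / norm for x in vec]


class LiveVectorIndex:
    """Brute-force index where each upsert is searchable immediately."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, doc_id, embedding):
        self._vectors[doc_id] = _normalize(embedding)

    def search(self, query, k=3):
        q = _normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id)
            for doc_id, vec in self._vectors.items()
        ]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


index = LiveVectorIndex()
index.upsert("old-ticket", [1.0, 0.0])
index.upsert("catalog-item", [0.0, 1.0])
before = index.search([1.0, 0.1], k=1)

# A new event arrives on the stream and is embedded and indexed right away.
index.upsert("fresh-ticket", [1.0, 0.1])
after = index.search([1.0, 0.1], k=1)
print(before, after)
```

The same query returns a different top hit as soon as the fresh embedding is indexed, which is the behavior a batch-refreshed index would delay until its next load.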

StarTree

@startreedata

Mar 13

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @𝗦𝘁𝗮𝗿𝗯𝘂𝗰𝗸𝘀? They’re combining Real-Time RAG with Apache Pinot to power smarter workforce decisions instantly. Because in retail operations, context delayed is value lost. 𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 Starbucks operates thousands of stores with constantly shifting signals: • Staffing levels • Store traffic • Training data • Operational KPIs • Regional trends 𝘔𝘢𝘯𝘢𝘨𝘦𝘳𝘴 𝘯𝘦𝘦𝘥 𝘢𝘯𝘴𝘸𝘦𝘳𝘴 𝘪𝘯 𝘵𝘩𝘦 𝘮𝘰𝘮𝘦𝘯𝘵, 𝘯𝘰𝘵 𝘴𝘵𝘢𝘵𝘪𝘤 𝘳𝘦𝘱𝘰𝘳𝘵𝘴. Traditional dashboards are too slow. Static RAG pulls stale context. Batch pipelines break the feedback loop. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 Starbucks moved beyond static retrieval. By pairing Real-Time RAG with Apache Pinot, they created an AI system that retrieves live operational data, not yesterday’s snapshot. Pinot continuously indexes streaming workforce signals. RAG layers on top, grounding AI responses in real-time store operations context. 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁 AI that reflects what is happening now, not what happened last night. 𝗪𝗵𝘆 𝗔𝗽𝗮𝗰𝗵𝗲 𝗣𝗶𝗻𝗼𝘁 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 Real-time vector ingestion from streaming sources Sub-second queries on high-cardinality operational data Built for 1000s of manager-facing operational workloads 𝗧𝗵𝗲 𝗽𝗮𝘆𝗼𝗳𝗳 • Instant, context-aware workforce guidance • AI responses grounded in live operational data • Faster decisions at the store level • Human-centered AI, powered by real-time infrastructure 𝘛𝘩𝘪𝘴 𝘪𝘴 𝘸𝘩𝘢𝘵 𝘙𝘦𝘢𝘭-𝘛𝘪𝘮𝘦 𝘙𝘈𝘎 𝘭𝘰𝘰𝘬𝘴 𝘭𝘪𝘬𝘦 𝘪𝘯 𝘱𝘳𝘰𝘥𝘶𝘤𝘵𝘪𝘰𝘯. 𝗙𝘂𝗹𝗹 𝘀𝘁𝗼𝗿𝘆 → 𝗵𝘁𝘁𝗽𝘀://𝘀𝘁𝗿𝗲𝗲.𝗮𝗶/𝟯𝗠𝗜𝟱𝗴𝗱𝗱

StarTree

@startreedata

Feb 25

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @𝗨𝗯𝗲𝗿? They’re rethinking how time-series observability works at scale by building a dedicated query engine for Apache Pinot to handle real-time metrics with millisecond latency. 𝗧𝗵𝗲 𝘂𝘀𝗲 𝗰𝗮𝘀𝗲 Uber’s internal observability platform monitors thousands of microservices. Charts, alerts, and dashboards rely on high-resolution metrics, real-time ingestion, and second-level freshness. 𝗧𝗵𝗲 𝗼𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗯𝗼𝘁𝘁𝗹𝗲𝗻𝗲𝗰𝗸 Traditional SQL on columnar databases struggled with time-series use cases: • Manual bucketing logic (GROUP BY DATE_TRUNC) was brittle and error-prone • Ingestion gaps and mismatched time resolutions broke charts • LIMIT clauses truncated results unpredictably • Sparse data made comparisons (e.g. week-over-week) unreliable 𝗧𝗵𝗲 𝗺𝗼𝗺𝗲𝗻𝘁 𝗼𝗳 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 Instead of patching SQL with macros, Uber built a custom time-series query engine for Pinot. It’s already powering 100,000 alerts in production. 𝗪𝗵𝗮𝘁 𝗰𝗵𝗮𝗻𝗴𝗲𝗱? The engine introduces a domain-native query layer (e.g., M3QL, PromQL) on top of Pinot. Users can now write expressive queries like moving averages, gap fills, and time shifts. No schema migration. No refactoring. Just drop in and go. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 𝗽𝗮𝘆𝗼𝗳𝗳 With the new engine: • Engineers can use observability-native languages inside Pinot • Dashboards handle missing data and wide time windows cleanly • SQL remains available for ad hoc exploration and advanced use cases 𝗪𝗵𝘆 𝗶𝘁 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 𝗳𝗼𝗿 𝗱𝗮𝘁𝗮 𝗽𝗹𝗮𝘁𝗳𝗼𝗿𝗺 𝘁𝗲𝗮𝗺𝘀 This is observability at Pinot scale, without contorting SQL, breaking charts, or running two systems for metrics and analytics. 𝗠𝗼𝗿𝗲 𝗼𝗻 𝘁𝗵𝗲 𝗮𝗽𝗽𝗿𝗼𝗮𝗰𝗵 𝗵𝗲𝗿𝗲: stree.ai/4tVbzuz
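
The time-series operations named above (bucketing, gap fills, moving averages) can be sketched in plain Python to show why they are awkward to hand-roll per query. The function names and sample points are hypothetical; this is not Uber's engine, M3QL, or PromQL.

```python
def bucketize(points, start, end, step):
    """Sum (timestamp, value) points into fixed buckets; empty buckets stay None."""
    buckets = {t: None for t in range(start, end, step)}
    for ts, value in points:
        if start <= ts < end:
            bucket = start + ((ts - start) // step) * step
            buckets[bucket] = (buckets[bucket] or 0) + value
    return buckets


def gap_fill(buckets, fill=0):
    """Replace missing buckets so charts and comparisons see a dense series."""
    return {t: (fill if v is None else v) for t, v in buckets.items()}


def moving_average(series, window):
    """Trailing average over the last `window` buckets (shorter at the start)."""
    times = sorted(series)
    result = {}
    for i, t in enumerate(times):
        chunk = [series[u] for u in times[max(0, i - window + 1): i + 1]]
        result[t] = sum(chunk) / len(chunk)
    return result


# Hypothetical metric points: (epoch seconds, value). Nothing arrives in 120-179.
points = [(0, 2), (10, 4), (65, 6)]
sparse = bucketize(points, start=0, end=180, step=60)
dense = gap_fill(sparse)
smoothed = moving_average(dense, window=2)
print(sparse, dense, smoothed)
```

A plain GROUP BY over the raw rows would simply omit the empty bucket, which is how ingestion gaps end up breaking charts; a time-series layer makes the dense series the default.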

StarTree

@startreedata

Feb 19

𝗪𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @𝟳𝘀𝗶𝗴𝗻𝗮𝗹? They’re delivering real-time Wi-Fi performance visibility across millions of devices by moving aggregation from write-time to query-time with Apache Pinot on StarTree Cloud. 𝗧𝗵𝗲 𝘂𝘀𝗲 𝗰𝗮𝘀𝗲 7SIGNAL ingests ~35 million metrics/hour from enterprise Wi-Fi agents. Customers rely on sub-500ms dashboards to detect and troubleshoot network issues instantly. 𝗧𝗵𝗲 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺 Originally, the team used Apache Flink to pre-aggregate into Postgres (15m/120m tumbling windows). While this ensured fast queries, it introduced a fixed lag: ➤ ~20–23 minute delay from event to dashboard ➤ Caused by waiting for windows to close, plus late-data buffers As real-time expectations grew, this model became a blocker. 𝗧𝗵𝗲 𝗺𝗼𝗺𝗲𝗻𝘁 𝗼𝗳 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 7SIGNAL realized their users didn’t want fast queries over stale data. They needed fresh insights and low latency simultaneously. So they re-architected the pipeline around Apache Pinot, delivered as a fully managed service via StarTree Cloud. 𝗧𝗵𝗲 𝗮𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝘀𝗵𝗶𝗳𝘁 Agents → Kafka → Pinot No Flink. No Postgres. Raw metrics go directly into Pinot segments, available for immediate querying. 𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁 𝗽𝗮𝘆𝗼𝗳𝗳 With StarTree Cloud Pinot at the core: • Data freshness improved from ~𝟮𝟯 𝗺𝗶𝗻𝘂𝘁𝗲𝘀 𝘁𝗼 <𝟱 𝗺𝗶𝗻𝘂𝘁𝗲𝘀 • 𝗡𝗲𝘄 𝗮𝗴𝗴𝗿𝗲𝗴𝗮𝘁𝗶𝗼𝗻𝘀 𝘄𝗲𝗻𝘁 𝗳𝗿𝗼𝗺 𝟰 𝗺𝗼𝗻𝘁𝗵𝘀 𝘁𝗼 𝟭 𝗺𝗼𝗻𝘁𝗵 of dev time • Query performance jumped from baseline to 𝟮–𝟭𝟬× 𝗳𝗮𝘀𝘁𝗲𝗿 • 𝗜𝗻𝗳𝗿𝗮 𝗰𝗼𝘀𝘁 𝗱𝗿𝗼𝗽𝗽𝗲𝗱 from 100% baseline to >𝟱𝟬% 𝘈𝘭𝘭 𝘸𝘪𝘵𝘩𝘰𝘶𝘵 𝘴𝘢𝘤𝘳𝘪𝘧𝘪𝘤𝘪𝘯𝘨 𝘴𝘶𝘣-𝘴𝘦𝘤𝘰𝘯𝘥 𝘥𝘢𝘴𝘩𝘣𝘰𝘢𝘳𝘥𝘴. 𝗧𝗵𝗲 𝗼𝗽𝗲𝗿𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝘂𝗻𝗹𝗼𝗰𝗸 Pinot gave the performance. StarTree Cloud removed the ops tax: No more managing brokers, minions, or deep storage. The team now focuses on product, not pipelines. See how they did it: stree.ai/4c4Eq9s
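
The fixed lag that write-time tumbling windows introduce can be worked through with simple arithmetic. The 15-minute window comes from the post; the 8-minute late-data buffer is a hypothetical value, chosen only so the worst case lands near the reported ~23 minutes, and is not a published 7SIGNAL setting.

```python
WINDOW_S = 15 * 60     # tumbling window length from the post (15 minutes)
LATENESS_S = 8 * 60    # ASSUMPTION: illustrative late-data buffer, not a published value


def visible_at(event_ts, window_s, lateness_s):
    """Time at which an event's aggregate is written out.

    A tumbling window only emits after it closes and the late-data
    buffer has elapsed, so every event waits for its window's end.
    """
    window_end = (event_ts // window_s + 1) * window_s
    return window_end + lateness_s


# An event at the very start of a window waits the longest.
worst_delay = visible_at(0, WINDOW_S, LATENESS_S) - 0
# An event in the last second of a window waits the least.
best_delay = visible_at(WINDOW_S - 1, WINDOW_S, LATENESS_S) - (WINDOW_S - 1)

print(f"worst case: {worst_delay / 60:.0f} min, best case: {best_delay / 60:.1f} min")
```

Under these assumptions the delay is bounded below by the buffer and above by window plus buffer, regardless of how fast the downstream query is. Query-time aggregation over raw events removes the window-close wait altogether.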

StarTree

@startreedata

Feb 10

The Apache Pinot meetup, 𝗔𝗽𝗮𝗰𝗵𝗲 𝗣𝗶𝗻𝗼𝘁: 𝗪𝗵𝗮𝘁’𝘀 𝗔𝗱𝘃𝗮𝗻𝗰𝗶𝗻𝗴 𝗔𝗰𝗿𝗼𝘀𝘀 𝗤𝘂𝗲𝗿𝘆, 𝗜𝗻𝗴𝗲𝘀𝘁𝗶𝗼𝗻, 𝗮𝗻𝗱 𝘁𝗵𝗲 𝗖𝗼𝗿𝗲 𝗘𝗻𝗴𝗶𝗻𝗲, is happening tomorrow at 8:30 AM PST. 👉 stree.ai/3Zu5NC9 Join engineers from @LinkedIn, @Uber, @Walmart, @SlackHQ, @AngelOne, @startreedata, and others for an open discussion of recent work on Pinot's core, the time-series query engine, real-time ingestion, and more.

Apache Pinot: What’s Advancing Across Query, Ingestion, and the Core Engine, Wed, Feb 11, 2026,...

Join the [Apache Pinot™ community](https://pinot.apache.org/) for a conversation on how teams are using Pinot to power real-time analytics today. In this meetup, we’ll bri

meetup.com

StarTree

@startreedata

Feb 6

𝗔𝗽𝗮𝗰𝗵𝗲 𝗣𝗶𝗻𝗼𝘁: 𝗪𝗵𝗮𝘁’𝘀 𝗔𝗱𝘃𝗮𝗻𝗰𝗶𝗻𝗴 𝗔𝗰𝗿𝗼𝘀𝘀 𝗤𝘂𝗲𝗿𝘆, 𝗜𝗻𝗴𝗲𝘀𝘁𝗶𝗼𝗻, 𝗮𝗻𝗱 𝘁𝗵𝗲 𝗖𝗼𝗿𝗲 𝗘𝗻𝗴𝗶𝗻𝗲 🗓 Feb 11 · 8:30am PST · Apache Pinot Community Meetup If you work with Apache Pinot™, this meetup is all about 𝗰𝗼𝗺𝗽𝗮𝗿𝗶𝗻𝗴 𝗻𝗼𝘁𝗲𝘀 𝗼𝗻 𝗵𝗼𝘄 𝘁𝗵𝗲 𝗲𝗻𝗴𝗶𝗻𝗲 𝗶𝘀 𝗲𝘃𝗼𝗹𝘃𝗶𝗻𝗴 𝘁𝗼𝗱𝗮𝘆, across query execution, ingestion pipelines, and beyond, to power real-time analytics. 👉 Register: stree.ai/4ahEUGG #ApachePinot #DataEngineering #RealTimeAnalytics

StarTree

@startreedata

Feb 4

𝗪𝗵𝗮𝘁’𝘀 𝙝𝙖𝙥𝙥𝙚𝙣𝙞𝙣𝙜 𝙣𝙤𝙬 𝗮𝘁 @InsideGrab? Modernized observability: metrics defined once, served everywhere. APIs generate real-time Pinot queries—10M requests/month, ~1s end-to-end—powering ops and ML with a single source of truth. stree.ai/4qBumcn
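
"Metrics defined once, served everywhere" suggests a pattern where an API turns a shared metric definition into a query. The sketch below shows that idea with a hypothetical `Metric` dataclass and `build_query` helper; the table, column, and metric names are invented for illustration and are not Grab's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A metric defined once and reused by every consumer (hypothetical shape)."""
    name: str
    table: str
    expression: str   # SQL aggregation expression
    time_column: str  # epoch-millis column used for time filtering


def build_query(metric, start_ms, end_ms, group_by=()):
    """Render a SQL query for a metric over a time range.

    Identifiers come from trusted metric definitions, never from end users;
    the time bounds are coerced to integers before interpolation.
    """
    dims = ", ".join(group_by)
    select_dims = f"{dims}, " if dims else ""
    sql = (
        f"SELECT {select_dims}{metric.expression} AS {metric.name} "
        f"FROM {metric.table} "
        f"WHERE {metric.time_column} >= {int(start_ms)} "
        f"AND {metric.time_column} < {int(end_ms)}"
    )
    if dims:
        sql += f" GROUP BY {dims}"
    return sql


# Hypothetical definition shared by dashboards, alerts, and ML features.
error_count = Metric("error_count", "api_events", "COUNT(*)", "event_time_ms")
query = build_query(error_count, 1000, 2000, group_by=("service",))
print(query)
```

Keeping the aggregation expression in one definition means dashboards and ML pipelines cannot drift apart on what a metric means; only the time range and grouping vary per request.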

StarTree

@startreedata

Jan 28

𝗪𝗵𝗮𝘁’𝘀 𝙝𝙖𝙥𝙥𝙚𝙣𝙞𝙣𝙜 𝙣𝙤𝙬 𝗮𝘁 @Life360? They rebuilt analytics for now: ~700K location events/sec, <90ms geospatial queries, upserts tracking latest location per user. Real-time safety at global scale: analytics as core infrastructure, not reporting. stree.ai/4qxxINi
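
Upserts that track the latest location per user boil down to "keep one row per key, chosen by a comparison value". The sketch below shows those semantics in plain Python with hypothetical field names (`user_id`, `ts`, `lat`, `lon`); it illustrates the general idea, not Pinot's upsert implementation.

```python
def apply_upserts(events):
    """Keep only the newest event per user, even if events arrive out of order."""
    latest = {}
    for event in events:
        current = latest.get(event["user_id"])
        # Replace the stored row only when the incoming one is at least as new.
        if current is None or event["ts"] >= current["ts"]:
            latest[event["user_id"]] = event
    return latest


# Hypothetical location stream; the second event for u1 arrives late.
stream = [
    {"user_id": "u1", "ts": 100, "lat": 37.77, "lon": -122.42},
    {"user_id": "u2", "ts": 101, "lat": 40.71, "lon": -74.00},
    {"user_id": "u1", "ts": 90, "lat": 37.70, "lon": -122.40},
    {"user_id": "u2", "ts": 105, "lat": 40.73, "lon": -74.01},
]
latest_location = apply_upserts(stream)
print(latest_location["u1"]["ts"], latest_location["u2"]["ts"])
```

Comparing on the event timestamp instead of arrival order is what keeps a late-arriving, older location from overwriting a newer one.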

StarTree

@startreedata

Jan 21

𝗪𝗵𝗮𝘁’𝘀 𝙝𝙖𝙥𝙥𝙚𝙣𝙞𝙣𝙜 𝙣𝙤𝙬 𝗮𝘁 @Uber? They design around the time value of data. Seconds-fresh streams and real-time analytics power matching, pricing, and ETAs at peak scale: ~1M concurrent trips, ~200M Pinot queries/day, trillions of Kafka events. Built for motion, not batch. stree.ai/4sPsCh9

StarTree

@startreedata

Jan 15

𝗪𝗵𝗮𝘁’𝘀 𝙝𝙖𝙥𝙥𝙚𝙣𝙞𝙣𝙜 𝙣𝙤𝙬 𝗮𝘁 @CrowdStrike? Real-time threat detection at scale. They use #ApachePinot to monitor Kafka firehoses live—120K events/sec, 25K QPS—triggering signals that throttle services before SLOs break. Analytics as a traffic cop, not a dashboard. stree.ai/4pHhAHV
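
"Analytics as a traffic cop" can be pictured as a sliding-window rate check that raises a throttle signal before a limit is breached. The `RateMonitor` class, window size, and threshold below are hypothetical and only illustrate the pattern; they do not describe CrowdStrike's system.

```python
from collections import deque


class RateMonitor:
    """Sliding-window event counter that signals when a rate limit is exceeded."""

    def __init__(self, window_s: float, max_events: int):
        self.window_s = window_s
        self.max_events = max_events
        self._times = deque()

    def observe(self, ts: float) -> bool:
        """Record an event at time `ts`; return True if throttling should start."""
        self._times.append(ts)
        # Evict events that have fallen out of the window ending at `ts`.
        while self._times and self._times[0] <= ts - self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_events


# Hypothetical limit: more than 3 events within any 1-second window.
monitor = RateMonitor(window_s=1.0, max_events=3)
signals = [monitor.observe(t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)]
print(signals)
```

The signal flips on the fourth event inside the window and clears once older events age out, so a downstream service can be throttled only while the burst lasts.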

StarTree

@startreedata

Jan 6

Builders, engineers, community, welcome 👋 If you’re running Apache Pinot, or want to learn more about it, join us in two days to: • Review 2025 releases • Share feedback • Look ahead to 2026 🗓 Jan 8 | 🕘 9 AM PST 👉 stree.ai/4qmjXk7 #ApachePinot #OSS #RealTimeAnalytics
