𝗪𝗵𝗮𝘁'𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴 𝗻𝗼𝘄 𝗮𝘁 @Webex?
Webex is building real-time observability for one of the world's largest collaboration platforms, where engineers can detect audio degradation, latency spikes, and platform anomalies as they happen.
𝗔𝗻𝗱 𝗶𝘁'𝘀 𝗽𝗼𝘄𝗲𝗿𝗲𝗱 𝗯𝘆 𝗔𝗽𝗮𝗰𝗵𝗲 𝗣𝗶𝗻𝗼𝘁.
Because at Webex scale, observability can't rely on static metrics or delayed rollups.
You need runtime analytics across billions of events, under concurrency, with fresh data arriving continuously.
𝗧𝗵𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲
As remote work exploded, Webex had to support:
  • 100 TB of telemetry data per day
  • Over 300,000 messages per second at peak
  • More than a billion events daily
  • Hundreds of dimensions across audio quality, regions, clients, and user behavior
The existing #Elasticsearch-based architecture struggled under the load:
  • Slow queries
  • Timeouts under concurrency
  • Heavy infrastructure costs
  • Rollups that limited visibility into emerging problems
And in #observability, pre-aggregated data misses the very anomalies you're trying to detect.
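To make the rollup point concrete, here is a minimal sketch on synthetic data (not Webex telemetry): a single per-minute average smooths away a ten-second latency spike that a scan over the raw events catches immediately. The event layout and the 150 ms alert threshold are assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical raw events: (second_offset, latency_ms) for one minute of traffic.
# Synthetic data: steady 40 ms latency with a 10-second spike to 400 ms.
raw_events = [(s, 400.0 if 30 <= s < 40 else 40.0) for s in range(60)]

ALERT_THRESHOLD_MS = 150.0  # assumed alerting threshold, for illustration only


def rollup_average(events):
    """Pre-aggregate the whole minute into one average, as a rollup would."""
    return mean(latency for _, latency in events)


def raw_spikes(events, threshold):
    """Scan raw events at query time and return the seconds above the threshold."""
    return [s for s, latency in events if latency > threshold]


minute_avg = rollup_average(raw_events)
spike_seconds = raw_spikes(raw_events, ALERT_THRESHOLD_MS)

print(f"rollup average: {minute_avg:.1f} ms, alert: {minute_avg > ALERT_THRESHOLD_MS}")
print(f"raw events over threshold: {len(spike_seconds)} seconds "
      f"({spike_seconds[0]}-{spike_seconds[-1]})")
```

The rollup reports 100.0 ms, comfortably under the threshold, while the raw scan finds all ten spike seconds.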
𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁
Webex rebuilt its observability platform around Apache Pinot.
Streaming telemetry now powers:
  • Real-time runtime aggregations across raw event streams
  • Sub-second exploration of audio/video quality metrics
  • High-concurrency analytical queries across hundreds of dimensions
  • Live dashboards and alerting integrated with #Grafana and #Kibana
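The practical difference between runtime aggregation and a rollup is when the grouping gets decided. The sketch below groups raw events by whatever dimensions a query asks for at query time. The field names (`region`, `client`, `latency_ms`) and rows are hypothetical, and this is plain Python illustrating the idea, not Pinot's query engine or API.

```python
from collections import defaultdict

# Hypothetical raw telemetry rows; field names and values are illustrative only.
events = [
    {"region": "us-east", "client": "desktop", "latency_ms": 45},
    {"region": "us-east", "client": "mobile", "latency_ms": 210},
    {"region": "eu-west", "client": "desktop", "latency_ms": 50},
    {"region": "eu-west", "client": "mobile", "latency_ms": 55},
    {"region": "us-east", "client": "mobile", "latency_ms": 190},
]


def aggregate(rows, group_by, metric):
    """Group raw rows by dimensions chosen at query time and average one metric."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[dim] for dim in group_by)
        sums[key] += row[metric]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# A rollup fixed to "region" would only ever offer this view...
by_region = aggregate(events, ["region"], "latency_ms")
# ...while keeping raw events lets the same data be re-sliced on demand.
by_region_client = aggregate(events, ["region", "client"], "latency_ms")

print(by_region)
print(by_region_client)
```

Grouping by region alone puts us-east at roughly 148 ms; adding `client` shows the degradation is concentrated in us-east mobile (200 ms), a drill-down a region-only rollup cannot answer.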
This transforms observability from retrospective reporting into an interactive operational system.
𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀
  • 5× to 150× faster p99 query latency compared to Elasticsearch
  • Sub-second query performance in most workloads
  • Elasticsearch timed out in 67% of benchmark cases where Pinot succeeded
  • Cluster footprint reduced by 500 nodes
  • Data storage reduced from 800 TB to 121 TB of unique data
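For readers less familiar with tail-latency metrics: p99 is the latency that 99% of queries come in at or under. The post does not say how the benchmark computed it, so this sketch assumes the common nearest-rank definition and uses synthetic latencies. It only shows why a handful of slow queries dominates p99 even when the median looks healthy.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Synthetic query latencies in ms: 98 fast queries and 2 slow ones.
latencies = [20.0] * 98 + [900.0, 1500.0]

p50 = percentile_nearest_rank(latencies, 50)
p99 = percentile_nearest_rank(latencies, 99)
print(f"p50 = {p50} ms, p99 = {p99} ms")
```

The median stays at 20 ms while p99 lands on a 900 ms query, which is why p99 is the metric that surfaces slow queries and timeouts under load.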
𝗧𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝗿 𝘀𝗵𝗶𝗳𝘁
Modern observability systems can't depend on pre-computed summaries anymore.
Because when infrastructure behavior changes in seconds, the analytics layer must detect and explain anomalies as they emerge, not after the incident is over.
𝗖𝗵𝗲𝗰𝗸 𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗰𝗮𝘀𝗲 𝘀𝘁𝘂𝗱𝘆 𝗵𝗲𝗿𝗲 →
stree.ai/4uDn9KR