1/ How do you leverage long-term, messy user history in high-traffic search systems without ruining millisecond-level latency?
@IndeedEng tackled this by building a User Behavior Modeling (UBM) system that distills long-tail behaviors into scalable embeddings. 🧵👇
7/ Keeping it fresh: Instead of retraining the whole model daily, Indeed runs daily batch inference with a sliding window of the latest user histories.
The updated dense user embeddings are written directly to a feature store for real-time production use. 🎯
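A minimal sketch of that daily refresh, not Indeed's actual pipeline: the 30-day `WINDOW_DAYS`, the mean-pooling `encode` stand-in for the frozen model, and the plain dict used as a feature store are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW_DAYS = 30  # hypothetical window length, not stated in the source


def encode(history):
    """Stand-in for the frozen model: mean-pool the event vectors."""
    dim = len(history[0])
    return [sum(v[i] for v in history) / len(history) for i in range(dim)]


def daily_refresh(events, run_date, feature_store):
    """Re-embed each user from events inside the sliding window.

    events: iterable of (user_id, event_date, vector) tuples.
    feature_store: a dict standing in for a real feature store.
    """
    cutoff = run_date - timedelta(days=WINDOW_DAYS)
    windows = defaultdict(list)
    for user_id, event_date, vector in events:
        # Keep only events that fall inside (cutoff, run_date].
        if cutoff < event_date <= run_date:
            windows[user_id].append(vector)
    for user_id, history in windows.items():
        # Overwrite the previous day's embedding; no model retraining.
        feature_store[user_id] = encode(history)
    return feature_store


events = [
    ("u1", date(2024, 1, 1), [1.0, 0.0]),  # older than the window, ignored
    ("u1", date(2024, 3, 1), [0.0, 1.0]),
    ("u1", date(2024, 3, 9), [1.0, 1.0]),
    ("u2", date(2024, 3, 5), [0.5, 0.5]),
]
store = daily_refresh(events, date(2024, 3, 10), {})
```

Only the inference step runs each day; the model weights stay fixed, and the serving path reads the precomputed vectors from the store.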
How do you scale data pipelines when your custom-built scheduler hits its limits?
In a recent piece for Data Engineering Weekly, Poorva Patil shares how Helpshift migrated from a monolithic orchestrator that had grown increasingly complex to Apache Airflow. 🧵👇
#DataEngineering #ApacheAirflow
Migrations are never simple, but shifting from a monolithic bottleneck to code-defined orchestration with Airflow unlocked the scale Helpshift needed.
Read the full engineering breakdown here: dataengineeringweekly.com/i/…
1/7 🚨 New Post: When Cloudflare’s petabyte-scale ClickHouse cluster stalled, putting critical daily billing pipelines at risk, standard infrastructure metrics (I/O, CPU, memory, rows scanned) showed nothing wrong.
Here is how they found and fixed a hidden bottleneck 👇
7/7 💡 Key Takeaway: When scaling data systems, bottlenecks like lock contention and memory copying can hide behind healthy execution metrics. True to open-source engineering, Cloudflare contributed these optimizations upstream to ClickHouse (v25.11)!