Data Engineering Weekly

Tweets

Data Engineering Weekly @data_weekly

Jun 12

1/ How do you leverage long-term, messy user history in high-traffic search systems without ruining millisecond-level latency? @IndeedEng tackled this by building a User Behavior Modeling (UBM) system that distills long-tail behaviors into scalable embeddings. 🧵👇

Data Engineering Weekly @data_weekly

Jun 12

7/ Keeping it fresh: Instead of retraining the whole model daily, Indeed runs daily batch inference with a sliding window of the latest user histories. The updated dense user embeddings are written directly to a feature store for real-time production use. 🎯
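A minimal sketch of the refresh pattern described in this tweet, under stated assumptions: the window length, the event layout, the averaging `embed` function, and the dictionary standing in for a feature store are all hypothetical, since the thread gives none of those details. It only illustrates the idea of rerunning inference over a sliding window each day rather than retraining.

```python
from datetime import date, timedelta

WINDOW_DAYS = 90  # assumed window length; the thread does not state one


def window_events(events, run_date, window_days=WINDOW_DAYS):
    """Keep only the events that fall inside the window ending at run_date."""
    start = run_date - timedelta(days=window_days)
    return [e for e in events if start < e["day"] <= run_date]


def embed(events, dim=4):
    """Hypothetical stand-in for the trained model: average the event vectors."""
    if not events:
        return [0.0] * dim
    return [sum(e["vec"][i] for e in events) / len(events) for i in range(dim)]


def daily_batch(histories, run_date, feature_store):
    """Recompute each user's embedding for run_date and overwrite the stored value."""
    for user_id, events in histories.items():
        feature_store[user_id] = embed(window_events(events, run_date))
    return feature_store


# Example run: the January event falls outside the window and is ignored.
histories = {
    "u1": [
        {"day": date(2024, 1, 1), "vec": [10.0, 10.0, 10.0, 10.0]},
        {"day": date(2024, 6, 1), "vec": [1.0, 0.0, 0.0, 0.0]},
        {"day": date(2024, 6, 11), "vec": [3.0, 0.0, 0.0, 0.0]},
    ],
    "u2": [],
}
store = daily_batch(histories, date(2024, 6, 12), {})
```

The model weights stay fixed between runs; only the input window moves forward, so the daily cost is one inference pass per user.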

Data Engineering Weekly @data_weekly

Jun 12

Read the full data engineering breakdowns here: dataengineeringweekly.com/i/…

Data Engineering Weekly #273

The Weekly Data Engineering Newsletter

dataengineeringweekly.com

Data Engineering Weekly @data_weekly

Jun 12

How do you scale data pipelines when your custom-built scheduler hits its limits? In a recent piece for Data Engineering Weekly, Poorva Patil shares how Helpshift migrated from an evolving, complex monolithic orchestrator to Apache Airflow. 🧵👇 #DataEngineering #ApacheAirflow

Data Engineering Weekly @data_weekly

Jun 12

Key Results & Wins: ✅ Greatly simplified workflow & dependency management ✅ High-level observability into pipeline failures ✅ Drastic reduction in cloud costs through transient resource scheduling ✅ Cleaner, developer-friendly DAG architecture 🛠️

Data Engineering Weekly @data_weekly

Jun 12

Migrations are never simple, but shifting from a monolithic bottleneck to code-defined orchestration with Airflow unlocked the scale Helpshift needed. Read the full engineering breakdown here: dataengineeringweekly.com/i/…

Data Engineering Weekly @data_weekly

Jun 10

1/7 🚨 New Post: When Cloudflare’s petabyte-scale ClickHouse cluster stalled—putting critical daily billing pipelines at risk—standard infrastructure metrics (I/O, CPU, memory, rows scanned) showed absolutely nothing wrong. Here is how they found and fixed a hidden bottleneck 👇

Data Engineering Weekly @data_weekly

Jun 10

7/7 💡 Key Takeaway: When scaling data systems, bottlenecks like lock contention and memory copying can hide behind healthy execution metrics. True to open-source engineering, Cloudflare contributed these optimizations upstream to ClickHouse (v25.11)!

Data Engineering Weekly @data_weekly

Jun 10

Read the full deep dive here: dataengineeringweekly.com/i/…
