Joined May 2018
82 Photos and videos
Top 5 PySpark mistakes I still see:
• Using count() just to check data
• Chaining withColumn() 20 times
• Ignoring join cardinality
• Reaching for UDFs too quickly
• Repartitioning without a reason
Your cluster remembers every one of them.
Why they hurt 👇
• "count()" → triggers a full Spark job
• Many "withColumn()" calls → larger execution plans
• Wrong join cardinality → duplicate rows & data skew
• UDFs → bypass Catalyst optimizations
• Unnecessary "repartition()" → expensive network shuffle
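To make the join cardinality point concrete, here is a minimal sketch in plain Python rather than PySpark, so it runs without a cluster. It estimates how many rows an inner join would produce from the key counts on each side and lists the keys that break uniqueness. The helper names (`estimate_join_rows`, `duplicated_keys`) and the sample IDs are hypothetical, made up for illustration; in Spark the same check would typically be a group-by count on the join key before joining.

```python
from collections import Counter


def estimate_join_rows(left_keys, right_keys):
    """Return the number of rows an inner equi-join on these keys would produce."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each shared key contributes (left occurrences) x (right occurrences) rows.
    return sum(n * right_counts[k] for k, n in left_counts.items() if k in right_counts)


def duplicated_keys(keys):
    """Return the keys that appear more than once (the side is not unique on them)."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


# Hypothetical data: every order should match exactly one customer row.
order_customer_ids = [1, 1, 2, 3]
customer_ids = [1, 2, 2, 3]  # customer 2 was accidentally loaded twice

rows = estimate_join_rows(order_customer_ids, customer_ids)
dupes = duplicated_keys(customer_ids)
print(rows, dupes)  # 5 [2]
```

Four orders go in and five rows come out, because the duplicated customer key multiplies its matches. Catching that on the key counts is far cheaper than debugging inflated totals downstream.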
5 SQL optimization tips every Data Engineer should know:
1️⃣ Filter early → scan less data
2️⃣ Avoid SELECT * → read only needed columns
3️⃣ Reduce data before joins → less shuffle, faster execution
4️⃣ Prefer EXISTS over IN for large subqueries
5️⃣ Identify & fix data skew before .
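A small sketch of tips 1 to 4, using Python's built-in sqlite3 module with an in-memory database. The tables, columns, and rows are hypothetical, invented for illustration. It only shows the query shapes and that the IN and EXISTS forms return the same rows; whether EXISTS is actually faster depends on the engine and its optimizer, so treat that tip as something to verify against your own query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, status TEXT);
    INSERT INTO customers VALUES (1, 'Asha', 'IN'), (2, 'Ben', 'DE'), (3, 'Chen', 'IN');
    INSERT INTO orders VALUES
        (10, 1, 120.0, 'paid'),
        (11, 1, 30.0, 'cancelled'),
        (12, 2, 75.0, 'paid'),
        (13, 3, 10.0, 'cancelled');
""")

# Tips 1 and 2: filter as early as possible and name only the needed columns.
paid = conn.execute(
    "SELECT customer_id, amount FROM orders WHERE status = 'paid' ORDER BY id"
).fetchall()

# Tip 4: the same question written with IN ...
with_in = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders WHERE status = 'paid')
    ORDER BY name
""").fetchall()

# ... and with a correlated EXISTS.
with_exists = conn.execute("""
    SELECT c.name FROM customers AS c
    WHERE EXISTS (
        SELECT 1 FROM orders AS o
        WHERE o.customer_id = c.id AND o.status = 'paid'
    )
    ORDER BY c.name
""").fetchall()

# Tip 3: shrink orders to one row per customer before joining.
totals = conn.execute("""
    SELECT c.name, t.total
    FROM (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        WHERE status = 'paid'
        GROUP BY customer_id
    ) AS t
    JOIN customers AS c ON c.id = t.customer_id
    ORDER BY c.name
""").fetchall()

print(paid, with_in, with_exists, totals)
```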
Iceberg vs Delta Lake:
🧊 Iceberg wins → Openness, interoperability, multi-engine analytics.
⚡ Delta wins → Performance, simplicity, Databricks integration.
Iceberg's weakness: More moving parts.
Delta's weakness: Less platform-agnostic.
Databricks Genie Benchmarks now support Agent Mode 👀 This means benchmarks can now evaluate the same multi-step reasoning flow used during real AI conversations, not just text-to-SQL accuracy. #Databricks #DataEngineering #AIEngineering
Most companies don’t need bigger data warehouses. They need smarter ones.
Serverless DW:
• Auto scaling
• No infra management
• Pay per usage
Provisioned DW:
• More control
• Dedicated compute
• Better for steady workloads
Serverless or Provisioned: what’s your pick?
A poorly optimized Spark job can cost more in one week than a junior engineer’s monthly salary. That’s why modern Data Engineering is becoming as much about cost optimization as data processing.
Data engineering fact: Most “AI magic” fails without solid data pipelines underneath. Bad data quality can silently destroy model accuracy faster than bad algorithms ever will. In tech, clean data > fancy dashboards.
Data Engineering in 2026 is shifting fast: Lakehouses and Apache Iceberg are replacing traditional warehouses, AI is writing data quality checks, and Text-to-SQL is getting insanely good. Future DEs won’t just build pipelines; they’ll engineer intelligent data platforms.
Wind Energy GM
Lions don’t fear other lions. They build with them, learn from them, and grow stronger together. Hyenas prefer rabbits. Not because rabbits are valuable, but because their weakness creates the illusion of power. Secure people are inspired by strength. Insecure people are threatened by it.
Some people make you dream bigger after one conversation. Some people drain your ambition in 5 minutes. You need to choose wisely.
Good night, everyone. Hope you had a productive day.
Back on X. I went through a lot this month. I hope to come back much stronger.
Having sharp intuition is a curse because you see through the lies, the games, and the patterns.
He who waits for the perfect shot dies with a full mag. But the sniper who shoots aimlessly gives away their position.
If you're not happy single, you won't be happy in a relationship either. True happiness comes from building dashboards that give executives deeper insight into critical business functions.
Pull request - make a request to merge the code.
What if India built GitHub?
Sumit retweeted
What if India built GitHub?
What if the EU built GitHub?
Good night everyone @X