Senior Data Engineer · AWS | PySpark | Databricks

Joined August 2025
Pinned Tweet
Just reactivated this account. Senior Data Engineer. Building in public — AWS, data pipelines, and the road to Europe. Follow along 🚀 #DataEngineering #AWS
Early in my career, I got an IAM error:

AccessDenied: User is not authorized to perform: s3:GetObject

I panicked. Added "Action": "*" to the policy. It worked.

That single move is one of the most common, and most dangerous, mistakes data engineers make.

IAM isn’t “someone else’s job.” Every Glue job, every Lambda function, every Redshift cluster needs permissions to talk to other AWS services. If you’re writing pipelines, you’re writing IAM policies, whether you realize it or not.

5 things every data engineer must know about IAM:

1. Users vs Roles: know the difference
👤 Users → for people (you logging into the AWS console)
🤖 Roles → for services (Glue, Lambda, Redshift assuming permissions)
Your pipelines should never use a user’s credentials. Always roles.

2. Two policies, two jobs
📜 Trust policy → “WHO can assume this role?”
📜 Permission policy → “WHAT can this role do?”
A Glue job’s role trusts glue.amazonaws.com AND has permissions to read from S3 and write to the Glue Catalog.

3. Least privilege isn’t optional
"Action": "s3:*" on "Resource": "*" = an open door. Scope it down:
"Action": ["s3:GetObject", "s3:PutObject"]
"Resource": "arn:aws:s3:::my-bucket/raw/*"

4. Never hardcode credentials
No access keys in your PySpark scripts. No .env files committed to Git. Glue, Lambda, EC2: all can assume roles automatically. Use that.

5. One role per job type
Don’t create one giant “DataEngineerRole” used everywhere. A Glue ETL role ≠ a Lambda trigger role ≠ a Redshift COPY role. Separate roles = smaller blast radius if something goes wrong.

That "Action": "*" shortcut I used early on? In a real production environment, that’s the kind of mistake that shows up in a security audit, and in an interview question.

Full breakdown with real policy JSON and common mistakes → medium.com/@mallinathnpatil1…

#DataEngineering #AWS #IAM #CloudSecurity #Python #DataEngineer
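Points 2 and 3 of the post can be sketched in plain Python. This is a minimal illustration, not the policy from the linked article: the bucket name and prefix come from the example ARN in the post, and `find_wildcards` is a hypothetical helper written for this sketch, not part of any AWS SDK.

```python
import json

# Example ARN from the post; the bucket and prefix are placeholders.
RAW_PREFIX_ARN = "arn:aws:s3:::my-bucket/raw/*"

# Trust policy: WHO can assume the role (here, the Glue service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permission policy: WHAT the role can do (two actions, one prefix).
permission_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": RAW_PREFIX_ARN,
        }
    ],
}

# IAM expects policy documents as JSON strings.
permission_policy_json = json.dumps(permission_policy, indent=2)


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_wildcards(policy):
    """Hypothetical lint: list (statement index, field, value) for every
    Allow statement that uses "*" / "service:*" actions or a bare "*" resource."""
    findings = []
    for i, stmt in enumerate(_as_list(policy["Statement"])):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append((i, "Action", action))
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append((i, "Resource", resource))
    return findings
```

Running `find_wildcards` on the scoped policy returns an empty list, while a statement with `"Action": "s3:*"` on `"Resource": "*"` is flagged twice. A check like this only catches the crudest cases; it does not replace a real policy review.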
🚨 The recent restrictions on advanced AI models for foreign nationals raise an important question: Why does India need its own AI models?

For years, we have relied on global technology platforms for cloud computing, software, and now AI. But recent events remind us that access to critical technology can change overnight due to regulations, geopolitics, or national security concerns.

This isn’t just about one company or one AI model. It’s about technological self-reliance.

India needs its own AI ecosystem because:
✅ Control over critical technology
✅ Better support for Indian languages
✅ AI solutions tailored for Indian businesses
✅ Reduced dependence on foreign platforms
✅ Stronger innovation and research ecosystem

The goal isn’t to replace global AI models. The goal is to ensure that Indian developers, researchers, startups, and enterprises always have access to world-class AI capabilities.

Just as India invested in digital payments, space technology, and semiconductor manufacturing, AI may become another strategic area for long-term national growth.

The future will belong to nations that not only use AI but also build it.

#AI #IndiaAI #ArtificialIntelligence #Technology #Innovation #DataEngineering #DigitalIndia
One of the biggest lessons I learned in Data Engineering: Making a pipeline work is only half the job. Making it efficient is where real engineering begins.

A Spark job that runs for 4 hours and costs thousands of dollars may produce the same output as a job that runs in 30 minutes. The difference is optimization.

Optimization isn’t just about speed. It’s about:
✅ Reducing cloud costs
✅ Meeting SLA timelines
✅ Improving resource utilization
✅ Supporting larger data volumes
✅ Creating reliable production systems

Simple techniques can make a huge difference:
• Proper partitioning
• Avoiding unnecessary shuffles
• Filtering data early
• Choosing the right join strategy
• Caching wisely

As data grows, poorly optimized pipelines become expensive and difficult to maintain. That’s why senior Data Engineers don’t just ask: “Does the code work?” They ask: “Will it still work efficiently when the data grows 10x?”

In today’s cloud world, optimization is no longer a nice-to-have skill. It’s a business skill.

#DataEngineering #PySpark #ApacheSpark #Databricks #AWS #BigData #Optimization
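"Filtering data early" and "choosing the right join strategy" can be illustrated with a toy model in plain Python. This is not Spark code and not the author's pipeline: the data is made up, and `hash_join` is a hypothetical helper that builds its lookup table on the small side, which is roughly the idea behind a broadcast hash join. It only counts how many large-side rows the join has to scan.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dicts on `key`, hashing the (small) right side.

    Returns (joined_rows, rows_probed); rows_probed counts the left-side
    rows the join had to scan.
    """
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)

    joined, probed = [], 0
    for row in left:
        probed += 1
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined, probed


# Made-up data: 1,000 events spread over three countries, plus a small lookup.
events = [{"event_id": i, "country": ("IN", "DE", "US")[i % 3]} for i in range(1000)]
countries = [
    {"country": "IN", "region": "APAC"},
    {"country": "DE", "region": "EU"},
    {"country": "US", "region": "AMER"},
]

# Late filter: join everything, then keep only the rows we wanted.
joined_all, probed_late = hash_join(events, countries, "country")
late_result = [row for row in joined_all if row["country"] == "DE"]

# Early filter: drop unwanted rows first, then join the remainder.
de_events = [e for e in events if e["country"] == "DE"]
early_result, probed_early = hash_join(de_events, countries, "country")

print(probed_late, probed_early, late_result == early_result)
```

Both paths produce the same rows, but the early filter sends about a third of the events through the join. In PySpark the equivalent habit is calling `filter` before `join`; the savings on a real cluster depend on partitioning and on what the optimizer already pushes down.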
AWS vs Azure
I'm looking to #CONNECT with people interested in:
- Frontend
- Backend
- Full stack
- DevOps
- AI/ML
- Data Engineering
- UI/UX
- Freelancing
- Startups
- SaaS
Say hi & let's grow together #BuildingInPublic
I used to think cloud was just "someone else's computer." Then I joined a project processing 18M events/day. Everything changed. 🧵
Here's what we'll cover:
🟡 Phase 1: Foundation (S3, IAM, Glue Catalog, Athena)
🟠 Phase 2: Core Pipelines (Glue ETL, Lambda, Kinesis, Redshift)
🔴 Phase 3: Mid-Level Depth (Medallion architecture, cost optimization, security, file formats)
Full Part 1 is live on LinkedIn. Deep dive on Medium. Links in bio 🔗 Follow @ai_dataengineer — posting daily on data engineering, AWS, and building in public. #DataEngineering #AWS #Python
DataEngineer.ai retweeted
The new OpenCreator is here. Meet OpenCreator Agent —built to help you create complete videos through conversation. Start with an idea. OpenCreator Agent helps you shape the story, build the storyboard, create every shot, refine the details, and bring everything together into the final cut. Just describe what you want to create. More creating. Less managing the process. OpenCreator, your everyday video studio.
I’m back on X. 🚀
Over the next few months, I’ll be sharing content on:
• Data Engineering
• PySpark & Databricks
• AWS for Data Engineers
• Production Support & Real-World Scenarios
• Interview Preparation
• Career Growth in Tech
Building in public. Learning in public. Let’s grow together.
#DataEngineering #AWS #Databricks
Cloud skills that will keep you relevant in 2026:
• AWS/Azure/GCP
• Data pipelines
• Spark & Big Data
• Docker & Kubernetes
• Infrastructure automation
• Scalability concepts
1 skill every IT employee should learn in 2026: Adaptability. Because AI will change workflows. Cloud will keep evolving. Job roles will continue changing. People who learn fast and adapt quickly will always have an edge. In tech now: Learning speed > Comfort zone.
2026 reality check for IT employees: The safest people in tech are not the smartest ones. They are the fastest learners.
A few years ago: → Basic coding = enough
Now: → Cloud + Big Data + AI awareness = expected
If you’re still depending on only one skill, this is the best time to upgrade. Learn:
• AWS/Azure/GCP
• Spark & Big Data
• Data pipelines
• AI tools
• Automation
The industry is rewarding adaptable engineers more than experienced engineers.
The brain gets more dopamine from planning than doing. That’s why people have multiple business ideas… but zero businesses.
dbt vs ETL