Just reactivated this account. Senior Data Engineer. Building in public: AWS, data pipelines, and the road to Europe. Follow along 🚀 #DataEngineering #AWS
Early in my career, I got an IAM error.
AccessDenied: User is not authorized to perform: s3:GetObject
I panicked. Added "Action": "*" to the policy.
It worked.
That single move is one of the most common — and dangerous — mistakes data engineers make.
IAM isn’t “someone else’s job.”
Every Glue job, every Lambda function, and every Redshift cluster needs permissions to talk to other AWS services.
If you’re writing pipelines, you’re writing IAM policies. Whether you realize it or not.
5 things every data engineer must know about IAM:
1. Users vs Roles — know the difference
👤 Users → for people (you logging into AWS console)
🤖 Roles → for services (Glue, Lambda, Redshift assuming permissions)
Your pipelines should never use a user’s credentials. Always roles.
2. Two policies, two jobs
📜 Trust policy → “WHO can assume this role?”
📜 Permission policy → “WHAT can this role do?”
A Glue job’s role trusts glue.amazonaws.com AND has permissions to read from S3 and write to the Glue Data Catalog.
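A rough sketch of those two documents for a Glue job role, written as Python dicts and printed as JSON. The bucket name, the region, the account ID 123456789012, the database name raw_db, and the exact list of Glue actions are placeholders chosen for illustration, not a vetted production policy.

```python
import json

# Trust policy: WHO can assume the role. Here, only the Glue service.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permission policy: WHAT the role can do once it has been assumed.
# All ARNs below are hypothetical placeholders.
permission_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadRawData",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-bucket/raw/*",
        },
        {
            "Sid": "WriteCatalog",
            "Effect": "Allow",
            "Action": ["glue:GetTable", "glue:CreateTable", "glue:UpdateTable"],
            "Resource": [
                "arn:aws:glue:us-east-1:123456789012:catalog",
                "arn:aws:glue:us-east-1:123456789012:database/raw_db",
                "arn:aws:glue:us-east-1:123456789012:table/raw_db/*",
            ],
        },
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permission_policy, indent=2))
```

The two documents are attached separately: the trust policy when the role is created, the permission policy as an inline or managed policy on that role.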
3. Least privilege isn’t optional
"Action": "s3:*" on "Resource": "*" = an open door.
Scope it down:
"Action": ["s3:GetObject", "s3:PutObject"]
"Resource": "arn:aws:s3:::my-bucket/raw/*"
4. Never hardcode credentials
No access keys in your PySpark scripts. No .env files committed to Git.
Glue, Lambda, EC2 — all can assume roles automatically. Use that.
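One way to catch a hardcoded key before it reaches Git is a small scan like the sketch below. It assumes the usual shape of AWS access key IDs (20 characters, starting with AKIA for long-term keys or ASIA for temporary ones); the function name find_access_key_ids is made up for this example, and the sample key is the placeholder from AWS's own documentation. It will not catch secret access keys or other kinds of secrets.

```python
import re

# Access key IDs: 20 uppercase letters/digits, prefixed AKIA (long-term) or ASIA (temporary).
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, match) pairs for strings that look like AWS access key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, m.group(0)))
    return hits


# A PySpark script fragment with a key pasted in (treated as text, not executed).
sample = '''spark.conf.set("fs.s3a.access.key", "AKIAIOSFODNN7EXAMPLE")
df = spark.read.parquet("s3://my-bucket/raw/")
'''

print(find_access_key_ids(sample))  # [(1, 'AKIAIOSFODNN7EXAMPLE')]
```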
5. One role per job type
Don’t create one giant “DataEngineerRole” used everywhere.
A Glue ETL role ≠ a Lambda trigger role ≠ a Redshift COPY role.
Separate roles = smaller blast radius if something goes wrong.
That "Action": "*" shortcut I used early on?
In a real production environment, that’s the kind of mistake that shows up in a security audit — and in an interview question.
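A security review often starts with a check as simple as this one. The sketch below flags Allow statements that use "*" or a service-wide action such as "s3:*"; the function name find_wildcards is hypothetical, and real tooling checks far more than this.

```python
def find_wildcards(policy):
    """Return (sid_or_index, field, value) for each overly broad Allow entry."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        label = stmt.get("Sid", i)
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            for v in values:
                # Flag a bare "*" and service-wide actions such as "s3:*".
                if v == "*" or (field == "Action" and v.endswith(":*")):
                    findings.append((label, field, v))
    return findings


open_door = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}
scoped = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket/raw/*",
        }
    ],
}

print(find_wildcards(open_door))  # [(0, 'Action', 's3:*'), (0, 'Resource', '*')]
print(find_wildcards(scoped))     # []
```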
Full breakdown with real policy JSON and common mistakes → medium.com/@mallinathnpatil1… #DataEngineering #AWS #IAM #CloudSecurity #Python #DataEngineer
🚨 The recent restrictions on advanced AI models for foreign nationals raise an important question:
Why does India need its own AI models?
For years, we have relied on global technology platforms for cloud computing, software, and now AI.
But recent events remind us that access to critical technology can change overnight due to regulations, geopolitics, or national security concerns.
This isn’t just about one company or one AI model.
It’s about technological self-reliance.
India needs its own AI ecosystem because:
✅ Control over critical technology
✅ Better support for Indian languages
✅ AI solutions tailored for Indian businesses
✅ Reduced dependence on foreign platforms
✅ Stronger innovation and research ecosystem
The goal isn’t to replace global AI models.
The goal is to ensure that Indian developers, researchers, startups, and enterprises always have access to world-class AI capabilities.
Just as India invested in digital payments, space technology, and semiconductor manufacturing, AI may become another strategic area for long-term national growth.
The future will belong to nations that not only use AI but also build it.
#AI #IndiaAI #ArtificialIntelligence #Technology #Innovation #DataEngineering #DigitalIndia
One of the biggest lessons I learned in Data Engineering:
Making a pipeline work is only half the job.
Making it efficient is where real engineering begins.
A Spark job that runs for 4 hours and costs thousands of dollars may produce the same output as a job that runs in 30 minutes.
The difference is optimization.
Optimization isn’t just about speed.
It’s about:
✅ Reducing cloud costs
✅ Meeting SLA timelines
✅ Improving resource utilization
✅ Supporting larger data volumes
✅ Creating reliable production systems
Simple techniques like:
• Proper partitioning
• Avoiding unnecessary shuffles
• Filtering data early
• Choosing the right join strategy
• Caching wisely
can make a huge difference.
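To make "filtering data early" concrete, here is a toy sketch in plain Python, not Spark code. The same join runs twice, once with the filter after the join and once with it before, and counts how many rows the join has to handle. The data, the column names, and the join_on_customer helper are all made up for illustration.

```python
def join_on_customer(orders, customers):
    """Hash join on customer_id; returns joined rows and how many order rows were probed."""
    by_id = {c["customer_id"]: c for c in customers}
    joined, probed = [], 0
    for o in orders:
        probed += 1
        c = by_id.get(o["customer_id"])
        if c is not None:
            joined.append({**o, "country": c["country"]})
    return joined, probed


customers = [{"customer_id": i, "country": "IN" if i % 2 else "DE"} for i in range(100)]
orders = [
    {"order_id": n, "customer_id": n % 100, "year": 2020 + n % 5}
    for n in range(10_000)
]

# Filter late: join everything, then keep a single year.
late_rows, late_probed = join_on_customer(orders, customers)
late_rows = [r for r in late_rows if r["year"] == 2024]

# Filter early: keep a single year first, then join only what is needed.
recent = [o for o in orders if o["year"] == 2024]
early_rows, early_probed = join_on_customer(recent, customers)

print(late_probed, early_probed)  # 10000 2000
print(late_rows == early_rows)    # True
```

Same output, one fifth of the rows through the join. Spark's optimizer can often push simple filters below a join on its own, so treat this as intuition for why the order matters rather than as a benchmark.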
As data grows, poorly optimized pipelines become expensive and difficult to maintain.
That’s why senior Data Engineers don’t just ask:
“Does the code work?”
They ask:
“Will it still work efficiently when the data grows 10x?”
In today’s cloud world, optimization is no longer a nice-to-have skill.
It’s a business skill.
#DataEngineering #PySpark #ApacheSpark #Databricks #AWS #BigData #Optimization
I'm looking to #CONNECT with people interested in:
- Frontend
- Backend
- Full stack
- DevOps
- AI/ML
- Data Engineering
- UI/UX
- Freelancing
- Startup
- SaaS
Say hi and let's grow together.
#BuildingInPublic
Full Part 1 is live on LinkedIn.
Deep dive on Medium.
Links in bio 🔗
Follow @ai_dataengineer —
posting daily on data engineering,
AWS, and building in public.
#DataEngineering #AWS #Python
I’m back on X. 🚀
Over the next few months, I’ll be sharing content on:
• Data Engineering
• PySpark & Databricks
• AWS for Data Engineers
• Production Support & Real-World Scenarios
• Interview Preparation
• Career Growth in Tech
Building in public. Learning in public.
Let’s grow together.
#DataEngineering #AWS #Databricks
Cloud skills that will keep you relevant in 2026:
• AWS/Azure/GCP
• Data pipelines
• Spark & Big Data
• Docker & Kubernetes
• Infrastructure automation
• Scalability concepts
1 skill every IT employee should learn in 2026:
Adaptability.
Because AI will change workflows.
Cloud will keep evolving.
Job roles will continue changing.
People who learn fast and adapt quickly will always have an edge.
In tech now:
Learning speed > Comfort zone.
2026 reality check for IT employees:
The safest people in tech are not the smartest ones.
They are the fastest learners.
A few years ago:
→ Basic coding = enough
Now:
→ Cloud + Big Data + AI awareness = expected
If you’re still depending only on one skill, this is the best time to upgrade.
Learn:
• AWS/Azure/GCP
• Spark & Big Data
• Data pipelines
• AI tools
• Automation
The industry is rewarding adaptable engineers more than experienced engineers.