The "Tools don't matter" argument falls flat in Data Engineering and ML.
- Working with large datasets on a Spark cluster can quickly shift your focus from business problems to infrastructure challenges.
- Loading 10GB of data into Pandas demands huge RAM, while @duckdb or Polars can handle the same data with much less.
- Orchestrating 50 jobs with Airflow is great, but for a few offline tasks, cron can do the job without the overhead.
- Assuming all SQL is the same was a major mistake for me: different databases aren't always interchangeable, and I now appreciate @IbisData.
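
To make the SQL point concrete, here is a minimal sketch using only Python's built-in `sqlite3`. The `events` table, its rows, and the `try_query` helper are made up for illustration. It runs the same "first 3 rows" request written two ways: `LIMIT` (accepted by SQLite, PostgreSQL, and MySQL) and `TOP` (SQL Server's T-SQL syntax), and shows that one engine rejects what another accepts.

```python
import sqlite3

# Hypothetical in-memory table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"event_{i}") for i in range(10)],
)

# Same intent ("first 3 rows by id"), two dialects.
queries = {
    "limit_style": "SELECT id FROM events ORDER BY id LIMIT 3",  # SQLite, PostgreSQL, MySQL
    "top_style": "SELECT TOP 3 id FROM events ORDER BY id",      # SQL Server (T-SQL)
}


def try_query(connection, sql):
    """Return the ids, or the error message if this engine rejects the SQL."""
    try:
        return [row[0] for row in connection.execute(sql)]
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


results = {name: try_query(conn, sql) for name, sql in queries.items()}
for name, outcome in results.items():
    print(name, "->", outcome)
```

On SQLite the `LIMIT` form returns rows and the `TOP` form fails with a syntax error. Multiply that by date functions, string handling, and type rules, and "it's all just SQL" stops holding up, which is the gap a dialect-abstraction layer is meant to cover.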
Don't underestimate the learning curve of new technologies. Choosing the right tools requires careful consideration of your team's stack and needs. Let the team explore before deciding, but never assume switching tools is easy—it’s not.