I'm learning Data Engineering from scratch as a Data Analyst.
So I mapped out a 12-month self-study roadmap to guide the transition.
I just published it in @TDataScience.
You can read it here:
towardsdatascience.com/from-…
I built my first ETL pipeline as a complete beginner.
No Airflow.
No Spark.
No cloud infrastructure.
Just Python, pandas, and the GitHub API.
Recently shared the full story in @TDataScience.
Here's exactly what I learned 🧵👇
12/ If you're trying to break into data engineering:
Stop waiting for the perfect roadmap.
Pick a small project.
Build something ugly.
Finish it.
Your first pipeline won't be impressive.
But it will be the project that teaches you the most.
Read the full article below 👇
Data Engineering work is mostly ETL (Extract, Transform, Load).
So to learn about ETL, I decided to build a basic ETL pipeline that extracts data from GitHub repositories and saves it as a CSV file.
Read about the entire process @TDataScience
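For readers who want a feel for what such a pipeline looks like, here is a minimal sketch of the three ETL stages. It is not the code from the article: the function names, the chosen columns, and the use of the public `GET /users/{user}/repos` endpoint are all assumptions made for illustration.

```python
import json
import urllib.request

import pandas as pd

# Public GitHub REST endpoint listing a user's repositories (assumed choice).
API_URL = "https://api.github.com/users/{user}/repos"


def extract(user: str) -> list[dict]:
    """Extract: fetch the raw repository records for one GitHub user."""
    request = urllib.request.Request(
        API_URL.format(user=user),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "etl-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def transform(records: list[dict]) -> pd.DataFrame:
    """Transform: keep a few fields, rename them, and sort by stars."""
    columns = {
        "name": "repo",
        "language": "language",
        "stargazers_count": "stars",
        "forks_count": "forks",
    }
    frame = pd.DataFrame(records, columns=list(columns)).rename(columns=columns)
    # Repositories without a detected language come back as null.
    frame["language"] = frame["language"].fillna("unknown")
    return frame.sort_values("stars", ascending=False).reset_index(drop=True)


def load(frame: pd.DataFrame, path: str) -> None:
    """Load: write the cleaned table to a CSV file."""
    frame.to_csv(path, index=False)
```

Chaining the stages as `load(transform(extract("octocat")), "repos.csv")` would run the whole pipeline, where `octocat` and `repos.csv` are placeholder values. Keeping the network call in `extract` means `transform` and `load` can be tried on hand-written sample records first.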
Using the GitHub API, @ibbysalam shows us the steps he took to build an extract, transform, load data pipeline from scratch, and as a complete beginner. towardsdatascience.com/i-bui…
Happy to see a lot of people are resonating with this article. Can't wait to see where this journey takes me. If you're looking to break into or transition to data engineering, this is a good read.
If you're considering a role change, or curious about the path to become a data engineer, don't miss @ibbysalam's new series on his own journey, covering the tools and resources he'll rely on and the many twists and turns he's bound to face. towardsdatascience.com/from-β¦
"It is tempting to treat hybrid search as something you can tune once: pick a merge algorithm, choose a lexical/semantic weight, and ship it - but there is no globally correct merge strategy"
hornet.dev/blog/100m-doc-sea…
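To make the quoted point concrete, here is a small sketch that merges one lexical and one semantic result list in two common ways. Nothing here comes from the linked post: the scores are made up, and the two strategies (a linear weighted blend and reciprocal rank fusion) are simply familiar examples chosen for illustration.

```python
def weighted_merge(lexical: dict, semantic: dict, alpha: float) -> list:
    """Blend raw scores linearly; alpha is the weight on the lexical score."""
    docs = set(lexical) | set(semantic)
    combined = {
        doc: alpha * lexical.get(doc, 0.0) + (1 - alpha) * semantic.get(doc, 0.0)
        for doc in docs
    }
    # Sort by descending score, breaking ties by document id.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


def rrf_merge(lexical: dict, semantic: dict, k: int = 60) -> list:
    """Reciprocal rank fusion: ignores score magnitudes and uses ranks only."""
    fused = {}
    for scores in (lexical, semantic):
        ranked = sorted(scores, key=lambda doc: (-scores[doc], doc))
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Made-up scores for three documents, for illustration only.
lexical_scores = {"a": 0.9, "b": 0.2, "c": 0.1}
semantic_scores = {"b": 0.8, "c": 0.7, "a": 0.1}

lexical_heavy = weighted_merge(lexical_scores, semantic_scores, alpha=0.7)
semantic_heavy = weighted_merge(lexical_scores, semantic_scores, alpha=0.3)
rank_based = rrf_merge(lexical_scores, semantic_scores)
```

On this toy input the three calls return three different orderings of the same documents, which is the sense in which no single merge setting is obviously the right one.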