Understand Airflow For Data Engineering (Quick Guide📝)
📌 What is Apache Airflow?
As a Data Engineer, one of your regular tasks is building data pipelines. For that, you can write simple Python scripts or use enterprise tools.
A simple Python script with a cron job is enough for a few pipelines, but what if you have hundreds of them? Hard to manage!!!
This is where Airflow comes into the picture 🚀
💡 It's an open-source tool for managing data pipelines: you can build, schedule, and monitor workflows.
In one place you can manage everything!
There are a few components you need to understand.
📌 DAG (Directed Acyclic Graph):
At the heart of Airflow is the DAG, which defines a collection of tasks and the dependencies between them.
The DAG is a core computer science concept, not something unique to Airflow.
Think of it as a blueprint of your workflow, ensuring that tasks run in the right sequence.
👉🏻 Directed: Dependencies point one way. An upstream task runs before its downstream task.
👉🏻 Acyclic: No loops! A task can never end up depending on itself.
👉🏻 Graph: Tasks are the nodes, and dependencies are the edges that connect them.
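To make those three words concrete, here is a small sketch that uses Python's standard-library `graphlib` module rather than Airflow itself (it needs Python 3.9+). The task names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
# Task names are hypothetical.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# Directed + acyclic means a valid run order always exists.
run_order = list(TopologicalSorter(pipeline).static_order())
print(run_order)  # ['extract', 'transform', 'load', 'notify']

# Making "extract" depend on "notify" creates a loop, so no valid
# order exists and the graph is no longer a DAG.
cyclic = {**pipeline, "extract": {"notify"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```

Airflow performs a similar check when it loads your DAGs, so a workflow with a loop is rejected instead of running forever.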
📌 What is a Task?
A task is a single unit of work inside a DAG. It is basically where your logic lives, such as reading data, transforming it, or writing it. Each task runs independently of the others, typically in its own process.
To create a task, we use Operators.
📌 What are Operators?
An operator is a ready-made template for a task, and there are many to choose from depending on the job.
They determine WHAT gets done.
👉🏻 BashOperator: Executes a bash command.
👉🏻 PythonOperator: Executes a Python function.
👉🏻 PostgresOperator: Executes SQL on a Postgres database.
and many more!
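Here is a minimal sketch of how two of these operators could be wired into a DAG. It assumes Airflow 2.4 or newer is installed and uses the Airflow 2.x import paths, which may differ in other releases. The DAG id, task ids, and the `transform_data` function are made up for illustration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def transform_data():
    # Placeholder logic; a real task would read and reshape data here.
    print("transforming data")


with DAG(
    dag_id="example_etl",             # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # `schedule_interval` before Airflow 2.4
    catchup=False,                    # do not backfill past dates
) as dag:
    # Each operator instance becomes one task in the DAG.
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'extracting data'",
    )
    transform = PythonOperator(
        task_id="transform",
        python_callable=transform_data,
    )

    # extract must finish successfully before transform starts.
    extract >> transform
```

The `>>` on the last line is what draws the arrow between the two tasks in the graph.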
📌 Executor:
Determines HOW tasks are run.
There are several types:
👉🏻 SequentialExecutor: Runs one task at a time. Fine for testing, too slow for production.
👉🏻 LocalExecutor: Runs tasks in parallel on a single machine.
👉🏻 CeleryExecutor: Distributes tasks across multiple worker machines using Celery.
📌 Scheduler: The brain behind when your tasks run. It keeps checking your DAGs for tasks whose schedule and dependencies are met, then sends them to the executor.
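The scheduler and executor split can be pictured with a toy loop. This is only an analogy built from Python's standard library, not how Airflow is implemented: the `while` loop plays the scheduler, the thread pool plays an executor running tasks in parallel on one machine, and the task names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical pipeline: two independent extracts feed one join step.
dependencies = {
    "extract_orders": set(),
    "extract_users": set(),
    "join": {"extract_orders", "extract_users"},
    "load": {"join"},
}

finished = []  # records the order in which tasks completed


def run_task(name):
    # Stand-in for real task logic.
    finished.append(name)
    return name


sorter = TopologicalSorter(dependencies)
sorter.prepare()

with ThreadPoolExecutor(max_workers=2) as pool:
    running = set()
    while sorter.is_active():
        # "Scheduler": hand every task whose upstream tasks are done
        # to the "executor" (the pool).
        for name in sorter.get_ready():
            running.add(pool.submit(run_task, name))
        # Wait for at least one task to finish, then mark it done so
        # its downstream tasks become ready on the next pass.
        done, running = wait(running, return_when=FIRST_COMPLETED)
        for future in done:
            sorter.done(future.result())

print(finished)
```

The two extract tasks can finish in either order, but `join` always comes after both of them and `load` always comes last.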
📌 Web Server: A friendly UI to monitor and manage your DAGs. You can check task logs, rerun tasks, and visualize task dependencies.
Let me know if you found this helpful 👇🏻