
Apache Airflow

Released: June 3, 2015

GitHub issues: 270
GitHub stars: 16,600
Days since last commit: 0
Stack Overflow questions: 3,885


Apache Airflow in one line: it manages, schedules, and monitors your data pipelines.

What is Apache Airflow?

Apache Airflow is a workflow orchestration system. It lets you define pipelines of interdependent tasks as Directed Acyclic Graphs (DAGs) and schedule those tasks for execution (think of it as an advanced crontab). Airflow monitors your tasks and automatically retries them if they fail, while properly handling any dependent tasks upstream or downstream. It also provides a web frontend where you can see the status of your tasks.
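The core idea described above can be sketched in plain Python. This is a toy illustration of DAG-ordered execution with automatic retries, not Airflow's actual API; the task names and the `run_pipeline` helper are hypothetical, and `graphlib` is from the standard library (Python 3.9+).

```python
# Toy sketch of the DAG idea: tasks run in dependency order and are
# retried automatically on failure. NOT Airflow's API.
from graphlib import TopologicalSorter

def run_pipeline(tasks, deps, max_retries=2):
    """tasks: name -> callable; deps: name -> set of upstream task names."""
    results = {}
    # static_order() yields each task only after all of its upstream tasks
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(max_retries + 1):
            try:
                results[name] = tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up; downstream tasks never run
    return results

# Hypothetical three-step pipeline: extract -> transform -> load
calls = []
tasks = {
    "extract":   lambda: calls.append("extract"),
    "transform": lambda: calls.append("transform"),
    "load":      lambda: calls.append("load"),
}
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
run_pipeline(tasks, deps)
```

Real Airflow adds a scheduler, persistence, distributed workers, and a UI on top of this basic model, but the dependency graph plus per-task retry policy is the heart of it.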

Apache Airflow can be tricky to understand because it doesn't do much by itself. Instead, it acts as a glue layer between your other systems. For example, you can use it to define a data pipeline in which several different Python scripts are executed in complex patterns via Celery or Kubernetes.

What problems does Apache Airflow solve?

Many machine learning tasks are set up as data pipelines. Instead of running a single program, many different components run at different stages, and each of these depends on others in complex ways.

Scheduling all the different pieces to run at regular intervals with something like cron quickly becomes hard to maintain, and developers often spend significant time writing integrations between a core program and other tools, such as Celery.

By using Apache Airflow, you can skip cron altogether and get other features out of the box: you can visualize your task dependency graph, monitor task status, and integrate other services via plugins.
