Commit 6f8480c

Docs: update screenshots, improve clarity and flow (home page) (#48899)
Rewrote the “Why Airflow” section to improve narrative flow, tighten language, and reduce redundancy. Grouped related ideas (e.g. scheduling/backfilling and UI interactions), clarified phrasing around extensibility, and smoothed the conclusion with community references. Other than adding new screenshots for Airflow 3, no content was removed—just restructured and refined for readability.
1 parent 3039600 commit 6f8480c

File tree

6 files changed (+53, -65 lines)

Binary screenshot changes (images not shown): three screenshots added (286 KB, 168 KB, 190 KB) and two removed (-373 KB, -64 KB).

airflow-core/docs/index.rst

Lines changed: 53 additions & 65 deletions
@@ -20,18 +20,17 @@ What is Airflow®?
 
 `Apache Airflow® <https://github.com/apache/airflow>`_ is an open-source platform for developing, scheduling,
 and monitoring batch-oriented workflows. Airflow's extensible Python framework enables you to build workflows
-connecting with virtually any technology. A web interface helps manage the state of your workflows. Airflow is
-deployable in many ways, varying from a single process on your laptop to a distributed setup to support even
-the biggest workflows.
+connecting with virtually any technology. A web-based UI helps you visualize, manage, and debug your workflows.
+You can run Airflow in a variety of configurations — from a single process on your laptop to a distributed system
+capable of handling massive workloads.
 
 Workflows as code
 =========================================
-The main characteristic of Airflow workflows is that all workflows are defined in Python code. "Workflows as
-code" serves several purposes:
+Airflow workflows are defined entirely in Python. This "workflows as code" approach brings several advantages:
 
-- **Dynamic**: Airflow pipelines are configured as Python code, allowing for dynamic pipeline generation.
-- **Extensible**: The Airflow® framework contains operators to connect with numerous technologies. All Airflow components are extensible to easily adjust to your environment.
-- **Flexible**: Workflow parameterization is built-in leveraging the `Jinja <https://jinja.palletsprojects.com>`_ templating engine.
+- **Dynamic**: Pipelines are defined in code, enabling dynamic dag generation and parameterization.
+- **Extensible**: The Airflow framework includes a wide range of built-in operators and can be extended to fit your needs.
+- **Flexible**: Airflow leverages the `Jinja <https://jinja.palletsprojects.com>`_ templating engine, allowing rich customizations.
 
 Dags
 -----------------------------------------
@@ -40,14 +39,13 @@ Dags
     :start-after: .. dag-definition-start
     :end-before: .. dag-definition-end
 
-Take a look at the following snippet of code:
+Let's look at a code snippet that defines a simple dag:
 
 .. code-block:: python
 
     from datetime import datetime
 
-    from airflow.sdk import DAG
-    from airflow.decorators import task
+    from airflow.sdk import DAG, task
     from airflow.providers.standard.operators.bash import BashOperator
 
     # A DAG represents a workflow, a collection of tasks
@@ -65,83 +63,73 @@ Take a look at the following snippet of code:
 
 Here you see:
 
-- A DAG named "demo", starting on Jan 1st 2022 and running once a day. A DAG is Airflow's representation of a workflow.
-- Two tasks, a BashOperator running a Bash script and a Python function defined using the ``@task`` decorator
-- ``>>`` between the tasks defines a dependency and controls in which order the tasks will be executed
+- A dag named ``"demo"``, scheduled to run daily starting on January 1st, 2022. A dag is how Airflow represents a workflow.
+- Two tasks: one using a ``BashOperator`` to run a shell script, and another using the ``@task`` decorator to define a Python function.
+- The ``>>`` operator defines a dependency between the two tasks and controls execution order.
 
-Airflow evaluates this script and executes the tasks at the set interval and in the defined order. The status
-of the "demo" DAG is visible in the web interface:
+Airflow parses the script, schedules the tasks, and executes them in the defined order. The status of the ``"demo"`` dag
+is displayed in the web interface:
 
-.. image:: /img/demo_graph_view.png
-    :alt: Demo DAG in the Graph View, showing the status of one DAG run
+.. image:: /img/demo_graph_and_code_view.png
+    :alt: Demo DAG in the Graph View, showing the status of one DAG run along with DAG code.
 
-This example demonstrates a simple Bash and Python script, but these tasks can run any arbitrary code. Think
-of running a Spark job, moving data between two buckets, or sending an email. The same structure can also be
-seen running over time:
+|
 
-.. image:: /img/demo_grid_view.png
-    :alt: Demo DAG in the Grid View, showing the status of all DAG runs
+This example uses a simple Bash command and Python function, but Airflow tasks can run virtually any code. You might use
+tasks to run a Spark job, move files between storage buckets, or send a notification email. Here's what that same dag looks
+like over time, with multiple runs:
 
-Each column represents one DAG run. These are two of the most used views in Airflow, but there are several
-other views which allow you to deep dive into the state of your workflows.
+.. image:: /img/demo_grid_view_with_task_logs.png
+    :alt: Demo DAG in the Grid View, showing the status of all DAG runs, as well as logs for a task instance
+
+|
+
+Each column in the grid represents a single dag run. While the graph and grid views are most commonly used, Airflow provides
+several other views to help you monitor and troubleshoot workflows — such as the ``DAG Overview`` view:
+
+.. image:: /img/demo_dag_overview_with_failed_tasks.png
+    :alt: Overview of a complex DAG in the Grid View, showing the status of all DAG runs, as well as quick links to recently failed task logs
+
+|
 
 .. include:: /../../devel-common/src/sphinx_exts/includes/dag-definition.rst
     :start-after: .. dag-etymology-start
    :end-before: .. dag-etymology-end
 
-
 Why Airflow®?
 =========================================
-Airflow® is a batch workflow orchestration platform. The Airflow framework contains operators to connect with
-many technologies and is easily extensible to connect with a new technology. If your workflows have a clear
-start and end, and run at regular intervals, they can be programmed as an Airflow DAG.
-
-If you prefer coding over clicking, Airflow is the tool for you. Workflows are defined as Python code which
-means:
-
-- Workflows can be stored in version control so that you can roll back to previous versions
-- Workflows can be developed by multiple people simultaneously
-- Tests can be written to validate functionality
-- Components are extensible and you can build on a wide collection of existing components
-
-Rich scheduling and execution semantics enable you to easily define complex pipelines, running at regular
-intervals. Backfilling allows you to (re-)run pipelines on historical data after making changes to your logic.
-And the ability to rerun partial pipelines after resolving an error helps maximize efficiency.
-
-Airflow's user interface provides:
+Airflow is a platform for orchestrating batch workflows. It offers a flexible framework with a wide range of built-in operators
+and makes it easy to integrate with new technologies.
 
-1. In-depth views of two things:
+If your workflows have a clear start and end and run on a schedule, they're a great fit for Airflow DAGs.
 
-    i. Pipelines
-    ii. Tasks
+If you prefer coding over clicking, Airflow is built for you. Defining workflows as Python code provides several key benefits:
 
-2. Overview of your pipelines over time
+- **Version control**: Track changes, roll back to previous versions, and collaborate with your team.
+- **Team collaboration**: Multiple developers can work on the same workflow codebase.
+- **Testing**: Validate pipeline logic through unit and integration tests.
+- **Extensibility**: Customize workflows using a large ecosystem of existing components — or build your own.
 
-From the interface, you can inspect logs and manage tasks, for example retrying a task in
-case of failure.
+Airflow's rich scheduling and execution semantics make it easy to define complex, recurring pipelines. From the web interface,
+you can manually trigger DAGs, inspect logs, and monitor task status. You can also backfill DAG runs to process historical
+data, or rerun only failed tasks to minimize cost and time.
 
-The open-source nature of Airflow ensures you work on components developed, tested, and used by many other
-`companies <https://github.com/apache/airflow/blob/main/INTHEWILD.md>`_ around the world. In the active
-`community <https://airflow.apache.org/community>`_ you can find plenty of helpful resources in the form of
-blog posts, articles, conferences, books, and more. You can connect with other peers via several channels
-such as `Slack <https://s.apache.org/airflow-slack>`_ and mailing lists.
+The Airflow platform is highly customizable. With the :doc:`public-airflow-interface` you can extend and adapt nearly
+every part of the system — from operators to UI plugins to execution logic.
 
-Airflow as a Platform is highly customizable. By utilizing :doc:`public-airflow-interface` you can extend
-and customize almost every aspect of Airflow.
+Because Airflow is open source, you're building on components developed, tested, and maintained by a global community.
+You'll find a wealth of learning resources, including blog posts, books, and conference talks — and you can connect with
+others via the `community <https://airflow.apache.org/community>`_, `Slack <https://s.apache.org/airflow-slack>`_, and mailing lists.
 
 Why not Airflow®?
 =================
 
-Airflow® was built for finite batch workflows. While the CLI and REST API do allow triggering workflows,
-Airflow was not built for infinitely running event-based workflows. Airflow is not a streaming solution.
-However, a streaming system such as Apache Kafka is often seen working together with Apache Airflow. Kafka can
-be used for ingestion and processing in real-time, event data is written to a storage location, and Airflow
-periodically starts a workflow processing a batch of data.
+Airflow® is designed for finite, batch-oriented workflows. While you can trigger DAGs using the CLI or REST API, Airflow is not
+intended for continuously running, event-driven, or streaming workloads. That said, Airflow often complements streaming systems like Apache Kafka.
+Kafka handles real-time ingestion, writing data to storage. Airflow can then periodically pick up that data and process it in batch.
 
-If you prefer clicking over coding, Airflow is probably not the right solution. The web interface aims to make
-managing workflows as easy as possible and the Airflow framework is continuously improved to make the
-developer experience as smooth as possible. However, the philosophy of Airflow is to define workflows as code
-so coding will always be required.
+If you prefer clicking over coding, Airflow might not be the best fit. The web UI simplifies workflow management, and the developer
+experience is continuously improving, but defining workflows as code is central to how Airflow works — so some coding is always required.
 
 
 .. toctree::
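Editor's note: the diff shows only the opening lines of the demo dag snippet; the body between the hunks is elided. For orientation, here is a minimal runnable sketch of the kind of dag the snippet describes. Everything inside the ``with DAG(...)`` block (the schedule string, task names, and task bodies) is an illustrative assumption, not part of this commit; only the imports and the first comment appear in the diff above.

    from datetime import datetime

    from airflow.sdk import DAG, task
    from airflow.providers.standard.operators.bash import BashOperator

    # A DAG represents a workflow, a collection of tasks
    with DAG(dag_id="demo", start_date=datetime(2022, 1, 1), schedule="0 0 * * *") as dag:
        # A task wrapping a Bash command
        hello = BashOperator(task_id="hello", bash_command="echo hello")

        # A task wrapping a plain Python function, via the @task decorator
        @task()
        def airflow():
            print("airflow")

        # >> makes "hello" run before "airflow", the dependency the bullets describe
        hello >> airflow()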

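The new "Testing" bullet says pipeline logic can be validated through unit tests. A minimal sketch of what such a test might look like, assuming pytest, that the ``demo`` dag above is in the configured dag folder, and that ``DagBag`` is importable from ``airflow.models`` in your Airflow version (import paths can differ across releases); this test module is hypothetical and not part of the commit.

    # Hypothetical test module for the demo dag; run with pytest.
    from airflow.models import DagBag


    def test_dags_import_cleanly():
        # Parsing the dag folder surfaces syntax and import errors early
        dagbag = DagBag(include_examples=False)
        assert dagbag.import_errors == {}


    def test_demo_dag_dependencies():
        # The >> in the snippet should make "hello" upstream of "airflow"
        dag = DagBag(include_examples=False).get_dag("demo")
        assert dag is not None
        assert "airflow" in dag.get_task("hello").downstream_task_ids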