Capitalize dag to DAG (apache#29064)
BasPH authored and maggesssss committed Jan 21, 2023
1 parent e2a4727 commit 3c2dba5
Showing 26 changed files with 48 additions and 48 deletions.
@@ -28,9 +28,9 @@ There are three types of cluster policy:

* ``dag_policy``: Takes a :class:`~airflow.models.dag.DAG` parameter called ``dag``. Runs at load time of the DAG from DagBag :class:`~airflow.models.dagbag.DagBag`.
* ``task_policy``: Takes a parameter called ``task`` that is of type either :class:`~airflow.models.baseoperator.BaseOperator` or :class:`~airflow.models.mappedoperator.MappedOperator` (for `dynamically expanded tasks <dynamic-task-mapping>`_). The policy gets executed when the task is created during parsing of the task from DagBag at load time. This means that the whole task definition can be altered in the task policy. It does not relate to a specific task running in a DagRun. The ``task_policy`` defined is applied to all the task instances that will be executed in the future.
* ``task_instance_mutation_hook``: Takes a :class:`~airflow.models.taskinstance.TaskInstance` parameter called ``task_instance``. The ``task_instance_mutation`` applies not to a task but to the instance of a task that relates to a particular DagRun. It is executed in a "worker", not in the dag file processor, just before the task instance is executed. The policy is only applied to the currently executed run (i.e. instance) of that task.
* ``task_instance_mutation_hook``: Takes a :class:`~airflow.models.taskinstance.TaskInstance` parameter called ``task_instance``. The ``task_instance_mutation`` applies not to a task but to the instance of a task that relates to a particular DagRun. It is executed in a "worker", not in the DAG file processor, just before the task instance is executed. The policy is only applied to the currently executed run (i.e. instance) of that task.

The DAG and Task cluster policies can raise the :class:`~airflow.exceptions.AirflowClusterPolicyViolation` exception to indicate that the dag/task they were passed is not compliant and should not be loaded.
The DAG and Task cluster policies can raise the :class:`~airflow.exceptions.AirflowClusterPolicyViolation` exception to indicate that the DAG/task they were passed is not compliant and should not be loaded.

Any extra attributes set by a cluster policy take priority over those defined in your DAG file; for example, if you set an ``sla`` on your Task in the DAG file, and then your cluster policy also sets an ``sla``, the cluster policy's value will take precedence.

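For illustration, a minimal sketch of what a ``dag_policy`` might look like (cluster policies are typically defined in ``airflow_local_settings.py``; the "every DAG needs a tag" rule below is just an assumed example, not a required policy):

.. code-block:: python

    from airflow.exceptions import AirflowClusterPolicyViolation
    from airflow.models.dag import DAG


    def dag_policy(dag: DAG) -> None:
        """Reject DAGs without tags at DagBag load time."""
        if not dag.tags:
            raise AirflowClusterPolicyViolation(
                f"DAG {dag.dag_id} has no tags; please add at least one."
            )
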
@@ -72,7 +72,7 @@ in favor of ``data_interval_start``.
Breadcrumbs
------------

When a task fails with an error `breadcrumbs <https://docs.sentry.io/platforms/python/enriching-events/breadcrumbs/>`__ will be added for the other tasks in the current dag run.
When a task fails with an error `breadcrumbs <https://docs.sentry.io/platforms/python/enriching-events/breadcrumbs/>`__ will be added for the other tasks in the current DAG run.

======================================= ==============================================================
Name Description
@@ -141,7 +141,7 @@ Name Description
``scheduler.tasks.running`` Number of tasks running in executor
``scheduler.tasks.starving`` Number of tasks that cannot be scheduled because of no open slot in pool
``scheduler.tasks.executable`` Number of tasks that are ready for execution (set to queued)
with respect to pool limits, dag concurrency, executor state,
with respect to pool limits, DAG concurrency, executor state,
and priority.
``executor.open_slots`` Number of open slots on executor
``executor.queued_tasks`` Number of queued tasks on executor
@@ -121,7 +121,7 @@ scheduler looks for DAGs. It should contain either regular expressions (the defa
for the paths that should be ignored. You do not need to have that file in any other folder in
``PYTHONPATH`` (and also you can only keep shared code in the other folders, not the actual DAGs).

In the example above the dags are only in ``my_custom_dags`` folder, the ``common_package`` should not be
In the example above the DAGs are only in ``my_custom_dags`` folder, the ``common_package`` should not be
scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
want to ignore the ``base_dag.py`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag2.py`` derives
from. Your ``.airflowignore`` should look then like this:
@@ -167,7 +167,7 @@ There are a few gotchas you should be careful about when you import your code.
Use unique top package name
...........................

It is recommended that you always put your dags/common files in a subpackage which is unique to your
It is recommended that you always put your DAGs/common files in a subpackage which is unique to your
deployment (``my_company`` in the example below). It is far too easy to use generic names for the
folders that will clash with other packages already present in the system. For example if you
create ``airflow/operators`` subfolder it will not be accessible because Airflow already has a package
@@ -184,7 +184,7 @@ This is tempting to do something like that it in ``my_dag1.py``:
from .base_dag import BaseDag # NEVER DO THAT!!!!
You should import such shared dag using full path (starting from the directory which is added to
You should import such shared DAG using full path (starting from the directory which is added to
``PYTHONPATH``):

.. code-block:: python
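
    # a hedged sketch, assuming the package layout from the example above
    # (my_company/my_custom_dags); adjust to your own top-level package name
    from my_company.my_custom_dags.base_dag import BaseDag
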
@@ -65,8 +65,8 @@ the :doc:`Celery executor <../core-concepts/executor/celery>`.


Once you have configured the executor, it is necessary to make sure that every node in the cluster contains
the same configuration and dags. Airflow sends simple instructions such as "execute task X of dag Y", but
does not send any dag files or configuration. You can use a simple cronjob or any other mechanism to sync
the same configuration and DAGs. Airflow sends simple instructions such as "execute task X of DAG Y", but
does not send any DAG files or configuration. You can use a simple cronjob or any other mechanism to sync
DAGs and configs across your nodes, e.g., checkout DAGs from git repo every 5 minutes on all nodes.


@@ -47,7 +47,7 @@ Your DAGs will start executing once the scheduler is running successfully.
Subsequent DAG Runs are created according to your DAG's :doc:`timetable <../authoring-and-scheduling/timetable>`.


For dags with a cron or timedelta schedule, scheduler won't trigger your tasks until the period it covers has ended e.g., A job with ``schedule`` set as ``@daily`` runs after the day
For DAGs with a cron or timedelta schedule, scheduler won't trigger your tasks until the period it covers has ended e.g., A job with ``schedule`` set as ``@daily`` runs after the day
has ended. This technique makes sure that whatever data is required for that period is fully available before the DAG is executed.
In the UI, it appears as if Airflow is running your tasks a day **late**

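For illustration, a sketch of a ``@daily`` DAG (the DAG id and operator are placeholders); its run covering 2023-01-01 is only triggered once that day has ended, i.e. shortly after midnight on 2023-01-02:

.. code-block:: python

    import pendulum
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    with DAG(
        dag_id="example_daily",
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        schedule="@daily",
        catchup=False,
    ):
        # Runs once per data interval, after that interval has closed.
        EmptyOperator(task_id="do_nothing")
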
@@ -180,7 +180,7 @@ different processes. In order to fine-tune your scheduler, you need to include a
* The logic and definition of your DAG structure:
* how many DAG files you have
* how many DAGs you have in your files
* how large the DAG files are (remember dag parser needs to read and parse the file every n seconds)
* how large the DAG files are (remember DAG parser needs to read and parse the file every n seconds)
* how complex they are (i.e. how fast they can be parsed, how many tasks and dependencies they have)
* whether parsing your DAG file involves importing a lot of libraries or heavy processing at the top level
(Hint! It should not. See :ref:`best_practices/top_level_code`)
@@ -86,10 +86,10 @@ Custom Roles

DAG Level Role
^^^^^^^^^^^^^^
``Admin`` can create a set of roles which are only allowed to view a certain set of dags. This is called DAG level access. Each dag defined in the dag model table
``Admin`` can create a set of roles which are only allowed to view a certain set of DAGs. This is called DAG level access. Each DAG defined in the DAG model table
is treated as a ``View`` which has two permissions associated with it (``can_read`` and ``can_edit``. ``can_dag_read`` and ``can_dag_edit`` are deprecated since 2.0.0).
There is a special view called ``DAGs`` (it was called ``all_dags`` in versions 1.10.*) which
allows the role to access all the dags. The default ``Admin``, ``Viewer``, ``User``, ``Op`` roles can all access ``DAGs`` view.
allows the role to access all the DAGs. The default ``Admin``, ``Viewer``, ``User``, ``Op`` roles can all access ``DAGs`` view.

.. image:: /img/add-role.png
.. image:: /img/new-role.png
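
From the DAG author's side, one way to tie a DAG to such a role is the ``access_control`` argument; a hedged sketch, assuming a role named ``team_a`` has already been created:

.. code-block:: python

    import pendulum
    from airflow import DAG

    with DAG(
        dag_id="team_a_reporting",
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        schedule=None,
        # Grants the pre-existing "team_a" role read and edit access to this DAG only.
        access_control={"team_a": {"can_read", "can_edit"}},
    ):
        ...
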
@@ -19,7 +19,7 @@ Kerberos
--------

Airflow has initial support for Kerberos. This means that Airflow can renew Kerberos
tickets for itself and store it in the ticket cache. The hooks and dags can make use of ticket
tickets for itself and store it in the ticket cache. The hooks and DAGs can make use of ticket
to authenticate against kerberized services.

Limitations
@@ -137,7 +137,7 @@ use it, simply update the connection details with, for example:
Adjust the principal to your settings. The ``_HOST`` part will be replaced by the fully qualified domain name of
the server.

You can specify if you would like to use the dag owner as the user for the connection or the user specified in the login
You can specify if you would like to use the DAG owner as the user for the connection or the user specified in the login
section of the connection. For the login user, specify the following as extra:

.. code-block:: json
@@ -20,7 +20,7 @@ Serialization

To support data exchange, like arguments, between tasks, Airflow needs to serialize the data to be exchanged and
deserialize it again when required in a downstream task. Serialization also happens so that the webserver and
the scheduler (as opposed to the dag processor) do no need to read the DAG file. This is done for security purposes
the scheduler (as opposed to the DAG processor) do no need to read the DAG file. This is done for security purposes
and efficiency.

Serialization is a surprisingly hard job. Python out of the box only has support for serialization of primitives,
2 changes: 1 addition & 1 deletion docs/apache-airflow/best-practices.rst
@@ -252,7 +252,7 @@ Good example:
@task
def my_task():
var = Variable.get("foo") # this is fine, because func my_task called only run task, not scan dags.
var = Variable.get("foo") # this is fine, because func my_task called only run task, not scan DAGs.
print(var)
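
A self-contained version of the good example above, with the imports it needs (it assumes a Variable named ``foo`` exists); the ``Variable.get`` call only runs when the task executes, not on every DAG-file parse:

.. code-block:: python

    import pendulum
    from airflow.decorators import dag, task
    from airflow.models import Variable


    @dag(start_date=pendulum.datetime(2023, 1, 1, tz="UTC"), schedule=None, catchup=False)
    def variable_example():
        @task
        def my_task():
            var = Variable.get("foo")  # resolved at run time, not at parse time
            print(var)

        my_task()


    variable_example()
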
For security purpose, you're recommended to use the :ref:`Secrets Backend<secrets_backend_configuration>`
2 changes: 1 addition & 1 deletion docs/apache-airflow/core-concepts/dag-run.rst
@@ -235,7 +235,7 @@ In addition, you can also manually trigger a DAG Run using the web UI (tab **DAG

.. _dagrun:parameters:

Passing Parameters when triggering dags
Passing Parameters when triggering DAGs
------------------------------------------

When triggering a DAG from the CLI, the REST API or the UI, it is possible to pass configuration for a DAG Run as
2 changes: 1 addition & 1 deletion docs/apache-airflow/core-concepts/dags.rst
@@ -748,7 +748,7 @@ Packaging DAGs

While simpler DAGs are usually only in a single Python file, it is not uncommon that more complex DAGs might be spread across multiple files and have dependencies that should be shipped with them ("vendored").

You can either do this all inside of the ``DAG_FOLDER``, with a standard filesystem layout, or you can package the DAG and all of its Python files up as a single zip file. For instance, you could ship two dags along with a dependency they need as a zip file with the following contents::
You can either do this all inside of the ``DAG_FOLDER``, with a standard filesystem layout, or you can package the DAG and all of its Python files up as a single zip file. For instance, you could ship two DAGs along with a dependency they need as a zip file with the following contents::

my_dag1.py
my_dag2.py
2 changes: 1 addition & 1 deletion docs/apache-airflow/core-concepts/executor/debug.rst
@@ -31,7 +31,7 @@ To set up ``dag.test``, add these two lines to the bottom of your dag file:
if __name__ == "__main__":
dag.test()
and that's it! You can add argument such as ``execution_date`` if you want to test argument-specific dagruns, but otherwise
and that's it! You can add argument such as ``execution_date`` if you want to test argument-specific DAG runs, but otherwise
you can run or debug DAGs as needed.

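For instance, a hedged sketch of a complete DAG file (names are placeholders) that can be run directly with ``python my_dag_file.py``:

.. code-block:: python

    import pendulum
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_dag_test",
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        schedule=None,
        catchup=False,
    ) as dag:
        BashOperator(task_id="say_hello", bash_command="echo hello")

    if __name__ == "__main__":
        # Runs every task in this DAG in a single process, without a scheduler.
        dag.test()
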
Comparison with DebugExecutor
10 changes: 5 additions & 5 deletions docs/apache-airflow/core-concepts/executor/kubernetes.rst
@@ -95,7 +95,7 @@ With these requirements in mind, here are some examples of basic ``pod_template_

The examples below should work when using default Airflow configuration values. However, many custom
configuration values need to be explicitly passed to the pod via this template too. This includes,
but is not limited to, sql configuration, required Airflow connections, dag folder path and
but is not limited to, sql configuration, required Airflow connections, DAGs folder path and
logging settings. See :doc:`../../configurations-ref` for details.

Storing DAGs in the image:
@@ -181,7 +181,7 @@ Here is an example of a task with both features:
print_stuff()
Managing dags and logs
Managing DAGs and logs
~~~~~~~~~~~~~~~~~~~~~~

Use of persistent volumes is optional and depends on your configuration.
@@ -190,9 +190,9 @@

To get the DAGs into the workers, you can:

- Include dags in the image.
- Use ``git-sync`` which, before starting the worker container, will run a ``git pull`` of the dags repository.
- Storing dags on a persistent volume, which can be mounted on all workers.
- Include DAGs in the image.
- Use ``git-sync`` which, before starting the worker container, will run a ``git pull`` of the DAGs repository.
- Storing DAGs on a persistent volume, which can be mounted on all workers.

- **Logs**:

6 changes: 3 additions & 3 deletions docs/apache-airflow/core-concepts/params.rst
@@ -21,8 +21,8 @@ Params
======

Params are how Airflow provides runtime configuration to tasks.
When you trigger a DAG manually, you can modify its Params before the dagrun starts.
If the user-supplied values don't pass validation, Airflow shows a warning instead of creating the dagrun.
When you trigger a DAG manually, you can modify its Params before the DAG run starts.
If the user-supplied values don't pass validation, Airflow shows a warning instead of creating the DAG run.
(For scheduled runs, the default values are used.)

Adding Params to a DAG
@@ -115,7 +115,7 @@ You can also add Params to individual tasks.
python_callable=print_it,
)
If there's already a dag param with that name, the task-level default will take precedence over the dag-level default.
If there's already a DAG param with that name, the task-level default will take precedence over the DAG-level default.
If a user supplies their own value when the DAG was triggered, Airflow ignores all defaults and uses the user's value.

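A hedged sketch of that precedence rule (the parameter and task names are invented for illustration):

.. code-block:: python

    import pendulum
    from airflow import DAG
    from airflow.models.param import Param
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="params_precedence_example",
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        schedule=None,
        params={"my_param": Param(1, type="integer")},  # DAG-level default
    ):
        PythonOperator(
            task_id="print_it",
            python_callable=lambda **context: print(context["params"]["my_param"]),
            # The task-level default (2) wins over the DAG-level default (1) for this
            # task, unless a value was supplied when the DAG run was triggered.
            params={"my_param": 2},
        )
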
JSON Schema Validation
2 changes: 1 addition & 1 deletion docs/apache-airflow/core-concepts/xcoms.rst
@@ -51,7 +51,7 @@ If you want to implement your own backend, you should subclass :class:`~airflow.

There is also an ``orm_deserialize_value`` method that is called whenever the XCom objects are rendered for UI or reporting purposes; if you have large or expensive-to-retrieve values in your XComs, you should override this method to avoid calling that code (and instead return a lighter, incomplete representation) so the UI remains responsive.

You can also override the ``clear`` method and use it when clearing results for given dags and tasks. This allows the custom XCom backend to process the data lifecycle easier.
You can also override the ``clear`` method and use it when clearing results for given DAGs and tasks. This allows the custom XCom backend to process the data lifecycle easier.

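A hedged sketch of such a backend (the external-storage cleanup is a stand-in for whatever your backend actually needs to do):

.. code-block:: python

    from typing import Any

    from airflow.models.xcom import BaseXCom


    class MyXComBackend(BaseXCom):
        def orm_deserialize_value(self) -> Any:
            # Return a cheap placeholder for the UI instead of fetching the real,
            # potentially large value.
            return "<value stored in external backend>"

        @classmethod
        def clear(cls, **kwargs) -> None:
            # Hypothetical spot to delete externally stored payloads before Airflow
            # clears the XCom rows themselves.
            super().clear(**kwargs)
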
Working with Custom XCom Backends in Containers
-----------------------------------------------
4 changes: 2 additions & 2 deletions docs/apache-airflow/deprecated-rest-api-ref.rst
@@ -37,7 +37,7 @@ Endpoints

.. http:post:: /api/experimental/dags/<DAG_ID>/dag_runs
Creates a dag_run for a given dag id.
Creates a dag_run for a given DAG id.
Note: If execution_date is not specified in the body, Airflow by default creates only one DAG per second for a given DAG_ID.
In order to create multiple DagRun within one second, you should set parameter ``"replace_microseconds"`` to ``"false"`` (boolean as string).

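A hedged sketch of calling this endpoint with ``requests`` (host, port and DAG id are placeholders; authentication is omitted):

.. code-block:: python

    import requests

    resp = requests.post(
        "http://localhost:8080/api/experimental/dags/my_dag/dag_runs",
        json={
            "conf": {"key": "value"},
            # Boolean passed as a string, as the endpoint expects.
            "replace_microseconds": "false",
        },
    )
    resp.raise_for_status()
    print(resp.json())
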
@@ -124,4 +124,4 @@

.. http:get:: /api/experimental/lineage/<DAG_ID>/<string:execution_date>/
Returns the lineage information for the dag.
Returns the lineage information for the DAG.
14 changes: 7 additions & 7 deletions docs/apache-airflow/faq.rst
@@ -148,7 +148,7 @@ When the return value is less than or equal to 0, it means no timeout during the
return conf.getfloat("core", "DAGBAG_IMPORT_TIMEOUT")
When there are a lot (>1000) of dags files, how to speed up parsing of new files?
When there are a lot (>1000) of DAG files, how to speed up parsing of new files?
---------------------------------------------------------------------------------

(only valid for Airflow >= 2.1.1)
@@ -157,7 +157,7 @@ Change the :ref:`config:scheduler__file_parsing_sort_mode` to ``modified_time``,
the :ref:`config:scheduler__min_file_process_interval` to ``600`` (10 minutes), ``6000`` (100 minutes)
or a higher value.

The dag parser will skip the ``min_file_process_interval`` check if a file is recently modified.
The DAG parser will skip the ``min_file_process_interval`` check if a file is recently modified.

This might not work for case where the DAG is imported/created from a separate file. Example:
``dag_file.py`` that imports ``dag_loader.py`` where the actual logic of the DAG file is as shown below.
@@ -450,17 +450,17 @@ Set the value of ``update_fab_perms`` configuration in ``airflow.cfg`` to ``Fals
How to reduce the airflow UI page load time?
------------------------------------------------

If your dag takes long time to load, you could reduce the value of ``default_dag_run_display_number`` configuration
in ``airflow.cfg`` to a smaller value. This configurable controls the number of dag run to show in UI with default
If your DAG takes long time to load, you could reduce the value of ``default_dag_run_display_number`` configuration
in ``airflow.cfg`` to a smaller value. This configurable controls the number of DAG runs to show in UI with default
value ``25``.


Why did the pause dag toggle turn red?
Why did the pause DAG toggle turn red?
--------------------------------------

If pausing or unpausing a dag fails for any reason, the dag toggle will
If pausing or unpausing a DAG fails for any reason, the DAG toggle will
revert to its previous state and turn red. If you observe this behavior,
try pausing the dag again, or check the console or server logs if the
try pausing the DAG again, or check the console or server logs if the
issue recurs.


2 changes: 1 addition & 1 deletion docs/apache-airflow/howto/add-dag-tags.rst
@@ -23,7 +23,7 @@ Add tags to DAGs and use it for filtering in the UI

.. versionadded:: 1.10.8

In order to filter DAGs (e.g by team), you can add tags in each dag.
In order to filter DAGs (e.g by team), you can add tags in each DAG.
The filter is saved in a cookie and can be reset by the reset button.

For example:
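
A minimal sketch (the tag names are placeholders):

.. code-block:: python

    import pendulum
    from airflow import DAG

    with DAG(
        dag_id="example_tagged_dag",
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        schedule=None,
        tags=["team-a", "sql"],
    ):
        ...
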
2 changes: 1 addition & 1 deletion docs/apache-airflow/howto/custom-operator.rst
@@ -29,7 +29,7 @@ You can create any operator you want by extending the :class:`airflow.models.bas
There are two methods that you need to override in a derived class:

* Constructor - Define the parameters required for the operator. You only need to specify the arguments specific to your operator.
You can specify the ``default_args`` in the dag file. See :ref:`Default args <concepts-default-arguments>` for more details.
You can specify the ``default_args`` in the DAG file. See :ref:`Default args <concepts-default-arguments>` for more details.

* Execute - The code to execute when the runner calls the operator. The method contains the
Airflow context as a parameter that can be used to read config values.
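
A hedged sketch of such an operator (the class name and the ``message`` argument are invented for illustration):

.. code-block:: python

    from airflow.models.baseoperator import BaseOperator


    class HelloOperator(BaseOperator):
        def __init__(self, message: str, **kwargs) -> None:
            # Only operator-specific arguments here; common arguments (task_id,
            # retries, ...) are passed through to BaseOperator.
            super().__init__(**kwargs)
            self.message = message

        def execute(self, context):
            # ``context`` exposes runtime information such as the logical date.
            self.log.info("Saying %s on %s", self.message, context["ds"])
            return self.message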