Merged
6 changes: 5 additions & 1 deletion README.md
@@ -69,7 +69,11 @@ Apache Airflow is tested with:
| SQLite | latest stable | latest stable |
| Kubernetes | 1.16.2, 1.17.0 | 1.16.2, 1.17.0 |

> Note: SQLite is used primarily for development purpose.
**Note:** MariaDB and MySQL 5.x will work fine for a single scheduler, but don't work or have limitations
when running more than a single scheduler -- please see the "Scheduler" docs.

**Note:** SQLite is used primarily for development purposes.

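For anything beyond development, point Airflow at one of the supported backends via `sql_alchemy_conn`. An illustrative `airflow.cfg` fragment (the connection URI is a placeholder, not a recommendation):

```ini
[core]
# Example Postgres connection; adjust user, password, host and database name.
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
```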

### Additional notes on Python version requirements

13 changes: 4 additions & 9 deletions airflow/config_templates/config.yml
@@ -1601,30 +1601,25 @@
default: "True"
- name: max_dagruns_to_create_per_loop
description: |
This changes the number of dags that are locked by each scheduler when
creating dag runs. One possible reason for setting this lower is if you
have huge dags and are running multiple schedules, you won't want one
scheduler to do all the work.
Max number of DAGs to create DagRuns for per scheduler loop

Default: 10
example: ~
version_added: 2.0.0
type: string
default: ~
see_also: ":ref:`scheduler:ha:tunables`"
- name: max_dagruns_per_loop_to_schedule
description: |
How many DagRuns should a scheduler examine (and lock) when scheduling
and queuing tasks. Increasing this limit will allow more throughput for
smaller DAGs but will likely slow down throughput for larger (>500
tasks for example) DAGs. Setting this too high when using multiple
schedulers could also lead to one scheduler taking all the dag runs
leaving no work for the others.
and queuing tasks.

Default: 20
example: ~
version_added: 2.0.0
type: string
default: ~
see_also: ":ref:`scheduler:ha:tunables`"
- name: statsd_on
description: |
Statsd (https://github.com/etsy/statsd) integration settings
11 changes: 2 additions & 9 deletions airflow/config_templates/default_airflow.cfg
@@ -804,20 +804,13 @@ max_tis_per_query = 512
# scheduler at once
use_row_level_locking = True

# This changes the number of dags that are locked by each scheduler when
# creating dag runs. One possible reason for setting this lower is if you
# have huge dags and are running multiple schedules, you won't want one
# scheduler to do all the work.
# Max number of DAGs to create DagRuns for per scheduler loop
#
# Default: 10
# max_dagruns_to_create_per_loop =

# How many DagRuns should a scheduler examine (and lock) when scheduling
# and queuing tasks. Increasing this limit will allow more throughput for
# smaller DAGs but will likely slow down throughput for larger (>500
# tasks for example) DAGs. Setting this too high when using multiple
# schedulers could also lead to one scheduler taking all the dag runs
# leaving no work for the others.
# and queuing tasks.
#
# Default: 20
# max_dagruns_per_loop_to_schedule =
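These commented defaults can also be overridden without editing the file at all, using Airflow's `AIRFLOW__{SECTION}__{KEY}` environment-variable convention. A hedged example (the values are arbitrary, not tuning advice):

```shell
# Hypothetical tuning values -- not recommendations. Airflow reads
# AIRFLOW__{SECTION}__{KEY} environment variables in preference to
# the corresponding airflow.cfg entries.
export AIRFLOW__SCHEDULER__MAX_DAGRUNS_TO_CREATE_PER_LOOP=5
export AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE=40
```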
2 changes: 2 additions & 0 deletions docs/configurations-ref.rst
@@ -42,6 +42,8 @@ can set in ``airflow.cfg`` file or using environment variables.

{% for option in section["options"] %}

.. _config:{{ section["name"] }}__{{ option["name"] }}:

{{ option["name"] }}
{{ "-" * option["name"]|length }}

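The anchor line added above gives every option a predictable cross-reference label of the form ``config:<section>__<option>``. As a small sketch of the naming scheme (the helper function is ours, not part of Airflow):

```python
def config_ref_label(section: str, option: str) -> str:
    """Build the RST label the Jinja template above generates for one option."""
    return f"config:{section}__{option}"

# e.g. one of the scheduler tunables documented elsewhere in this PR:
label = config_ref_label("scheduler", "max_dagruns_to_create_per_loop")
```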
3 changes: 3 additions & 0 deletions docs/howto/initialize-database.rst
@@ -26,6 +26,9 @@ setting up a real database backend and switching to the LocalExecutor.
Airflow was built to interact with its metadata using SQLAlchemy,
with **MySQL**, **Postgres** and **SQLite** as supported backends (SQLite is used primarily for development purposes).

.. seealso:: :ref:`Scheduler HA Database Requirements <scheduler:ha:db_requirements>` if you plan on running
more than one scheduler

.. note:: We rely on more strict ANSI SQL settings for MySQL in order to have
sane defaults. Make sure to have specified ``explicit_defaults_for_timestamp=1``
in your my.cnf under ``[mysqld]``
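Concretely, the note above amounts to a ``my.cnf`` fragment like the following (illustrative):

```ini
[mysqld]
explicit_defaults_for_timestamp = 1
```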
96 changes: 96 additions & 0 deletions docs/scheduler.rst
@@ -65,3 +65,99 @@ If you want to use 'external trigger' to run future-dated execution dates, set `
This only has effect if your DAG has no ``schedule_interval``.
If you keep the default ``allow_trigger_in_future = False`` and use 'external trigger' to run future-dated execution dates,
the scheduler won't execute it immediately, but will execute it once the current date rolls over to the execution date.

Running More Than One Scheduler
-------------------------------

.. versionadded:: 2.0.0

Airflow supports running more than one scheduler concurrently -- both for performance reasons and for
resiliency.

Overview
""""""""

The :abbr:`HA (highly available)` scheduler is designed to take advantage of the existing metadata database.
This was primarily done for operational simplicity: every component already has to speak to this DB, and by
not using direct communication or a consensus algorithm between schedulers (Raft, Paxos, etc.), nor another
consensus tool (Apache Zookeeper or Consul, for instance), we have kept the "operational surface area" to a
minimum.

The scheduler now uses the serialized DAG representation to make its scheduling decisions and the rough
outline of the scheduling loop is:

- Check for any DAGs needing a new DagRun, and create them
- Examine a batch of DagRuns for schedulable TaskInstances or complete DagRuns
- Select schedulable TaskInstances, and whilst respecting Pool limits and other concurrency limits, enqueue
them for execution

This does, however, place some requirements on the database.
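The loop outlined above can be sketched as a toy Python pass. All names and limits here are illustrative, not Airflow's internal API; the real scheduler works against the metadata database with row-level locks:

```python
# Per-loop caps mirroring the config options documented below.
MAX_DAGRUNS_TO_CREATE_PER_LOOP = 10
MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE = 20

def scheduler_loop_once(dags_needing_runs, active_dagruns, pool_slots_free):
    """One toy pass: create runs, examine a batch, enqueue within pool limits."""
    # 1. Create DagRuns for DAGs that need one, up to the per-loop cap.
    created = dags_needing_runs[:MAX_DAGRUNS_TO_CREATE_PER_LOOP]

    # 2. Examine (and, in the real scheduler, lock) a batch of DagRuns.
    batch = active_dagruns[:MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE]

    # 3. Enqueue schedulable task instances while respecting pool limits.
    schedulable = [ti for run in batch for ti in run["schedulable_tis"]]
    enqueued = schedulable[:pool_slots_free]
    return created, enqueued
```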

.. _scheduler:ha:db_requirements:

Database Requirements
"""""""""""""""""""""

The short version is that users of PostgreSQL 9.6+ or MySQL 8+ are all ready to go -- you can start running as
many copies of the scheduler as you like -- no further setup or config options are needed. If you are
using a different database, please read on.

To maintain performance and throughput, there is one part of the scheduling loop that does a number of
calculations in memory (because having to round-trip to the DB for each TaskInstance would be too slow), so we
need to ensure that only a single scheduler is in this critical section at once -- otherwise limits would not
be correctly respected. To achieve this we use database row-level locks (using ``SELECT ... FOR UPDATE``).

This critical section is where TaskInstances go from scheduled state and are enqueued to the executor, whilst
ensuring the various concurrency and pool limits are respected. The critical section is obtained by asking for
a row-level write lock on every row of the Pool table (roughly equivalent to ``SELECT * FROM slot_pool FOR
UPDATE NOWAIT`` but the exact query is slightly different).

The following databases are fully supported and provide an "optimal" experience:

- PostgreSQL 9.6+
- MySQL 8+

.. warning::

MariaDB does not implement the ``SKIP LOCKED`` or ``NOWAIT`` SQL clauses (see `MDEV-13115
<https://jira.mariadb.org/browse/MDEV-13115>`_). Without these features, running multiple schedulers is not
supported, and deadlock errors have been reported.

.. warning::

MySQL 5.x also does not support ``SKIP LOCKED`` or ``NOWAIT``, and additionally is more prone to deciding
queries are deadlocked, so running more than a single scheduler on MySQL 5.x is not supported or
recommended.

.. note::

Microsoft SQLServer has not been tested with HA.

.. _scheduler:ha:tunables:

Scheduler Tuneables
"""""""""""""""""""

The following config settings can be used to control aspects of the Scheduler HA loop.

- :ref:`config:scheduler__max_dagruns_to_create_per_loop`

This changes the number of DAGs that are locked by each scheduler when
creating DAG runs. One possible reason for setting this lower is if you
have huge DAGs and are running multiple schedulers -- you won't want one
scheduler to do all the work.

- :ref:`config:scheduler__max_dagruns_per_loop_to_schedule`

How many DagRuns should a scheduler examine (and lock) when scheduling
and queuing tasks. Increasing this limit will allow more throughput for
smaller DAGs, but will likely slow down throughput for larger DAGs (for
example, those with more than 500 tasks). Setting this too high when using
multiple schedulers could also lead to one scheduler taking all the DAG
runs, leaving no work for the others.

- :ref:`config:scheduler__use_row_level_locking`

Whether the scheduler should issue ``SELECT ... FOR UPDATE`` in relevant
queries. If this is set to False, you should not run more than a single
scheduler at once.
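Taken together, these tuneables live in the ``[scheduler]`` section of ``airflow.cfg``. An illustrative fragment showing the documented defaults (not tuning advice):

```ini
[scheduler]
max_dagruns_to_create_per_loop = 10
max_dagruns_per_loop_to_schedule = 20
use_row_level_locking = True
```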
4 changes: 4 additions & 0 deletions docs/spelling_wordlist.txt
@@ -265,6 +265,7 @@ Parallelize
Parameterizing
Paramiko
Params
Paxos
Pem
Pinot
Popen
@@ -356,6 +357,7 @@ ToC
Tomasz
Tooltip
Tsai
Tuneables
UA
Uellendall
Umask
@@ -677,6 +679,7 @@ emr
enableAutoScale
encryptor
enqueue
enqueued
entrypoint
enum
env
@@ -1133,6 +1136,7 @@ savedModel
scalability
scalable
sched
schedulable
schedulername
schemas
sdk