AIP-40: Deferrable Tasks #15389

Merged: 1 commit, Aug 11, 2021

andrewgodwin (Contributor) commented Apr 15, 2021

This is the implementation of AIP-40, Deferrable "Async" Tasks (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177050929).

The main changes are:

  • A new concept of a Trigger is introduced, as a small piece of asyncio code that can fire off events
    • There is a BaseTrigger and some time-related triggers under a new airflow.triggers package
    • There is a new Trigger database model and associated trigger table
    • Async versions of the various date/time sensors have been added which defer rather than poke.
  • There is a new persistent process (Job) called triggerer
    • It only runs on Python 3.7+
    • It handles polling the database for which triggers need running, running them, and re-scheduling task instances whose triggers have fired events
    • If a trigger throws an exception or exits without firing an event, it logs why and marks dependent task instances as failed
    • It monitors the asyncio event loop with a watchdog task and alerts the user if anything is overrunning (i.e. not using await) and blocking the loop.
    • It is designed to run in parallel with itself in a highly-available manner, and also has built-in consistent-hash based partitioning (sharding) support
  • Task Instances have a new deferred state which indicates they are waiting on a trigger to run
    • The trigger they are waiting for is stored in a new trigger_id column, and a failure timeout is in a trigger_timeout column
    • The scheduler takes care of timing out task instances into the failed state
    • Deferral is triggered by raising the TaskDeferred exception, or by calling self.defer on the operator, which raises it for you.
    • next_method and next_kwargs columns are added to specify what a task instance/operator's execution entry point should be if it's not the default of execute(). They are currently only used by deferral, but have been written to be independent in case they are useful elsewhere.
  • Two new dependencies are added
    • jump-consistent-hash is a small MIT licensed library that implements a fast, consistent hash algorithm
    • pytest-asyncio is an Apache 2 licensed library that enables async tests to be written easily
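
As a rough illustration of the consistent-hash partitioning mentioned above, here is a minimal sketch of the jump consistent hash algorithm in plain Python. It does not use the jump-consistent-hash package. The function names, and the idea of hashing integer trigger IDs directly, are assumptions made for illustration and may not match how the triggerer actually assigns work.

```python
from typing import Iterable, List


def jump_hash(key: int, num_buckets: int) -> int:
    """Map an integer key to a bucket in [0, num_buckets) (jump consistent hash)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    key &= 0xFFFFFFFFFFFFFFFF  # treat the key as an unsigned 64-bit value
    bucket, candidate = -1, 0
    while candidate < num_buckets:
        bucket = candidate
        # 64-bit linear congruential step, used as a cheap pseudo-random source
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        candidate = int((bucket + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return bucket


def triggers_for_partition(
    trigger_ids: Iterable[int], partition: int, total_partitions: int
) -> List[int]:
    """Hypothetical helper: select the trigger IDs one partition would own."""
    return [t for t in trigger_ids if jump_hash(t, total_partitions) == partition]
```

The property that makes this kind of hash attractive for sharding is that growing from n to n + 1 partitions only moves keys into the new partition; every other key stays where it was.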

Changes deliberately left out of this PR, each to be handled in its own future PR:

  • UI warning when the triggerer is not running and you have deferred task instances
  • Updating Breeze to include triggerer in what it runs
  • Updating the Docker Compose files to include triggerer
  • Updating the Helm Chart to include triggerer
  • Some way of detecting/preventing DB access within triggers

Remaining fixes:

  • SQLite does not seem to like deferral
  • Deferred tasks should fail if their trigger fails to load

@boring-cyborg boring-cyborg bot added area:CLI area:core-operators Operators, Sensors and hooks within Core Airflow area:Scheduler including HA (high availability) scheduler labels Apr 15, 2021
@github-actions

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@andrewgodwin andrewgodwin force-pushed the deferrable branch 2 times, most recently from 1551b7c to a16c727 Compare April 20, 2021 21:17
@andrewgodwin andrewgodwin force-pushed the deferrable branch 2 times, most recently from 18c8b99 to fbef4b4 Compare May 4, 2021 18:13
@andrewgodwin andrewgodwin changed the title AIP-40 prototype: Deferrable Tasks AIP-40: Deferrable Tasks May 5, 2021
@andrewgodwin andrewgodwin marked this pull request as ready for review May 7, 2021 19:42
@andrewgodwin andrewgodwin force-pushed the deferrable branch 3 times, most recently from c980c2b to 8e05ca7 Compare August 6, 2021 16:38
@andrewgodwin andrewgodwin force-pushed the deferrable branch 4 times, most recently from 0fdc216 to 86e04a6 Compare August 9, 2021 17:15
@andrewgodwin andrewgodwin force-pushed the deferrable branch 2 times, most recently from 291c370 to 345ab34 Compare August 10, 2021 23:02
This adds two concepts: deferring operators, which puts them in a state
where they are not running but waiting for an event to resume them, and
Triggers, which are asynchronous pieces of code that run massively in
parallel and fire events to un-defer operators.

See AIP-40 for more details.
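
The defer-and-resume cycle described in this commit message can be sketched with nothing but asyncio. Every class and function below is a hypothetical stand-in written for illustration: TaskDeferred here is a local exception, not Airflow's, and SleepUntilTrigger, WaitOperator, and run_with_deferral are invented names. In the real system the worker, the triggerer, and the scheduler each handle a different step; this sketch collapses them into one process.

```python
import asyncio
from datetime import datetime, timedelta, timezone


class TaskDeferred(Exception):
    """Local stand-in for the exception named in the PR (not Airflow's class)."""

    def __init__(self, trigger, method_name, kwargs=None):
        super().__init__(f"deferred until {trigger!r} fires")
        self.trigger = trigger
        self.method_name = method_name
        self.kwargs = kwargs or {}


class SleepUntilTrigger:
    """Hypothetical trigger: waits without blocking the loop, then fires one event."""

    def __init__(self, moment):
        self.moment = moment

    async def run(self):
        remaining = (self.moment - datetime.now(timezone.utc)).total_seconds()
        if remaining > 0:
            await asyncio.sleep(remaining)  # yields control to other triggers
        yield {"fired_at": self.moment.isoformat()}


class WaitOperator:
    """Hypothetical operator that defers instead of occupying a worker slot."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds

    def defer(self, *, trigger, method_name, kwargs=None):
        raise TaskDeferred(trigger, method_name, kwargs)

    def execute(self, context):
        moment = datetime.now(timezone.utc) + timedelta(seconds=self.delay_seconds)
        self.defer(trigger=SleepUntilTrigger(moment), method_name="execute_complete")

    def execute_complete(self, context, event=None):
        return f"resumed after event {event['fired_at']}"


async def first_event(trigger):
    """Run a trigger until it fires; a trigger that exits silently is an error."""
    async for event in trigger.run():
        return event
    raise RuntimeError("trigger exited without firing an event")


def run_with_deferral(operator, context):
    """Collapse the worker, triggerer, and scheduler steps into a single call."""
    try:
        return operator.execute(context)
    except TaskDeferred as deferral:
        event = asyncio.run(first_event(deferral.trigger))
        resume = getattr(operator, deferral.method_name)  # plays the role of next_method
        return resume(context, event=event, **deferral.kwargs)
```

Calling run_with_deferral(WaitOperator(0.05), {}) runs execute, catches the deferral, waits for the trigger's event, and then re-enters the operator at execute_complete rather than execute.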
super().__init__(*args, **kwargs)

if capacity is None:
    self.capacity = conf.getint('triggerer', 'default_capacity', fallback=1000)

Suggested change
- self.capacity = conf.getint('triggerer', 'default_capacity', fallback=1000)
+ self.capacity = conf.getint('triggerer', 'default_capacity')

(Eventually I'll start my crusade against fallback, but this will mean 1 less when that time comes)
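
The review does not spell out the reasoning, but one plausible motivation is that a per-call fallback duplicates a default that could live in one place. The sketch below uses the standard library's configparser as a stand-in for Airflow's conf object, which is an assumption; the DEFAULTS text and the load_config helper are hypothetical and only illustrate the difference between the two call styles.

```python
import configparser

# Hypothetical shipped defaults: the single place where the value is defined.
DEFAULTS = """
[triggerer]
default_capacity = 1000
"""


def load_config(user_text: str = "") -> configparser.ConfigParser:
    """Load the defaults first, then let user-supplied settings override them."""
    parser = configparser.ConfigParser()
    parser.read_string(DEFAULTS)
    parser.read_string(user_text)
    return parser


# With defaults loaded, no fallback argument is needed at the call site.
conf = load_config()
capacity = conf.getint("triggerer", "default_capacity")

# A user override still wins over the shipped default.
overridden = load_config("[triggerer]\ndefault_capacity = 250")
custom_capacity = overridden.getint("triggerer", "default_capacity")

# Without shipped defaults, every caller has to repeat the number via fallback,
# and the copies can drift apart.
bare = configparser.ConfigParser()
repeated_default = bare.getint("triggerer", "default_capacity", fallback=1000)
```

If the option is missing and no fallback is given, configparser raises an error instead of returning a silently chosen value, which makes a missing default visible early.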

@kaxil kaxil requested a review from ashb August 11, 2021 19:32
@kaxil kaxil dismissed ashb’s stale review August 11, 2021 19:32

Stale review - we can followup with more changes

@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Aug 11, 2021
@kaxil kaxil merged commit ec5b4fe into apache:main Aug 11, 2021
@kaxil kaxil deleted the deferrable branch August 11, 2021 19:33
kaxil (Member) commented Aug 11, 2021

Looks good to me -- I think we can follow up if there are more changes

@kaxil kaxil mentioned this pull request Aug 19, 2021
2 tasks
kaxil added a commit to astronomer/airflow that referenced this pull request Aug 20, 2021
Adds triggerer component added in apache#15389 (AIP-40) to the docker-compose.yaml file for quick start
kaxil added a commit that referenced this pull request Aug 20, 2021
Adds triggerer component added in #15389 (AIP-40) to the docker-compose.yaml file for quick start
kaxil added a commit that referenced this pull request Aug 23, 2021
Adds triggerer component added in #15389 (AIP-40) to the Helm Chart
Labels
area:CLI area:core-operators Operators, Sensors and hooks within Core Airflow area:Scheduler including HA (high availability) scheduler full tests needed We need to run full set of tests for this PR to merge
6 participants