Scheduler to handle incrementing of try_number #39336
Merged
dstandish
merged 53 commits into
apache:main
from
astronomer:remove-try-number-shenanigans
May 9, 2024
boring-cyborg
bot
added
area:API
Airflow's REST/HTTP API
area:logging
area:plugins
area:providers
area:Scheduler
including HA (high availability) scheduler
area:serialization
area:webserver
Webserver related Issues
provider:amazon-aws
AWS/Amazon - related issues
provider:elasticsearch
labels
Apr 30, 2024
dstandish
force-pushed
the
remove-try-number-shenanigans
branch
2 times, most recently
from
May 1, 2024 20:15
1d54320
to
310f923
Compare
dstandish
commented
May 1, 2024
dstandish
commented
May 1, 2024
dstandish
requested review from
eladkal,
o-nikolas,
ryanahamilton,
ashb,
bbovenzi,
pierrejeambrun,
ephraimbuddy,
kaxil and
XD-DENG
as code owners
May 1, 2024 22:05
Overall looks good!
@SamWheating, you might be a good reviewer on this one as well.
dstandish
force-pushed
the
remove-try-number-shenanigans
branch
2 times, most recently
from
May 6, 2024 19:28
36fcb5a
to
ed839d4
Compare
utkarsharma2
added
type:improvement
Changelog: Improvements
type:misc/internal
Changelog: Misc changes that should appear in change log
and removed
type:improvement
Changelog: Improvements
type:misc/internal
Changelog: Misc changes that should appear in change log
labels
Jun 3, 2024
romsharon98
pushed a commit
to romsharon98/airflow
that referenced
this pull request
Jul 26, 2024
Previously, there was a lot of bad stuff happening around try_number.

We incremented it when the task started running. Because of that, we had logic to return "_try_number + 1" when the task was not running. But this gave the "right" try number before it ran, and the wrong number after it ran. And, since it was naively incremented when the task started running -- i.e. without regard to why it was running -- we decremented it when deferring or exiting on a reschedule.

What I do here is try to remove all of that stuff:
- no more private _try_number attr
- no more getter logic
- no more decrementing
- no more incrementing as part of task execution

Now what we do is increment only when the task is set to scheduled and only when it's not coming out of deferral or "up_for_reschedule". So the try_number will be more stable. It will not change throughout the course of task execution. The only time it will be incremented is when there's legitimately a new try.

One consequence of this is that try number will no longer be incremented if you run either airflow tasks run or ti.run() in isolation. But because Airflow assumes that all task runs are scheduled by the scheduler, I do not regard this as a breaking change.

If user code or provider code has implemented hacks to get the "right" try_number when looking at it at the wrong time (because previously it gave the wrong answer), unfortunately that code will just have to be patched. There are only two cases I know of in the providers codebase -- the openlineage listener and dbt openlineage.

As a courtesy for backcompat we also add a property _try_number, which is just a proxy for try_number, so you'll still be able to access this attr. But it will not behave the same as it did before.

---------
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
romsharon98
pushed a commit
to romsharon98/airflow
that referenced
this pull request
Jul 26, 2024
Previously we had code to compensate for the fact that we were decrementing try_number when deferring or rescheduling. We can remove this code now. Just missed this in apache#39336.
howardyoo
added a commit
to howardyoo/airflow
that referenced
this pull request
Aug 15, 2024
potiuk
pushed a commit
to potiuk/airflow
that referenced
this pull request
Aug 16, 2024
…irflow (issue apache#41501) (apache#41502) * Fix for issue apache#39336 * removed unnecessary import (cherry picked from commit dd3c3a7)
potiuk
added a commit
that referenced
this pull request
Aug 16, 2024
Artuz37
pushed a commit
to Artuz37/airflow
that referenced
this pull request
Aug 19, 2024
…irflow (issue apache#41501) (apache#41502) * Fix for issue apache#39336 * removed unnecessary import
romsharon98
pushed a commit
to romsharon98/airflow
that referenced
this pull request
Aug 20, 2024
…irflow (issue apache#41501) (apache#41502) * Fix for issue apache#39336 * removed unnecessary import
utkarsharma2
added a commit
that referenced
this pull request
Aug 22, 2024
…41610) * Enable pull requests to be run from v*test branches (#41474) (#41476) Since we switch from direct push of cherry-picking to open PRs against v*test branch, we should enable PRs to run for the target branch. (cherry picked from commit a9363e6) * Prevent provider lowest-dependency tests to run in non-main branch (#41478) (#41481) When running tests in v2-10-test branch, lowest depenency tests are run for providers - because when calculating separate tests, the "skip_provider_tests" has not been used to filter them out. This PR fixes it. (cherry picked from commit 75da507) * Make PROD image building works in non-main PRs (#41480) (#41484) The PROD image building fails currently in non-main because it attempts to build source provider packages rather than use them from PyPi when PR is run against "v-test" branch. This PR fixes it: * PROD images in non-main-targetted build will pull providers from PyPI rather than build them * they use PyPI constraints to install the providers * they use UV - which should speed up building of the images (cherry picked from commit 4d5f1c4) * Add WebEncoder for trigger page rendering to avoid render failure (#41350) (#41485) Co-authored-by: M. Olcay Tercanlı <muhammed_tercanli@epam.com> * Incorrect try number subtraction producing invalid span id for OTEL airflow (issue #41501) (#41502) (#41535) * Fix for issue #39336 * removed unnecessary import (cherry picked from commit dd3c3a7) Co-authored-by: Howard Yoo <32691630+howardyoo@users.noreply.github.com> * Fix failing pydantic v1 tests (#41534) (#41541) We need to exclude some versions of Pydantic v1 because it conflicts with aws provider. (cherry picked from commit a033c5f) * Fix Non-DB test calculation for main builds (#41499) (#41543) Pytest has a weird behaviour that it will not collect tests from parent folder when subfolder of it is specified after the parent folder. This caused some non-db tests from providers folder have been skipped during main build. 
The issue in Pytest 8.2 (used to work before) is tracked at pytest-dev/pytest#12605 (cherry picked from commit d489826) * Add changelog for airflow python client 2.10.0 (#41583) (#41584) * Add changelog for airflow python client 2.10.0 * Update client version (cherry picked from commit 317a28e) * Make all test pass in Database Isolation mode (#41567) This adds dedicated "DatabaseIsolation" test to airflow v2-10-test branch.. The DatabaseIsolation test will run all "db-tests" with enabled DB isolation mode and running `internal-api` component - groups of tests marked with "skip-if-database-isolation" will be skipped. * Upgrade build and chart dependencies (#41570) (#41588) (cherry picked from commit c88192c) Co-authored-by: Jarek Potiuk <jarek@potiuk.com> * Limit watchtower as depenendcy as 3.3.0 breaks moin. (#41612) (cherry picked from commit 1b602d5) * Enable running Pull Requests against v2-10-stable branch (#41624) (cherry picked from commit e306e7f) * Fix tests/models/test_variable.py for database isolation mode (#41414) * Fix tests/models/test_variable.py for database isolation mode * Review feedback (cherry picked from commit 736ebfe) * Make latest botocore tests green (#41626) The latest botocore tests are conflicting with a few requirements and until apache-beam upcoming version is released we need to do some manual exclusions. Those exclusions should make latest botocore test green again. 
(cherry picked from commit a13ccbb) * Simpler task retrieval for taskinstance test (#41389) The test has been updated for DB isolation but the retrieval of task was not intuitive and it could lead to flaky tests possibly (cherry picked from commit f25adf1) * Skip database isolation case for task mapping taskinstance tests (#41471) Related: #41067 (cherry picked from commit 7718bd7) * Skipping tests for db isolation because similar tests were skipped (#41450) (cherry picked from commit e94b508) --------- Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Co-authored-by: Brent Bovenzi <brent@astronomer.io> Co-authored-by: M. Olcay Tercanlı <muhammed_tercanli@epam.com> Co-authored-by: Howard Yoo <32691630+howardyoo@users.noreply.github.com> Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com> Co-authored-by: Bugra Ozturk <bugraoz93@users.noreply.github.com>
utkarsharma2
added a commit
that referenced
this pull request
Aug 30, 2024
utkarsharma2
added a commit
that referenced
this pull request
Aug 30, 2024
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Nov 9, 2024
Changes: - provide custom GCS task handler - write task logs to stdout for fluentd to expose in Cloud Logging - write DAG processor manager logs to stdout for fluentd to expose in Cloud Logging - write custom Composer metrics - implement custom Composer log filter - use same log format for Celery logs as for all other logs - set sqlfluff logging level to WARNING to avoid polluting parsing logs - modify write_metrics() to not decrement `try_number` value after changes from apache/airflow#39336 Change-Id: Ie6ca8e8a544dbd661bc74db38b2cc419144bb9a2 GitOrigin-RevId: 54ef1a1bcb67b4f4855ccef4ced98e0a4ad280bc
Labels
area:API
Airflow's REST/HTTP API
area:logging
area:plugins
area:providers
area:Scheduler
including HA (high availability) scheduler
area:serialization
area:webserver
Webserver related Issues
full tests needed
We need to run full set of tests for this PR to merge
provider:amazon-aws
AWS/Amazon - related issues
provider:elasticsearch
type:improvement
Changelog: Improvements
Previously, there was a lot of bad stuff happening around try_number.

We incremented it when the task started running. Because of that, we had logic to return "_try_number + 1" when the task was not running. But this gave the "right" try number before it ran, and the wrong number after it ran. And, since it was naively incremented when the task started running -- i.e. without regard to why it was running -- we decremented it when deferring or exiting on a reschedule.
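To make the old behavior concrete, here is a minimal sketch of the scheme described above. `OldStyleTI` and its methods are hypothetical stand-ins written for illustration, not Airflow's actual `TaskInstance` code; only the counting rules are taken from the description.

```python
from dataclasses import dataclass

RUNNING = "running"


@dataclass
class OldStyleTI:
    """Toy stand-in for a task instance (hypothetical, not Airflow's class)."""

    state: str = "none"
    _try_number: int = 0

    @property
    def try_number(self) -> int:
        # Old getter logic: report the stored value only while running,
        # otherwise report "_try_number + 1".
        if self.state == RUNNING:
            return self._try_number
        return self._try_number + 1

    def start_running(self) -> None:
        # Naive increment: happens on every start, whatever the reason.
        self._try_number += 1
        self.state = RUNNING

    def defer(self) -> None:
        # Compensation: undo the increment so resuming is not a new try.
        self._try_number -= 1
        self.state = "deferred"

    def finish(self, state: str) -> None:
        self.state = state


ti = OldStyleTI()
before = ti.try_number   # 1: "right" before the first try starts
ti.start_running()
during = ti.try_number   # 1: right while running
ti.finish("success")
after = ti.try_number    # 2: wrong, only one try ever happened

ti2 = OldStyleTI()
ti2.start_running()
ti2.defer()
ti2.start_running()
resumed = ti2.try_number  # 1: only correct because defer() decremented
```

The value read after the task finishes is off by one, which is the kind of answer downstream code had to work around.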
What I do here is try to remove all of that stuff:
- no more private `_try_number` attr
- no more getter logic
- no more decrementing
- no more incrementing as part of task execution

Now what we do is increment only when the task is set to scheduled and only when it's not coming out of deferral or "up_for_reschedule". So the try_number will be more stable. It will not change throughout the course of task execution. The only time it will be incremented is when there's legitimately a new try.
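The new rule can be sketched roughly as follows. `NewStyleTI` and `set_scheduled` are hypothetical names used only for illustration, and the state strings are assumptions based on the description; this is not the scheduler's real code path.

```python
from dataclasses import dataclass

# States from which going back to "scheduled" continues the same try
# (assumed from the description: deferral and reschedule).
SAME_TRY_STATES = {"deferred", "up_for_reschedule"}


@dataclass
class NewStyleTI:
    """Toy stand-in for a task instance (hypothetical, not Airflow's class)."""

    state: str = "none"
    try_number: int = 0


def set_scheduled(ti: NewStyleTI) -> None:
    """Hypothetical scheduler-side helper: the only place try_number changes."""
    if ti.state not in SAME_TRY_STATES:
        ti.try_number += 1
    ti.state = "scheduled"


ti = NewStyleTI()
set_scheduled(ti)
first = ti.try_number          # 1: first real try

ti.state = "deferred"          # task deferred itself mid-try
set_scheduled(ti)
after_deferral = ti.try_number  # still 1: same try resumes

ti.state = "up_for_retry"      # the try failed and will be retried
set_scheduled(ti)
after_retry = ti.try_number    # 2: legitimately a new try
```

Because the counter is touched only at scheduling time, reading it before, during, or after execution of a given try gives the same value.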
One consequence of this is that try number will no longer be incremented if you run either `airflow tasks run` or `ti.run()` in isolation. But because Airflow assumes that all task runs are scheduled by the scheduler, I do not regard this as a breaking change.

If user code or provider code has implemented hacks to get the "right" try_number when looking at it at the wrong time (because previously it gave the wrong answer), unfortunately that code will just have to be patched. There are only two cases I know of in the providers codebase -- the openlineage listener and dbt openlineage.
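As an illustration of the kind of patch this implies, consider code that subtracted one from try_number after a task finished. The function below is a hypothetical sketch, not taken from any provider; `legacy_getter` is an assumed flag that models whether the old getter behavior is in effect.

```python
def reported_try(try_number: int, task_finished: bool, legacy_getter: bool) -> int:
    """Return the try number to report for a task instance (illustrative only).

    With legacy_getter=True, the value read after the task finished was one
    too high, so old workarounds subtracted one. With the scheduler-managed
    counter the value is already correct and must be used as-is.
    """
    if legacy_getter and task_finished:
        return try_number - 1  # old workaround, needed only for the old getter
    return try_number


# Old behavior: a finished first try read as 2, corrected to 1.
old_value = reported_try(2, task_finished=True, legacy_getter=True)
# New behavior: a finished first try reads as 1, no correction.
new_value = reported_try(1, task_finished=True, legacy_getter=False)
```

Leaving the subtraction in place after this change would under-report by one, so such workarounds need to be removed or gated on the Airflow version in use.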
As a courtesy for backcompat, we also add a property `_try_number`, which is just a proxy for try_number, so you'll still be able to access this attr. But it will not behave the same as it did before.
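A proxy property of this kind might look like the sketch below. `TaskInstanceSketch` is a hypothetical class for illustration, and the setter is an assumption; the description only says the attribute remains accessible.

```python
class TaskInstanceSketch:
    """Toy class showing a backcompat alias (hypothetical, not Airflow's code)."""

    def __init__(self) -> None:
        self.try_number = 0

    @property
    def _try_number(self) -> int:
        # Reads are forwarded to the public attribute.
        return self.try_number

    @_try_number.setter
    def _try_number(self, value: int) -> None:
        # Assumption: writes are forwarded too, so old code that assigns
        # to _try_number keeps working.
        self.try_number = value


ti = TaskInstanceSketch()
ti.try_number = 3
via_alias = ti._try_number   # 3: same value as try_number
ti._try_number = 5
via_public = ti.try_number   # 5: the write went through the alias
```

Note that the alias returns exactly what try_number returns, so code that relied on the old one-off difference between the two names will see different values than before.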