Disable ORM access from Tasks, DAG processing and Triggers #47320
Merged
Conversation
ashb commented on Mar 4, 2025.
vincbeck approved these changes on Mar 4, 2025.
jedcunningham approved these changes on Mar 4, 2025.
ashb commented on Mar 5, 2025.
jscheffl (Contributor) approved these changes on Mar 5, 2025:

LGTM!
Member (Author): Oof, that was a bit of a mission to land.

Contributor: Wohooo!

Member: #protm

Member: Indeed
ashb added a commit that referenced this pull request on Mar 12, 2025:

This change seems innocuous, and possibly even wrong, but it is the correct behaviour since #47320 landed. We _do not_ want to call dispose_orm, as that ends up reconnecting, and sometimes this results in the wrong connection being shared between the parent and the child. I don't love the "sometimes" nature of this bug, but the fix seems sound.

Prior to this, running one or two runs concurrently would result in the scheduler hanging (stuck in SQLA code trying to roll back) or an error from psycopg about "error with status PGRES_TUPLES_OK and no message from the libpq". With this change we were able to repeatedly run 10 runs concurrently.

The reason we don't want this is that we already registered an at_fork handler that closes/discards the socket object (without closing the DB-level session), so calling dispose can, perversely, resurrect that object and try reusing it!

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: Kaxil Naik <kaxilnaik@apache.org>
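The pattern described in the commit message (discard inherited connections in the forked child rather than disposing and reconnecting) can be sketched with `os.register_at_fork`. This is a minimal, illustrative sketch: `ToyPool` and its methods are stand-ins invented here, not Airflow's or SQLAlchemy's actual pool API.

```python
import os


class ToyPool:
    """Toy stand-in for a DB connection pool (illustrative only)."""

    def __init__(self):
        self.connections = ["socket-to-db"]

    def discard_all(self):
        # Drop references WITHOUT closing them: the file descriptor is
        # shared with the parent after fork, so closing (or rolling back)
        # here would corrupt the parent's session on the same socket.
        self.connections = []


pool = ToyPool()

# Register once in the parent; after every future fork, the child runs
# discard_all, so it starts with no connections and must open fresh ones
# instead of reusing the parent's socket. Calling a dispose-style cleanup
# in the child would defeat this by reconnecting/resurrecting the object.
os.register_at_fork(after_in_child=pool.discard_all)

pid = os.fork()
if pid == 0:
    # Child: the inherited pool must already be empty.
    os._exit(0 if pool.connections == [] else 1)

_, status = os.waitpid(pid, 0)
child_discarded = os.WEXITSTATUS(status) == 0
parent_kept = pool.connections == ["socket-to-db"]
```

The parent's connections are untouched; only the child's references are dropped, which is why calling dispose afterwards is both unnecessary and harmful.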
jedcunningham added a commit that referenced this pull request on Mar 12, 2025 (same commit message as above).

nailo2c pushed a commit to nailo2c/airflow that referenced this pull request on Apr 4, 2025 (same commit message as above).
potiuk added a commit to jason810496/airflow that referenced this pull request on Apr 12, 2025:

Some recent changes in main, together with documentation published in the inventories, made the 2.10 doc build fail: references to docs that no longer exist in the new inventories were still used in the documentation for 2.10. This PR fixes it by changing the docs to no longer refer to those docs. The PRs that removed the links: apache#47320 and apache#47399.

Co-authored-by: LIU ZHE YOU <zhu424.dev@gmail.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Labels: area:API (Airflow's REST/HTTP API), area:logging, area:providers, area:task-sdk, full tests needed (we need to run the full set of tests for this PR to merge), provider:amazon (AWS/Amazon-related issues), provider:edge (Edge Executor / Worker (AIP-69) / edge3), provider:fab
It's about time we delivered on one of the key points of AIP-72: DB isolation from workers. (To be honest, it's probably past time, but now is the second-best time.)

All of these use the Workload supervisor from the TaskSDK, and the main paths (XCom, Variables and Secrets) have all been ported to use the Execution API, so it's about time we disabled DB access.

Note: this will almost certainly break a few things, Skip-mixin-based tasks in particular; that is WIP in 46584.

Also closes #47232, as that was failing if configure_orm was never called.