Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug that makes AirflowSecurityManagerV2 leave transactions in the idle in transaction state #39935

Conversation

alejandro-rivera
Copy link
Contributor

@alejandro-rivera alejandro-rivera commented May 30, 2024

Description

This PR fixes a bug that makes AirflowSecurityManagerV2 leave some transactions in the idle in transaction state.

The problem is due to the combined usage of the @cached_property and @provide_session decorators on top of the _auth_manager_is_authorized_map method: the @cached_property decorator makes the create_session context manager get consumed and not be available to wrap the nested functions.

The fix presented here just makes sure each of the resource-retrieving nested functions in _auth_manager_is_authorized_map can use a session object with a clean state.

Impact

This problem affects views that perform fine-grained access checks using resource ID (for example, editing variables, editing connections, etc).

Apart from the negative impact of having orphaned transactions in the idle in transaction state, this problem also may lead to sqlalchemy.exc.PendingRollbackError exceptions whilst trying to edit a resource.

The sqlalchemy pool (using the default settings) seems to just chug along reusing the connections that have orphaned transactions, until they get invalidated. Then, if the Session object that is automatically passed to the security manager's nested functions gets to use one of those invalidated connections, an exception like the following is thrown:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/flask/app.py", line 2529, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/www/decorators.py", line 159, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/www/auth.py", line 110, in wraps
    if permission_str in self.base_permissions and self.appbuilder.sm.has_access(
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/www/security_manager.py", line 142, in has_access
    return is_authorized_method(action_name, resource_pk, user)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/www/security_manager.py", line 318, in <lambda>
    details=VariableDetails(key=get_variable_key(resource_pk)),
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/www/security_manager.py", line 227, in get_variable_key
    variable = session.scalar(select(Variable).where(Variable.id == resource_pk).limit(1))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1747, in scalar
    return self.execute(
           ^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1717, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1808, in _execute_context
    conn = self._revalidate_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 650, in _revalidate_connection
    self._invalid_transaction()
  File "/home/airflow/.local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 622, in _invalid_transaction
    raise exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: Can't reconnect until invalid transaction is rolled back. (Background on this error at: https://sqlalche.me/e/14/8s2b)

Tweaking the sqlalchemy pool settings and using pgbouncer also seem to play a role in mitigating this issue. In any case, there are already some users reporting errors editing resources. See #39581.

I'm not familiar with the connection invalidation mechanisms within sqlalchemy pools, but that's definitely a component in generating errors like the above.

I stumbled upon this problem in a Postgres-based environment after upgrading to Airflow 2.9.1 (Python 3.11). I don't know if other DB backends are equally affected.

Reproducing

  1. Spin up an Airflow environment using the official docker-compose file
  2. Make the adjustments to gain access to the postgres container from your local machine
  3. Monitor the idle in transaction transactions:
watch -n 1 'psql postgresql://airflow:airflow@postgres-container/airflow -P pager=off -c "select * from pg_stat_activity where state = '"'idle in transaction'"'"'
  1. Add various resources (variables, connections, etc) through the web UI
  2. Edit the above resources in the web UI
  3. Observe the idle in transaction monitor command show orphaned transactions that won't go away...until the webserver is restarted

The problem goes away by applying the code in this PR.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

…he `idle in transaction` state.

The problem is due to the combined usage of the `@cached_property` and `@provide_session` decorators on top of the `_auth_manager_is_authorized_map` method.

`@cached_property` makes the `create_session` context manager get consumed and not be available to the nested functions.
Copy link

boring-cyborg bot commented May 30, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for providing so much details in the description! That helps a lot to understand the issue. (And thanks for fixing a bug I created :D)

@vincbeck
Copy link
Contributor

Some tests are failing

@eladkal eladkal linked an issue May 30, 2024 that may be closed by this pull request
2 tasks
@alejandro-rivera
Copy link
Contributor Author

Some tests are failing

@vincbeck thanks for the heads up on the test failures. I'll take a look at those later today.

Cheers.

… just use the session object that we already have available through the Security Manager's `appbuilder` attribute.
@raphaelauv
Copy link
Contributor

let's add this fix to 2.9.2

@potiuk potiuk added this to the Airflow 2.9.2 milestone May 31, 2024
@potiuk
Copy link
Member

potiuk commented May 31, 2024

let's add this fix to 2.9.2

added

@eladkal
Copy link
Contributor

eladkal commented May 31, 2024

Is it possiblw to add test to avoid regression?

…ssions and instead uses the session already available through the `appbuilder` object that is attached to it.
@alejandro-rivera
Copy link
Contributor Author

Is it possiblw to add test to avoid regression?

@eladkal great idea.

In c2f6a15 I added a test that makes sure the Security Manager doesn't create extra DB sessions and instead uses the session already available through the appbuilder object that is attached to it.

The test fails for the previous version of the Security Manager.

Cheers.

@vincbeck vincbeck merged commit b5bb039 into apache:main Jun 3, 2024
47 checks passed
Copy link

boring-cyborg bot commented Jun 3, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Jun 4, 2024
utkarsharma2 pushed a commit that referenced this pull request Jun 4, 2024
…he `idle in transaction` state (#39935)

(cherry picked from commit b5bb039)
ephraimbuddy pushed a commit that referenced this pull request Jun 5, 2024
…he `idle in transaction` state (#39935)

(cherry picked from commit b5bb039)
fdemiane pushed a commit to fdemiane/airflow that referenced this pull request Jun 6, 2024
syedahsn pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 7, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:webserver Webserver related Issues type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failure on editing a Variable
6 participants