Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG #10341

Closed
shivanshs9 opened this issue Aug 15, 2020 · 6 comments
Labels: kind:bug

Comments

shivanshs9 (Contributor) wrote:

Apache Airflow version: 1.10.11

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

I am using an Airflow plugin to generate dynamic DAGs, and Airflow successfully loads the new ORM DAG into the DB, so the DAG listing on the home page is updated. However, trying to refresh the DAG or open its graph view causes an error:

[2020-08-15 13:12:01,862] {app.py:1891} ERROR - Exception on /refresh [POST]
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www_rbac/decorators.py", line 121, in wrapper
    return f(self, *args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/security/decorators.py", line 109, in wraps
    return f(self, *args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www_rbac/decorators.py", line 56, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www_rbac/views.py", line 1941, in refresh
    appbuilder.sm.sync_perm_for_dag(dag_id, dag.access_control)
AttributeError: 'NoneType' object has no attribute 'access_control'
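The underlying cause is that each gunicorn worker holds its own in-memory DagBag; DagBag.get_dag() returns None for a dag_id that worker has never parsed, and the refresh view then dereferences that None (the sync_perm_for_dag call in the traceback). A minimal, standalone illustration of the miss (assuming a default Airflow 1.10.x install; the dag_id is hypothetical):

from airflow.models import DagBag

# Parses the files under the configured DAGs folder for this process only.
dagbag = DagBag()

# A DAG created after this process built its DagBag is not in dagbag.dags yet.
dag = dagbag.get_dag("dag_created_after_this_worker_started")  # hypothetical dag_id
print(dag)  # None -> dag.access_control raises AttributeError, as in the traceback above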

What you expected to happen:

I expected refresh, trigger, and the other DAG views to work normally.

How to reproduce it:

  • Launch the Airflow webserver and scheduler as usual.
  • Create a new DAG at runtime using the dagen-airflow plugin: use the Dagen UI to create the DAG and approve it.
  • Go to the Airflow home page; the newly-created DAG is listed there.
  • Click the refresh link and the error above appears.

Anything else we need to know:

I understand that waiting for all Airflow web workers to be recycled (and tweaking the worker_refresh_interval config) would help here; after all, the issue is that the in-memory DagBag instances have not yet collected the new DAG.
While a restart helps, I propose a boolean configuration option such as attempt_refresh_dagbag (defaulting to False for backwards compatibility). When it is True and the DagBag does not have the DAG loaded (i.e. DagBag.get_dag() returns None), the webserver would attempt to load the DAG directly by processing the file recorded in the DagModel, as in the sketch below.
This would be a better option for those who do not want to wait for the new DAG to reach all workers. They could even increase worker_refresh_interval for performance and still work with new DAGs right away.
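
A rough sketch of what such a fallback could look like. This is illustrative only: attempt_refresh is a hypothetical flag (Airflow has no attempt_refresh_dagbag option) and get_dag_or_refresh is a made-up helper; it just shows a DagBag miss falling back to parsing the file recorded in the DagModel row.

# Hypothetical sketch of the proposed fallback; get_dag_or_refresh and the
# attempt_refresh flag are illustrative, not existing Airflow code.
from airflow.models import DagModel
from airflow.settings import Session


def get_dag_or_refresh(dagbag, dag_id, attempt_refresh=True):
    """Return the DAG from the DagBag, parsing its file on a miss."""
    dag = dagbag.get_dag(dag_id)
    if dag is None and attempt_refresh:
        session = Session()
        try:
            # The scheduler has already recorded the new DAG's file location in the DB.
            orm_dag = session.query(DagModel).filter(DagModel.dag_id == dag_id).first()
            if orm_dag is not None:
                # Parse that file so this worker's in-memory DagBag picks up the DAG.
                dagbag.process_file(orm_dag.fileloc, only_if_updated=False)
                dag = dagbag.get_dag(dag_id)
        finally:
            session.close()
    return dag

Wired into views such as refresh and graph (gated by the proposed config option), this would let a DAG that exists in the DB but not yet in a given worker's DagBag be loaded on demand instead of raising the AttributeError above.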

shivanshs9 added the kind:bug label Aug 15, 2020
boring-cyborg bot commented Aug 15, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

kaxil (Member) commented Aug 17, 2020

#10328 should provide you with an endpoint to force-refresh all the DAGs.

shivanshs9 (Contributor, Author) commented:

@kaxil that would refresh DAGs from the DB only for the process that received the POST request, right?
I think opening the DAG would still randomly fail even after the "refresh all" button is clicked.

It would still be a better solution, though, since it would let one attempt to refresh the in-memory DagBag from the DB in all the workers. 🤔

kaxil (Member) commented Aug 17, 2020

That will refresh DAGs from the DB if DAG Serialization is enabled; if not, it will refresh them from the DAG files.

shivanshs9 (Contributor, Author) commented:

That will refresh DAGs from the DB if DAG Serialization is enabled; if not, it will refresh them from the DAG files.

I understand that, but I'd like to confirm that it will only refresh the DAGs in the in-memory DagBag instance of the specific gunicorn worker process that received the request.
With more than one web worker, trying to open DAG details or trigger the DAG will still randomly fail, since the "refresh all" request may have been POSTed to a different worker.

To restate my proposal from the issue description: an optional, off-by-default boolean option such as attempt_refresh_dagbag that, when DagBag.get_dag() returns None, loads the DAG directly by processing the file recorded in the DagModel.

What I meant above is a catch-all handler (optional and off by default) to work around this randomness of the bug.

eladkal (Contributor) commented Oct 10, 2021

This issue is reported against an old version of Airflow which is end-of-life.
If the issue is still present in the latest Airflow version, please let us know.

eladkal closed this as completed Oct 10, 2021