Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG #10341
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
#10328 should provide you with an endpoint to force-refresh all the DAGs.
@kaxil that would refresh DAGs from the DB only for the process that received the POST request, right? It would still be a better solution, though, since it would allow one to attempt to refresh the in-memory DagBag from the DB in all the workers. 🤔
That will refresh DAGs from the DB if DAG Serialization is enabled; if not, it will refresh them from the DAG files.
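A rough sketch of what such a force-refresh endpoint could look like as a plugin; this is not the endpoint added by #10328, and the route name, plugin wiring, and `store_serialized_dags` usage are assumptions based on the 1.10.x DAG serialization feature:

```python
# Illustrative only: an Airflow 1.10.x plugin exposing a POST route that
# rebuilds an in-memory DagBag from the database. Note that with multiple
# gunicorn workers, only the worker process that serves this request builds
# the refreshed DagBag; the webserver's own module-level DagBag (e.g. the one
# held by airflow.www_rbac.views) would still need to be swapped out.
from flask import Blueprint, jsonify

from airflow.models import DagBag
from airflow.plugins_manager import AirflowPlugin

bp = Blueprint("dag_refresh", __name__)


@bp.route("/dag_refresh", methods=["POST"])
def refresh_dags():
    # With DAG serialization enabled (Airflow 1.10.10+), the DagBag can be
    # filled from the serialized_dag table instead of re-parsing DAG files.
    dagbag = DagBag(store_serialized_dags=True)
    return jsonify({"dags_loaded": len(dagbag.dags)})


class DagRefreshPlugin(AirflowPlugin):
    name = "dag_refresh_plugin"
    flask_blueprints = [bp]
```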
I understand that but I'd again like to confirm that it'll only refresh the DAGs for the in-memory instance of the DagBag for that specific gunicorn worker process that received the request.
What I meant above was a catch-all handler (optional and off by default) to get around the randomness of this bug.
This issue is reported against an old version of Airflow, which is end-of-life.
Apache Airflow version: 1.10.11
Kubernetes version (if you are using kubernetes) (use `kubectl version`):

Environment:
- Kernel (e.g. `uname -a`):

What happened:
I am using an Airflow plugin to generate dynamic DAGs, and Airflow successfully loads the new ORM DAG into the DB, so the DAG listing on the home page is updated. However, trying to refresh the DAG or opening the graph view causes an error:
What you expected to happen:
I expected the refresh/trigger and other functionalities to work fine.
How to reproduce it:
Anything else we need to know:
I understand that waiting for all Airflow workers to restart and tweaking the `worker_refresh_interval` config would help here. After all, the issue is due to in-memory instances of DagBag not being able to collect the new DAG beforehand.

While a restart would help, I propose a boolean configuration option like `attempt_refresh_dagbag` (by default `False` for backwards compatibility). If it is `True` and the DagBag doesn't have the DAG loaded (in this case, `DagBag.get_dag()` returns `None`), it would attempt to load the DAG directly by processing the file stored in the `DagModel`.

This would be a better option for those who'd rather not wait for the new DAG to sync with all workers. Plus, they could improve performance by increasing the `worker_refresh_interval` value and still work with new DAGs as soon as possible.