2.6.1 Queued DagRun for some DAGs, and for some not #31687
Comments
Could you please provide an example of your DAG files? (contains the dag config, like the start date, schedule, catchup...) |
Here it is
|
There might be a bug in the Also you can try |
Today I will enable catchup=True to see how it goes. If that doesn't help, tomorrow I will run with catchup=True and the old cron syntax. |
The
|
Possibly related to #27399? |
In this issue, airflow doesn't create runs for the whole days/runs - it acts like the dag is disabled and skips days/runs. |
I think that might still be related. Some subtle bug (like running the schedule precisely at the very moment it should be scheduled) might simply trigger it. For some reason you seem to have an installation where this behaviour is easily reproducible, so maybe we can use it to narrow down the issue. I think @hussein-awala was right: it would be great if you could try to reproduce it with the old expression and catchup=False. From what I understand above, Also cc: @uranusjr -> From the description and the helpful experiments done by @ibardarov-fms, it really looks like some edge case in CronTriggerTimetable. The 14-second delay in queue time suggests there is likely a race condition that gets triggered somewhere by the timetable. |
cc: @uranusjr @hussein-awala. I do not yet have the exact scenario in mind, but looking at "catchup" fixing the problem and at the code of the triggerer, I have a possible candidate. One of the best candidates I have is this use of the
|
I will run it with catchup=False and the old cron syntax. I am using these environment variables/settings:
|
With catchup=False and the old cron syntax, the problem is visible. Out of 30 DAGs, only 5 were scheduled. |
I think the two are indeed related. The alignment implementation guess seems plausible, but I’m not able to come up with an example when it actually triggers a problem now (i.e. a failing test case). |
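To make the "precisely at the very moment it should be scheduled" guess concrete, here is a minimal stdlib sketch of how a boundary comparison can skip a daily run. It is not Airflow's implementation and not a confirmed reproduction of this issue: the function names, the fixed 05:00 schedule, and the sample date are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

FIRE_HOUR = 5  # the DAGs in this thread start at 05:00


def next_fire_strict(now: datetime) -> datetime:
    """Hypothetical helper: first 05:00 strictly after `now`."""
    candidate = now.replace(hour=FIRE_HOUR, minute=0, second=0, microsecond=0)
    if candidate <= now:  # a tick exactly at 05:00 is treated as already past
        candidate += timedelta(days=1)
    return candidate


def next_fire_inclusive(now: datetime) -> datetime:
    """Hypothetical helper: first 05:00 at or after `now`."""
    candidate = now.replace(hour=FIRE_HOUR, minute=0, second=0, microsecond=0)
    if candidate < now:  # a tick exactly at 05:00 still counts
        candidate += timedelta(days=1)
    return candidate


# Arbitrary sample date, used only to show the boundary behaviour.
boundary = datetime(2023, 6, 2, 5, 0, tzinfo=timezone.utc)
just_before = boundary - timedelta(seconds=1)

print(next_fire_strict(just_before))   # same-day 05:00 is still found
print(next_fire_strict(boundary))      # jumps to the next day, skipping today
print(next_fire_inclusive(boundary))   # keeps today's 05:00
```

If the time used for alignment happens to land exactly on the boundary, the strict variant silently moves on to the following day, which would look like a skipped run with no error logged. Whether something like this is what actually happens here is exactly what a failing test case would need to show.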
I believe #32921 might fix that one. @ibardarov-fms, would it be possible for you to apply the fix in your Airflow installation and check it? |
I have run it with the fix and it looks good. |
The fix is working. I ran it twice and all DAGs were scheduled and ran correctly. I still have to create new DAGs and make sure the problem would be visible with them. |
Apache Airflow version
2.6.1
What happened
We are running 40 dags of the same kind. All are started at 05:00.
After the upgrade to 2.6.1, some DAGs are randomly not scheduled and no DAG runs are created for them.
What you think should happen instead
I would like to see a green column for the next period.
If something is failing, I would expect to see an error or at least a warning message somewhere.
How to reproduce
It happens after the upgrade from 2.3.2.
Operating System
Ubuntu
Versions of Apache Airflow Providers
Deployment
Docker-Compose
Deployment details
I ran airflow from docker-compose.
Anything else
When I manually pause and unpause the dag nothing happens.
In the audit log there is no information of anyone trying to run the dag.
In all the postgres tables there are no created entries/rows for the failing dag for the missing dates.
There are no logs created for the missing days.
There are no errors in the other log files.
I tried allocating a lot of memory to a container, and it works.
I added a swap file, but it looks like it has never been used.
The tasks are running dbt
For dag processor I see from time to time some PID
In the scheduler log I see that nothing is scheduled at the expected time.
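For anyone who wants to list the skipped days rather than spot them in the grid view, a small sketch follows. It assumes you have already collected the dates that do have a run for a given DAG (for example, from the metadata database); the function name and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_run_dates(existing: set[date], start: date, end: date) -> list[date]:
    """Return every date in [start, end] that has no run recorded."""
    days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(days))
    return [d for d in expected if d not in existing]


# Hypothetical sample: runs exist for only two of four expected days.
runs = {date(2023, 6, 1), date(2023, 6, 3)}
for day in missing_run_dates(runs, date(2023, 6, 1), date(2023, 6, 4)):
    print("no run for", day.isoformat())
```

This only works for a once-a-day schedule like the 05:00 one described here; other intervals would need a different way of generating the expected dates.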