Trigger die with DB deadlock between scheduler #27000
Comments
What is your Airflow version? |
2.3.2 |
I'll close the issue because I couldn't find an easy way to reproduce it. The lock contention between the two queries is quite hard to construct. The workaround is to reduce how often the scheduler checks for timed-out triggers, which lowers the chance of lock contention. airflow/airflow/jobs/scheduler_job.py Lines 1461 to 1484 in 0d78ba5
There is a configuration option, trigger_timeout_check_interval, which defaults to 15. I raised it to a reasonably higher value and the deadlocks became much less frequent.
|
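For readers who want to try the workaround, the option can be set through an environment variable as well as in airflow.cfg. The sketch below only builds the variable name following Airflow's AIRFLOW__{SECTION}__{KEY} convention. That the option lives in the [scheduler] section is an assumption here, and the value 60 is purely illustrative, not a value recommended in this thread.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Assumption: the option belongs to the [scheduler] section.
name = airflow_env_var("scheduler", "trigger_timeout_check_interval")

# Hypothetical value for illustration; pick one that suits your deployment.
os.environ[name] = "60"
print(f"{name}={os.environ[name]}")
```

A larger interval means timed-out triggers are noticed later, so this trades detection latency for fewer chances of the two queries colliding.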
I will re-open this one. It has enough information to try to avoid the deadlock in the first place. The problem is that the Triggerer acquires the same locks as the scheduler but in a different sequence; the right solution is to change either the Triggerer (most likely) or the scheduler (rather unlikely) so that both acquire the locks in the same sequence. Most likely the Triggerer should attempt to lock DagRun first and only then update the task instance, or even avoid locking DagRun in the first place. I believe we fixed a very similar deadlock situation recently. |
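The lock-ordering idea in the comment above can be illustrated with a small, self-contained sketch. It uses in-process threading locks as stand-ins for database row locks, so it is an analogy rather than Airflow code; the lock and worker names are hypothetical. Each worker may ask for the locks in any order, but always acquires them in one agreed global order, which rules out the circular wait.

```python
import threading

# Stand-ins for the two contended locks; the names are illustrative only.
dag_run_lock = threading.Lock()
task_instance_lock = threading.Lock()

# One global order that every worker must follow when taking several locks.
LOCK_ORDER = [dag_run_lock, task_instance_lock]

completed = []


def worker(name, needed):
    # Re-order the requested locks by their position in LOCK_ORDER,
    # regardless of the order in which the caller listed them.
    ordered = sorted(needed, key=LOCK_ORDER.index)
    for lock in ordered:
        lock.acquire()
    try:
        completed.append(name)
    finally:
        # Release in reverse acquisition order.
        for lock in reversed(ordered):
            lock.release()


threads = [
    threading.Thread(target=worker, args=("scheduler", [dag_run_lock, task_instance_lock])),
    # This worker lists the locks in the opposite order on purpose.
    threading.Thread(target=worker, args=("triggerer", [task_instance_lock, dag_run_lock])),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(sorted(completed))
```

If each worker instead acquired the locks in the order it listed them, the two threads could each hold one lock while waiting for the other, which is the same shape as the deadlock reported in this issue.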
I will take a look at this shortly (or maybe @ashb or @andrewgodwin will get to it first). |
This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. Please recheck the report against the latest Airflow version and let us know if the issue is reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author. |
This issue has been closed because it has not received a response from the issue author. |
Apache Airflow version
Other Airflow 2 version (2.3.2)
What happened
There is discussion #22553 about this, but without a detailed trace. There is also a similar issue, #23639. The triggerer occasionally dies due to a DB transaction deadlock. In my case the triggerer dies 5-6 times per day.
MySQL engine status
Trigger exit log
This query holds a row lock in the primary index (dag_id, task_id, run_id, map_index) and is waiting for a secondary index lock.
airflow/airflow/models/trigger.py
Lines 118 to 135 in 0d78ba5
According to the engine status, this query holds a row lock in the secondary index (state) and is waiting for the primary index lock, which causes the deadlock.
airflow/airflow/jobs/scheduler_job.py
Lines 1461 to 1484 in 0d78ba5
#22553 and #23639 offer different solutions to this.
Add with_row_lock to the queries so the selected rows are pre-locked, avoiding lock contention. As for retry, the preceding methods already retry.
airflow/airflow/models/trigger.py
Lines 94 to 116 in 0d78ba5
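The retry approach mentioned above can be sketched generically as follows. This is not the Airflow code behind the permalink; DeadlockError, run_with_retry, and flaky_update are hypothetical names used only for illustration, and a real implementation would catch the database driver's own deadlock exception and roll back the transaction before retrying.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the database driver's deadlock exception."""


def run_with_retry(operation, max_attempts=3, delay=0.01):
    """Call operation(), making at most max_attempts attempts on DeadlockError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            time.sleep(delay * attempt)  # small linear backoff before retrying


attempts = {"count": 0}


def flaky_update():
    # Simulates a statement that hits a deadlock twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise DeadlockError("deadlock found when trying to get lock")
    return "updated"


result = run_with_retry(flaky_update)
print(result, attempts["count"])
```

Retrying hides the symptom rather than removing the conflicting lock order, which is why pre-locking rows or aligning the lock sequence is discussed as the more fundamental fix.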
What you think should happen instead
No response
How to reproduce
No response
Operating System
ubuntu
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct