AirflowClusterPolicyViolation Overhaul - Pause DAG, Make it so that DAG cannot be unpaused, still shows in DAG list view but has tooltip saying why can't be unpaused #18410
Comments
CC: @jaketf
@alex-astronomer this sounds like a great improvement to me, and the use case you describe is the intention of the original
I agree, but it seems a bit odd to me to mix the paused/active state into this. Say I add a new policy that 50 DAGs violate. According to this proposal, all of them will be paused.
@eladkal Great point. I still believe the DAGs should be paused and unable to run in this situation, and here is my reasoning. If a newly implemented cluster policy would disable 50 DAGs, shouldn't that happen anyway, since we don't want DAGs that violate the policy to run? I understand your concern that it would be hard to know which DAGs were paused and hard to revert. I think we could get around this shortcoming with audit logging for the DAGs that were paused, plus entries in the Logs section of Airflow explaining what happened. What do you think about that?
@alex-astronomer I can argue that the DAGs should not be paused. If you prevent DAGs from running you have 2 edge cases:
I think the easiest way to overcome this is to add a configuration option in airflow.cfg that lets users decide whether they want to enable this feature.
@alex-astronomer @eladkal I agree with
Description
Overview
AirflowClusterPolicyViolation behaves in very strange ways right now. I think overhauling it would be an important improvement for users who have complex DAG definition requirements for tagging, owners, naming conventions, etc. It would be especially useful for Airflow administrators who have many application teams sharing the same deployment of Airflow.
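To make the kind of requirement concrete, here is a minimal sketch of a DAG-level cluster policy as it might appear in airflow_local_settings.py. The `team:` tag convention and the `violates` helper are assumptions made up for illustration, and the fallback exception class exists only so the snippet runs without Airflow installed.

```python
from types import SimpleNamespace

try:
    from airflow.exceptions import AirflowClusterPolicyViolation
except ImportError:
    class AirflowClusterPolicyViolation(Exception):
        """Stand-in so this sketch runs without Airflow installed."""

# Hypothetical convention: every DAG must carry a tag naming its owning team.
REQUIRED_TAG_PREFIX = "team:"


def dag_policy(dag):
    """Cluster policy hook: reject DAGs that lack an owning-team tag."""
    tags = getattr(dag, "tags", None) or []
    if not any(tag.startswith(REQUIRED_TAG_PREFIX) for tag in tags):
        raise AirflowClusterPolicyViolation(
            f"DAG {dag.dag_id} needs a tag starting with {REQUIRED_TAG_PREFIX!r}"
        )


def violates(dag):
    """Illustrative helper: True if the DAG fails the policy."""
    try:
        dag_policy(dag)
    except AirflowClusterPolicyViolation:
        return True
    return False


# SimpleNamespace objects stand in for real DAG objects here.
untagged = SimpleNamespace(dag_id="nightly_export", tags=[])
tagged = SimpleNamespace(dag_id="nightly_export", tags=["team:data"])
```

With these stand-ins, `violates(untagged)` is true and `violates(tagged)` is false.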
Current behavior
When the AirflowClusterPolicyViolation exception is raised from airflow_local_settings.py, the DAG still shows up in the DAG list view (unlike other import errors, where the DAG sometimes no longer appears in this list). The DAG remains paused or unpaused depending on what state it was in before the DAG cluster policy was deployed. The DAG can still be scheduled and run, but all of the tasks within that DAG will fail with no errors and no logs. This silent failure is very confusing for developers who don't see the import error on their DAG.
New Expected Behavior and Overhaul
The behavior that I expect to see when the AirflowClusterPolicyViolation is thrown is that the DAG will become paused, and cannot be unpaused until the DAG adheres to the cluster policy. There will be a tooltip on the pause button that explains why the DAG cannot be unpaused. The import error will show in the DAG view as well as the DAG list view, solved by #17818. No tasks will be scheduled or run, and no DAGRuns will be scheduled or run, until the cluster policy is adhered to.
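A rough sketch of the unpause guard I have in mind, written as a toy in-memory model rather than against Airflow's actual models. `DagState`, `record_violation`, `clear_violation`, and `request_unpause` are all hypothetical names; none of them exist in Airflow today.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DagState:
    """Toy stand-in for the stored state of one DAG."""
    dag_id: str
    is_paused: bool = False
    policy_violation: Optional[str] = None


def record_violation(state: DagState, message: str) -> None:
    """Called when parsing raises a policy violation: store it and pause."""
    state.policy_violation = message
    state.is_paused = True


def clear_violation(state: DagState) -> None:
    """Called once the DAG parses cleanly; the DAG stays paused until a user acts."""
    state.policy_violation = None


def request_unpause(state: DagState) -> Tuple[bool, str]:
    """Return (succeeded, tooltip_text) for a user's unpause request."""
    if state.policy_violation is not None:
        return False, f"Cannot unpause: {state.policy_violation}"
    state.is_paused = False
    return True, ""
```

The second element of the returned tuple is what the pause button tooltip could display. Whether a DAG should unpause automatically once the violation clears is left open here; this sketch keeps it paused.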
Use case/motivation
I want to offer more support to users that have many application teams working on the same deployment of Airflow. Part of data pipeline quality is making policies that teams are unable to violate. If the teams adhere to the policy, they can run their DAG.
Right now the behavior is very confusing, and it's challenging to pause a DAG that violates a cluster policy. It is possible to pause a DAG within the cluster policy function before the exception is raised, but a race condition exists if the DAG is unpaused again after that. The DAG will try to run its tasks in the window between being unpaused and the policy function getting the chance to pause it again. This creates even more confusing behavior.
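The race can be illustrated with a small deterministic simulation. This is a toy model of the ordering of events, not Airflow code; `ToyDag`, `parse_with_policy`, `user_unpause`, and `scheduler_tick` are hypothetical names.

```python
class ToyDag:
    """Toy model of a DAG that currently violates the cluster policy."""

    def __init__(self):
        self.is_paused = False
        self.violates_policy = True
        self.scheduled_runs = 0


def parse_with_policy(dag):
    """Workaround: the policy function pauses the DAG on every parse."""
    if dag.violates_policy:
        dag.is_paused = True


def user_unpause(dag):
    """A user flips the pause toggle in the UI."""
    dag.is_paused = False


def scheduler_tick(dag):
    """The scheduler creates a run for any unpaused DAG."""
    if not dag.is_paused:
        dag.scheduled_runs += 1


dag = ToyDag()
parse_with_policy(dag)   # policy pauses the DAG
user_unpause(dag)        # user unpauses it again
scheduler_tick(dag)      # scheduler runs before the next parse: a run slips through
parse_with_policy(dag)   # policy pauses it again, too late
```

After this sequence the DAG is paused again, but one run was already scheduled in the window between the unpause and the next parse.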
I believe that a DAG should not be able to be run unless it adheres to the cluster policy.
Very open to discussion about the best way to solve this issue.
Related issues
No response
Are you willing to submit a PR?
Code of Conduct