Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for custom Trigger Rule logic #17010

Closed
SangwanP opened this issue Jul 14, 2021 · 26 comments
Closed

Add support for custom Trigger Rule logic #17010

SangwanP opened this issue Jul 14, 2021 · 26 comments

Comments

@SangwanP
Copy link
Contributor

Description
I would like to have a trigger rule that triggers a task if all the upstream tasks have executed (all_done) and there is atleast one successful upstream task. This would be like the trigger rule: one_success, except it'll wait for all upstream tasks to finish.

Use case / motivation
I have downstream tasks that should run after all the upstream tasks have finished execution. I can't trigger the downstream task if none of the upstream tasks pass, but I need to trigger it if even one of them passes.

Are you willing to submit a PR?
I can work on a PR, if the feature is approved.

@SangwanP SangwanP added the kind:feature Feature Requests label Jul 14, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 14, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@raphaelauv
Copy link
Contributor

raphaelauv commented Jul 14, 2021

Problem what to do about the person that need

all_done_and_X_succeed where X could be 2 and not one

Look like airflow is missing an expressive pattern to let the user implement any custom TriggerRule logic

Maybe there could be an interface

class TriggerRule(ABC):

    @abstractmethod
    def check_condition(self)->bool:
         pass

that the users could extends and the current https://github.com/apache/airflow/blob/main/airflow/utils/trigger_rule.py would also implement that interface

@SangwanP
Copy link
Contributor Author

Having the ability to define a custom trigger rule would be nice.
The difference between triggering with 2+ successes compared to 1+ success(es) is that the latter is more generalized (run downstream, if any of the upstream tasks succeed) compared to the the former which is a bit more specific. That being said, as a user, I would be happy to have functionality that lets me customize my trigger rule.

@raphaelauv
Copy link
Contributor

raphaelauv commented Aug 7, 2021

the trigger rule none_failed_or_skipped is a solution to your use case -> https://airflow.apache.org/docs/apache-airflow/2.1.2/concepts/dags.html?highlight=none_failed_or_skipped#trigger-rules

task_at_least_one_previous_succeed = SomeOperator(
    trigger_rule = "none_failed_or_skipped",
    ...
)

[task_a,task_b,task_c] >> task_at_least_one_previous_succeed

@eladkal
Copy link
Contributor

eladkal commented Aug 26, 2021

The specific request of @SangwanP can be handled by an existed trigger rule
however the concept of user defined/custom trigger rules is interesting (though I'm not really sure how much users actually look for this functionality). I'm editing the issue to reflect this request.

@eladkal eladkal reopened this Aug 26, 2021
@eladkal eladkal changed the title Add a trigger rule to support all_done_and_one_success Add support for custom Trigger Rule logic Aug 26, 2021
@AlexBogs
Copy link

AlexBogs commented Sep 30, 2021

It'd be nice to be able to have custom trigger rule, but it seem to be already investigated and declined AIRFLOW-389

Me too need 'all done and none upstream_failed' rule after an 'optional' task. If 'optional' task fails by itself then downstream tasks should proceed, but if it has upstream_failed then downstream tasks should have upstream_failed status too.

It's possible to make desired behavior by adding 2 dummy tasks: 1 downstream from the 'optional' task with the 'all_done' rule, and another one afterwards with 'all_success' rule and the 'optional' task parents included too, but it will make already bloated DAG even bigger.

@potiuk
Copy link
Member

potiuk commented Sep 30, 2021

It'd be nice to be able to have custom trigger rule, but it seem to be already investigated and declined AIRFLOW-389

Me too need 'all done and none upstream_failed' rule after an 'optional' task. If 'optional' task fails by itself then downstream tasks should proceed, but if it has upstream_failed then downstream tasks should have upstream_failed status too.

I think custom trigger rules are possible but we need to make sure they are well implemented from the isolation/security point of view (especially if you want to define them in DAGs which should not be allowed).

Airflow scheduler makes scheduling decisions based on those, and the rule is that only Workers and File Processors (which run in isolated, Sandboxed Python Processes ) can execute user-DAG-provided code.

If we would like to implement custom trigger rule logic, it would have to be installable via providers/plugins only similar as new scheduling Timetables in Airflow 2.2. Which means that they have to be "installed" when Airflow is installed rather than added with DAGs.

@eladkal
Copy link
Contributor

eladkal commented Sep 30, 2021

agree with @potiuk there is also the question of - does it worth the effort?
I haven't seen so many requests for unique and "odd" trigger rules. For the moment it feels that maybe we are missing 2-3 trigger rules and everyone will be happy. I guess I wanted to see here how many people are in favor of this feature and what trigger rules are missing.

@AlexBogs
Copy link

Agree that given the security constraints, it's not worth it and a few more triggers would cover most needs.
If there would be still custom requests, maybe we can consider adding ability to specify 'and'/'or' conditions to the existing triggers, probably only 'or' to avoid impossible conditions.

Are you willing to submit a PR?
I can work on a PR for 'all_done_none_upstream_failed' trigger, if it's approved.
It was requested before in AIRFLOW-389 and currently there's no way to differentiate between upstream failed and failed tasks.

@malthe
Copy link
Contributor

malthe commented Nov 12, 2021

I think something like TriggerRule.WAIT_ALL & TriggerRule.ONE_SUCCESS is easier to understand than the existing TriggerRule.NONE_FAILED_OR_SKIPPED rule.

Following the ideas in #19361 – it seems that these ideas can combine, e.g.

   ...
   trigger_rule=TriggerRule.WAIT_ALL & TriggerRule.ONE_SUCCESS & TriggerRule.MONDAY

Those are all static rules that can be combined with boolean logic such as &, | and ~ (not).

(This assumes that TriggerRule.<name> are no longer just simple strings, but support these logical operators. For backwards compatibility, a simple string would still work until the next major version.)

In #19361 there is also CronTimeFilter which could be turned into a trigger rule as well – in the example above, TriggerRule.MONDAY is really just a predefined cron-based time filter.

Like timetables (see AIP-39), there could be support for custom implementations, but it might be reasonable to start with a set of builtin classes which can then be combined as proposed above.

In some scheduling systems, there is support for custom calendars such as "bank holidays". This seems like a good case for a custom implementation since there is no clear single solution. There are Python libraries such as holidays that address official holidays, but an organization might have a need for much more specific calendars.

@uranusjr
Copy link
Member

I like the boolean combination idea. This will likely be a big undertaking though with a lot of tests needed since there’s much existing code that uses == and in that’d silently break against this change. An AIP will likely be needed to discuss and resolve potential problems.

@potiuk
Copy link
Member

potiuk commented Nov 14, 2021

To be honest it does not seem as bad when it comes to a "size" of change.

There are not that many places where trigger_rule is used (< 80 cases from a quick look) - so changing the rules to those rules would be possible.

However I am not sure if that is needed. Maybe just the name of the rule should be changed (NONE_FAILED_OR_SKIPPED -> WAIT_FOR_ALL_AND_ONE_SUCCESS - it could could even be an alias.

I think not everything that can be done, should be done.

If we allow for combine several triggers, there will be nonsense combinations (ALL_SUCCESS & ONE_FAILED for example) and we would have to detect all such cases when such combinations and mark them as invaild). That sounds really dangerous IMHO and opens a new class of potential erros when defining DAGS. I really like that people have predefined set of triggers that they can use rather than freely combine them. I think this is a bit too much freedom - with very little benefit, and a lot of potential problems.

Very similarly - we could add (I actually had a working POC) MultiExecutor - to be able to combine any of the executors we have but eventually we only have CeleryKubernetesExecutor as "another executor" as this is the combination of executors that gives 90% of cases and saves us solving a lot of problems for people who would like to combine say - sequential executor. local and K8S one.

Additionally it opens up new cases which can add even more confusion and problems - like I think throwing calendar in the mix as a trigger is a really, really bad idea. Previously you only looked at the state of upstream tasks to determine whether the trigger fired. If we add logically connected calendar this is really changing the whole "logic" of triggers - and it is far too much IMHO. The "day" based triggerer is not "structural" decision, it is "time" decision and it should not be mixed in here..

@guim4dev
Copy link

@potiuk i really like the alias idea. Currently, the doc says, in regards to none_failed_or_skipped:

"All upstream tasks have not failed or upstream_failed, and at least one upstream task has succeeded."

After this discussion, this doesn't seem to be the truth for me. "WAIT_FOR_ALL_AND_ONE_SUCCESS" describes the real behavior of the TriggerRule.

@raphaelauv
Copy link
Contributor

@guim4dev

I changed the name of the trigger_rule check the latest version of the doc ( not the 2.1.2)

none_failed_min_one_success

@eladkal
Copy link
Contributor

eladkal commented Jan 4, 2022

Let me summarize this issue:
This thread started with a request to renamed a specific trigger rule (which was done) but then we changed the issue to be about custom Trigger rule logic.

Quoting my reply some months ago:

agree with @potiuk there is also the question of - does it worth the effort?
I haven't seen so many requests for unique and "odd" trigger rules. For the moment it feels that maybe we are missing 2-3 trigger rules and everyone will be happy. I guess I wanted to see here how many people are in favor of this feature and what trigger rules are missing.

Given the time passed and sine no further requests were made in this area (also considering from what we saw in the past) I think we should close this feature request as won't fix. It seems that we don't really need to provide a custom trigger rule logic but just add specific trigger rules to Airflow core. What do others think?

@potiuk
Copy link
Member

potiuk commented Jan 4, 2022

Agree with @eladkal

@malthe
Copy link
Contributor

malthe commented Jan 4, 2022

I think the boolean combination idea is a good one – and probably @potiuk is right that throwing time in the mix is a bad one.

That said, if it ain't broke don't fix it and while boolean combinations are arguably more elegant, if there's already a system in place that works, then I suppose that is good enough.

@eladkal
Copy link
Contributor

eladkal commented Jan 4, 2022

So I'm closing this issue for now.
If needed we can revisit in the future.

@eladkal eladkal closed this as completed Jan 4, 2022
@repl-chris
Copy link
Contributor

Is there an appropriate place to request specific additional rules? The one which I would really like is all_done_min_one_success.

@raphaelauv
Copy link
Contributor

There is 'none_failed_min_one_success'

@msklc
Copy link

msklc commented May 30, 2022

Is there an appropriate place to request specific additional rules? The one which I would really like is all_done_min_one_success.

There isnt an option like this in airflow v2!

you can select one of these rules' option:

  • 'none_failed_or_skipped'
  • 'one_failed'
  • 'none_failed'
  • 'all_success' >>> DEFAULT
  • 'always'
  • 'none_skipped'
  • 'none_failed_min_one_success'
  • 'one_success'
  • 'all_failed'
  • 'dummy'
  • 'all_done'

@potiuk
Copy link
Member

potiuk commented Jun 4, 2022

Is there an appropriate place to request specific additional rules? The one which I would really like is all_done_min_one_success.

Yes. You can actually make a PR adding it. Airflow is created by more than 2000 contributors like you. If you think and are convinced that a new rule is needed - there is nothing preventing you from opening a PR with proposing it @repl-chris

Yoy might not understand that this is a free software, developed mostly by volunteers so there is no "request" place. The easiest (and fastest) way to add a feature is to do it yourself as PR. The second best is to convince someone that your cas eis needed - ideally you need to engage with the commuity, show the need, prove it and then ask if somoene would love to implement it - it works best if there are people who agree with you.

if you approach it in the "request" way (like if you actually paid for the software - which you did not}) - you will not get far. You can also pay (or your company can pay) someone to contirbute it for you if you do not feel competent enough.

So you have plenty of options here. Most of them require more than "raising a request" and "expecting it to happen. Most of them require investing your time or money if you want something delivered by vlolunteers.

Just wanted to set expectations here, since you were - probably without the knowledge how things work here - downvoting good answers. You will probably downvote this answer too, but if you will, it means that you completely misunderstood how free software works.

@repl-chris
Copy link
Contributor

repl-chris commented Jun 5, 2022

@potiuk My apologies for using the "request" terminology - I intended to be asking if there was an active issue/discussion somewhere where the specific "2-3 missing trigger rules" eladcal mentioned were being planned, which I could take part in. I did not mean to imply that I would not be willing to do the work myself...I realize my question was not worded appropriately and my intent was completely lost, I will be more careful in the future. I do understand this is free/opensource software maintained by volunteers (myself included, as recently as v2.3.1).

@potiuk
Copy link
Member

potiuk commented Jun 5, 2022

Sorry for my "harsh" words too. It's just too often people approach free software as somethig that is "magically" maintained and that they might "demand" and "expect" things to happen.

Sometimes the written communication is very difficult to pass your real intentions and attitude :)

And in this case - really PR is the best way to attempt new rule creation. Having PR opened, would also give you (or creator of the PR whoever that person is) an opportunity to see how the rules are implement and understand it better and sometimes it might end up with ... not creating the PR at all after realising that either you can do it differently or that there are some performance/complexity implications that are difficult to reason about without writing the code.

It just a little investment. of time to make further dicusssions more productive. This is great example when the code to implement and analysis how it works is not a huge effort and even if the code resulting from it would be dumped eventually it's a good learning and best use of time for everyone.

@stdout-zhou
Copy link

Agree that it will be very helpful if one can define custom trigger_rules.

@ElisaWChen
Copy link

the trigger rule none_failed_or_skipped is a solution to your use case -> https://airflow.apache.org/docs/apache-airflow/2.1.2/concepts/dags.html?highlight=none_failed_or_skipped#trigger-rules

task_at_least_one_previous_succeed = SomeOperator(
    trigger_rule = "none_failed_or_skipped",
    ...
)

[task_a,task_b,task_c] >> task_at_least_one_previous_succeed

In case anyone finds this thread in the future like myself, I'm using Airflow 2.4.3 and I tested 'none_failed_or_skipped' and it did not in fact have the behavior that this thread seemed to imply. My downstream task did wait for all of its upstream tasks to finish, but when one of them failed, it failed everything downstream. Looking at the name of the trigger rule, that seems obvious that it implies none can fail, but just saying - that the none_failed_or_skipped does not serve the functionality of all parent/upstream tasks having finished executing, and at least one of them has succeeded. As soon as one of the upstream tasks fails, all of its downstream tasks fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests