-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ExternalTaskSensor respects soft_fail if the external task enters a failed_state #23647
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
Looks reasonable to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks cool, however this is a change in behaviour which should be documented. It's not a breaking change I think, I'd rather lean towards a bugfix (It should have behaved like that already). I think though this one is enough of a change to warrant a newsfragment (of bugfix type) https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#step-4-prepare-pr
Can you please add it (and also add information about this behaviour in the docstring of the External Task Sensor @argibbs ?
@jedcunningham @ephraimbuddy - this is not a "critical enough" to make into 2.3.1 so I'd leave it up to you whether to cherry-pick / add it to 2.3.1 (though it should be small and localized enought to not be a problem to cherry-pick if we alll agree this is a "bugfix".
Sorry, missed the comment, will crack on with the requested changes ASAP. |
Alright, sorry for the delay, took me a while to get a working test env; breeze + docker worked well once I actually followed the instructions! @potiuk those requested changes are all done.
Additionally, I've rebased onto latest main... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very cool! I love all the extra documentation in the operator too; very helpful. Some nits and suggestions so other static checks are happier before releasing all of the other ones in CI. And it's a good idea to run the static checks locally too if you can.
Thanks for the feedback; sorry should have used breeze to run the static checks locally; should have RTM properly. |
Ok, I've run black again, weirdly black via breeze seems to do something different to black run externally 🤷 . Anyhoo, |
Likely whatever you used to run black does not take pyproject.toml into account:
That's why |
Yeah, I ran it from the root of the checkout... I dunno what I did wrong. I'm really hoping this is it now though. 😄 It's been a learning curve but I'll hopefully walk into fewer rakes if (when?) I pick up another PR. |
Could also be a different black version :). Also I really recommend "pre-commit install" and then you would not even have to think about running ANYTHING - all that is needed will be run automatically when you do "git commit". |
Tests are failing :) |
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
Yeah, that's my fault. I provided an incorrect suggestion. @argibbs the state values should be |
OK, <counts on fingers> 4th time's the charm |
Just checking but I assume the helm tests timing out are not a blocker? Do they need to be rerun, or do we just not mind given the unrelated nature of the change, or <insert option C here>? |
@potiuk @josh-fell just checking I don't have to do anything here... I see there are some fixes for flakey tests hitting master; do you need me to rebase? |
It's always good idea no matter what/ |
Done. I'm not at my laptop, so have used github's web ui to rebase to main ... I think it's done the right thing, o frabjous day. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending "fun with tests". Great addition!
Ok so I'm at a loss now; the failing tests are Helm k8s stuffs, unrelated to my changes.
I've rebased to latest master. Do it get merged anyway, or does this PR just sit in limbo? Is there anything I can do to get it over the line? |
Still in a holding pattern. There is work being done to investigate these test failures and resolve. Don't worry, it's not you 🙂 |
Fair beans, ty for the info. What's the best place to track the progress of the fixes? Should I just lurk on the devel slack for a Eureka post, or is there an issue or something I can watch? (Don't want to be rebasing and asking for reruns unnecessarily) |
Just rebase often and early. No problem with that. You have ONE PR to take care about the maintainers about 30-50 things to look at every day (see my talk here https://www.youtube.com/watch?v=G6VjYvKr2wQ&t=1s - to get the right perspective. Be persistent. rebase when you feel like it and see your PR is green. It's you who want to merge it, and just make sure things are gree. I rebase it now - but I leave it up to you if it is fixed and green (you can continue rebasing when you see it's not}. Ping us please if you see it's green and ready for review. |
Yeah, ok. I've no problem rebasing daily, but unless something's changed, as a first time committer I don't have the ability to run the workflows. I was trying to avoid being "that guy" who pops up every day asking about a low prio MR, precisely because you guys have so much to deal with. 😄 Will see how it goes. Ty! |
The thing is that if nothing changes, we are not notified about your new push with fixes. Probably every day is too often (You need to consider people\s time availabiliyt etc. ). Apache Software Foundation has this very good rule of giving people minimum 72 Hours for important stuff - like voting and participating in discussions, this is the right baseline. And of course everyone's mileage vary. The important thing is - that if you are fresh committer to GitHub and the project - any push of yours will require from us to approve your workflow (this is security protection against bots mining bitcoins by pull requests) so until you get your first PR merged, you need to ping in the PR to approve the workflow. |
Yup, all makes sense (esp the backdoor bitcoin mining thing sadly; this is why we can't have nice things). Love the app, trying to do my part and not get in the way 😄 I'll check in every +/-3 days (unless this latest attempt works!) Thanks for the help! |
- Removing some black-induced-trailing-commas - Using explicit enum values rather than strings for state - Joining two strings black pulled up onto the same line Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com>
@potiuk @josh-fell I've rebased onto master, and resolved a minor conflict against main arising from tweaks to the file imports. If you could approve the workflow, I'd like to roll the dice again 😄 |
Static checks passing locally now. |
@potiuk @josh-fell w00t, it's passed. Can either of you merge it for me? I (understandably) don't have write access. |
Awesome work, congrats on your first merged pull request! |
🎉 🎉 🎉 🎉 🎉 |
Too bad you became 2098th contributor... it was close though .... |
Fantastic, thanks for all the help! |
…ailed_state (#23647) * Respecting soft_fail in ExternalTaskSensor when the upstream tasks are in the failed state (#19754) - Changed behaviour of sensor to as above to respect soft_fail - Added tests of new soft_fail behaviour (#19754) - Added newsfragment and improved sensor docstring (cherry picked from commit 1b34598)
…ailed_state (#23647) * Respecting soft_fail in ExternalTaskSensor when the upstream tasks are in the failed state (#19754) - Changed behaviour of sensor to as above to respect soft_fail - Added tests of new soft_fail behaviour (#19754) - Added newsfragment and improved sensor docstring (cherry picked from commit 1b34598)
Problem
You have an external task which may "skip". When that happens, you want the external task sensor to also skip. This PR enables that behaviour.
Solution
The "right" way to do this is:
failed_states
to[State.SKIPPED]
(by default it isNone
)soft_fail=True
With the above settings, when the target task enters the SKIPPED state, the ExternalTaskSensor will see this as a failed state and immediately react to the failure. Because of the
soft_fail
flag, it should not mark as failed, but instead as skipped.Effect of change
Prior to this change, the
ExternalTaskSensor
ignored thesoft_fail
setting, and always marked it as an error. With this change in place, theExternalTaskSensor
will instead mark the task as skipped, as desired.Other comments
soft_fail
would already kick in if the ExternalTaskSensors timed out (i.e. if the external task did not enter an allowed_state or failed_state in the time. That means a possible workaround for the above is to set a timeout on the ExternalTaskSensor which you believe is long enough that the external task should have completed.However (in my experience at least) ExternalTaskSensors are often run without timeouts. This is because if the target task hangs on upstream dependencies, having the external sensor timeout means cascading failures throughout the system which normally means more cleanup. It's usually easier to have the ExternalTaskSensors not timeout and thus only fail if there's an actual problem.
Additionally, soft_fail + execution timeouts is not perfect. If the task skips you may want the skip to propagate immediately; under the workaround, the skip would only kick in after the timeout. And of course, there are obvious goldilocks issues with finessing the timeout to be not too short nor too long.
Related Issues
closes: #19754