Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Allow users to selectively retry specific failed nodes. Fixes #12543 #12553
fix: Allow users to selectively retry specific failed nodes. Fixes #12543 #12553
Changes from all commits
fbf2ce0
0bdfc7d
1cc48c8
c60c808
57f2dfa
c9f2b1a
efd346b
42b0c80
e0f1e9c
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this before, but the latter condition would be better as part of the selector if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agilgur5 Hello. Here I want to confirm whether
restartSuccessful
has the highest priority compared withselector
. If a successful node is selected, but restartSuccessful is not specified, whether the successful node is still executed. In this case, the restartSuccessful parameter doesn't feel very meaningful. Or should it be the logic of the diagram below?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diagram looks correct to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agilgur5 However, the above diagram may be a bit in conflict with the previous logic, which may cause problems with the previous e2e test. For example,
TestFormulateRetryWorkflow/Nested_DAG_with_Non-group_Node_Selected
. Do you think it's better to fix only the previous red logic in the figure below and keep the previous blue logic?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhhh -- based on the issue and your fix I strongly suspected there was something deeper incorrect with the previous logic. I actually asked in the contributors channel if someone more familiar with the retry logic could check this PR because of that suspicion.
Ah is that what you were doing with your initial/current fix?
I honestly think we should change it all to work as expected (the black boxes) since the previous logic feels very confusing/unexpected (and has confused users many a time before, as well as contributors).
Big thanks for drawing the diagrams here, those are super helpful! We may want to add a similar
mermaid
flowchart of this to the docs as wellThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rewrite this, we may want to release it as a breaking change similar to
retryStrategy
fixes from #11005(different retries, but similar concept that they both behaved unexpectedly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, do you have any plans to rewrite this part of the logic? Right now, our business needs to be able to retry a single error node. The changes to this issue are to minimize the impact on existing API logic and to provide the ability to retry a single faulty node. Of course, a broader refactoring might provide a more elegant solution. Do you know if the official team has any plans in this regard? 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be happening shortly per below