-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ExecuteMultiOperation #367
Conversation
d89ae69
to
ce65a44
Compare
a0bfae2
to
da6a6cb
Compare
3747c49
to
c8e14b5
Compare
1ea586e
to
23d2d6e
Compare
|
||
message WorkflowOperationResult { | ||
oneof result { | ||
google.rpc.Status error = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think multi-operation failure should be a gRPC error.
I can't imagine a continue-on-error future (or even a non-transactional future) where this has value reported as a successful gRPC result with a failure embedded. I think for now we should consider multi-operation transactional (i.e. all fails or all succeeds) as that's where it has most value, and use gRPC errors to report failure. If we need to be more detailed about the error for the user, we can add error details which can include partial successes if we ever have that (I hope not, it's confusing and has limited value). Even if we did have the concept of partial completion at some point in the future, I think it'd be up to request options on whether error should be ignored, but it's important IMO to let gRPC failures be gRPC failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've alluded to it in the other comment; the issue here is that once the Update is admitted, the Workflow needs to start or the wait stage accepted/completed would never be reached. And once the Workflow has started, it cannot be undone (as of now).
We could send a gRPC failure back in that case. That would mean that the client cannot see the partial success anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update failure/rejection outcome is already on the successful update response. A gRPC failure would be failure to admit an update which should also fail to start the workflow (same as failure to admit a signal on signal with start).
Also, I don't believe that multi-operation update should wait until update reaches a certain state, or that a worker even needs to run. I believe update "durably admitted" (which I heard we're adding support for so update-with-start works) is when it should return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update failure/rejection outcome is already on the successful update response. A gRPC failure would be failure to admit an update which should also fail to start the workflow (same as failure to admit a signal on signal with start).
You're right, I've come to see it the same way 👍
Also, I don't believe that multi-operation update should wait until update reaches a certain state, or that a worker even needs to run. I believe update "durably admitted" (which I heard we're adding support for so update-with-start works) is when it should return.
We are considering "durably admitted" now (approval pending) indeed.
It's my understanding that being able to wait for accepted/completed allows the client to receive the response in a single gRPC call instead of having to initiate a poll. Once we allow the user to pick wait stage "Admitted", they can choose whether to to unblock early and poll or receive everything in one go.
Or are there other concerns for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's my understanding that being able to wait for accepted/completed allows the client to receive the response in a single gRPC call instead of having to initiate a poll.
This is my understanding too, I just think a supported wait-stage should be admitted with this call. One of the great aspects of Temporal is that when I submit something durably I know it will run, I don't have to wait for it to run. We may not be offering this for update, but since we're offering it for update-with-start, we should allow users to return without a worker running (i.e. basically immediately once made durable), same as signal-with-start. But I also support the ability to wait longer for update response for sure.
@@ -107,6 +107,16 @@ service WorkflowService { | |||
}; | |||
} | |||
|
|||
// ExecuteMultiOperation executes multiple operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we specify the transactional expectations of this call? Basically guarantee it all fails or all succeeds atomically/transactionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I took a stab at it. I'm not quite sure yet how to express the subtleties in error behavior: when the Update fails to be admitted, there actually won't be a new Workflow (akin to a transaction). However, if the Update doesn't reach Accepted/Completed, only the Update operation will fail but not the Start. So it actually has limited transactional guarantees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, if the Update doesn't reach Accepted/Completed, only the Update operation will fail but not the Start. So it actually has limited transactional guarantees.
This depends on how you define fail. It won't fail from a gRPC perspective. I don't consider an update being rejected as an update "failing" from the POV of this API. If an update doesn't reach accepted/completed, the update response is capable of representing that. Still, that the start and update were accepted by the server and the update reached a state of durability, then the gRPC call succeeded, even if a single update had an application-level failure (not to be confused with server/gRPC failure).
Also, I don't believe that multi-operation update should wait until update reaches a certain state, or that a worker even needs to run. I believe update "durably admitted" (which I heard we're adding support for so update-with-start works) is when it should return.
29ef7b9
to
49c91ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. May want to clarify idempotency behavior (i.e. that users should set start workflow request ID since they may repeatedly invoke this w/ fail-if-existing), but don't need to necessarily.
(also, I disagree that multi operation should ever need to be speculative/lossy as that is very confusing for users, but this was already decided internally)
a1b7237
to
6a8bae2
Compare
@cretz I'll think about this more ... I wonder if we should require a request ID 🤔 If we're not, I'll add a follow-up comment in a separate PR. |
Co-authored-by: Alex Shtin <alex@shtin.com>
Co-authored-by: Spencer Judge <sjudge@hey.com>
8df6401
to
8e08068
Compare
## What changed? <!-- Describe what has changed in this PR --> Happy-path implementation for temporalio/api#367 This is just the first version; it'll need more work (e.g. more validation and tests) before it can be production-ready. It is behind a feature flag. ## Why? <!-- Tell your future self why have you made these changes --> Putting up the review now so the [API change](temporalio/api#367) can be merged; and no feature branch is necessary anymore. ## How did you test it? <!-- How have you verified this change? Tested locally? Added a unit test? Checked in staging env? --> Net new API behind feature flag, needs more testing. ## Potential risks <!-- Assuming the worst case, what can be broken when deploying this change to production? --> It's behind a feature flag. ## Documentation <!-- Have you made sure this change doesn't falsify anything currently stated in `docs/`? If significant new behavior is added, have you described that in `docs/`? --> ## Is hotfix candidate? <!-- Is this PR a hotfix candidate or does it require a notification to be sent to the broader community? (Yes/No) --> --------- Co-authored-by: Alex Shtin <alex@shtin.com>
Adds support for Update-with-Start, using the MultiOperation API (temporalio/api#367).
## What was changed <!-- Describe what has changed in this PR --> Adds support for Update-with-Start, using the MultiOperation API (temporalio/api#367). ## Why? <!-- Tell your future self why have you made these changes --> To allow Update-with-Start feature.
What changed?
Adding a new "ExecuteMultiOperation" API that will support sending a combined Start + Update.
The first version of the Server implementation can be found here: temporalio/temporal#5577
Note that while the API has a working implementation, it might still change. It is therefore annotated with "NOTE: Experimental API".
Why?
New API for sending operations to the Server atomically.
Breaking changes