Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove build id from sticky recordWorkflowTaskStarted #6096

Merged
merged 1 commit into from
Jun 13, 2024

Conversation

ShahabT
Copy link
Collaborator

@ShahabT ShahabT commented Jun 7, 2024

What changed?

  • Remove build id from RecordWorkflowTaskStarted calls for sticky queue tasks.

Why?

  • When old versioning is used, delay in UserData propagation to sticky queue may make the sticky WFT invalid when it wants to start. this will result in delay in workflow executions because the sticky WFT will be dropped and WF will be blocked until a new normal task gets scheduled (5secs). Note that this is still much better that the workflow getting stuck which happens in 1.24.0 and 1.24.1 fixed by Remove build id from sticky recordWorkflowTaskStarted #6095.

How did you test it?

Unit tests will be added.

Potential risks

None.

Documentation

None.

Is hotfix candidate?

No.

@ShahabT ShahabT requested a review from a team as a code owner June 7, 2024 20:58
@ShahabT ShahabT requested a review from dnr June 7, 2024 20:58
@@ -283,7 +283,12 @@ func (pm *taskQueuePartitionManagerImpl) PollTask(
if pm.partition.Kind() == enumspb.TASK_QUEUE_KIND_STICKY {
// In the sticky case we always use the unversioned queue
// For the old API, we may kick off this worker if there's a newer one.
versionSetUsed, err = checkVersionForStickyPoll(versioningData, pollMetadata.workerVersionCapabilities)
err = checkVersionForStickyPoll(versioningData, pollMetadata.workerVersionCapabilities)
// We set versionSetUsed to true for all sticky tasks until old versioning is cleaned up.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this approach seems a little hacky. It seems like it would be clearer if we determined "using version sets" based on whether (any) version sets were present in user data. Then this bug would only happen on transition from unversioned to using version sets.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly not sure which one is more hacky. Regardless of this issue of mixing old and new versioning, one can argue that not populate stamp for sticky queue is a good thing because there is no new information in it. Although I'm not sure we want to do that, for the sake of consistency with other WFT started events.

Maybe if we change the versionSetUsed variable to something like forwardBuildId it'd look less hacky?

Anyway, IMHO a hacky solution is ok in this case as all of this will be deleted soonish.

@ShahabT ShahabT force-pushed the shahab/old-versioning-sticky branch from 6f6502a to e1382a7 Compare June 11, 2024 17:48
@ShahabT
Copy link
Collaborator Author

ShahabT commented Jun 11, 2024

I closed the other PR and reduced the changes in this PR to only include the part that we want to get into 1.24. Once this is merged I'll cherry-pick it in 1.24 branch.

@dnr
Copy link
Member

dnr commented Jun 12, 2024

Once this is merged I'll cherry-pick it in 1.24 branch.

Just add the label, the person doing the release does the cherry-pick

@ShahabT ShahabT merged commit 87ef67e into main Jun 13, 2024
42 checks passed
@ShahabT ShahabT deleted the shahab/old-versioning-sticky branch June 13, 2024 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants