workflow/child_workflow retry #885
Conversation
common/persistence/dataInterfaces.go (outdated)
@@ -97,6 +97,7 @@ const (
TaskTypeWorkflowTimeout
TaskTypeDeleteHistoryEvent
TaskTypeRetryTimer
Can we rename this to TaskTypeActivityRetryTimer?
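For context, a minimal sketch of the rename in the const block; the iota base here is illustrative, since the real file declares more task types before these:

```go
package persistence

// Sketch only: the actual file defines additional task types before these,
// so the iota base here is illustrative rather than the real values.
const (
	TaskTypeWorkflowTimeout = iota
	TaskTypeDeleteHistoryEvent
	TaskTypeActivityRetryTimer // proposed rename of TaskTypeRetryTimer, scoping it to activities
)
```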
@@ -349,6 +349,8 @@ struct ContinueAsNewWorkflowExecutionDecisionAttributes {
30: optional binary input
40: optional i32 executionStartToCloseTimeoutSeconds
50: optional i32 taskStartToCloseTimeoutSeconds
60: optional i32 backoffStartIntervalInSeconds
We probably also want to put the failure reason and details on the ContinueAsNewEvent.
Task created to track this: #902
@@ -417,6 +423,7 @@ struct WorkflowExecutionContinuedAsNewEventAttributes {
50: optional i32 executionStartToCloseTimeoutSeconds
60: optional i32 taskStartToCloseTimeoutSeconds
70: optional i64 (js.type = "Long") decisionTaskCompletedEventId
80: optional i32 backoffStartIntervalInSeconds
You have RetryPolicy on the decision but are not persisting it to the event.
It is persisted in the WorkflowExecutionStartedEvent of the next iteration.
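A hedged sketch of that data flow, using stand-in types rather than the real thrift-generated structs: the retry policy is carried forward into the next run's started event instead of being recorded on the ContinuedAsNewEvent itself.

```go
package example

// Minimal stand-in types; the generated thrift structs have many more fields.
type RetryPolicy struct {
	InitialIntervalInSeconds    int32
	BackoffCoefficient          float64
	ExpirationIntervalInSeconds int32
}

type StartedEventAttributes struct {
	Input       []byte
	RetryPolicy *RetryPolicy
}

// nextRunStartedAttributes sketches how continue-as-new preserves the policy:
// it is copied from the previous run's started event into the next run's
// started event, so each iteration carries its own copy.
func nextRunStartedAttributes(prev *StartedEventAttributes, newInput []byte) *StartedEventAttributes {
	return &StartedEventAttributes{
		Input:       newInput,
		RetryPolicy: prev.RetryPolicy, // carried forward to the next iteration
	}
}
```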
@@ -417,6 +423,7 @@ struct WorkflowExecutionContinuedAsNewEventAttributes {
50: optional i32 executionStartToCloseTimeoutSeconds
60: optional i32 taskStartToCloseTimeoutSeconds
70: optional i64 (js.type = "Long") decisionTaskCompletedEventId
80: optional i32 backoffStartIntervalInSeconds
How is it different from retryPolicy.initialIntervalSeconds?
This is the backoff interval used to create the first schedule event for the next iteration (which may not be the first retry). The initialInterval is the backoff interval for the first retry only.
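To make the distinction concrete, a sketch with illustrative names: initialIntervalSeconds only sets the delay before the first retry, while backoffStartIntervalInSeconds is the already-grown delay persisted for the upcoming attempt of the continued-as-new run.

```go
package example

import "time"

// backoffForAttempt computes the delay before scheduling the first decision
// task of the next iteration. The exponential formula is an illustrative
// assumption, not the exact Cadence implementation.
func backoffForAttempt(initialInterval time.Duration, coefficient float64, attempt int) time.Duration {
	delay := initialInterval
	for i := 0; i < attempt; i++ {
		delay = time.Duration(float64(delay) * coefficient)
	}
	// For attempt 0 this equals initialInterval; for later attempts it has
	// grown, which is why the event needs its own backoffStartIntervalInSeconds.
	return delay
}
```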
@@ -477,6 +479,18 @@ func (b *historyBuilder) newWorkflowExecutionStartedEvent(
attributes.ChildPolicy = request.ChildPolicy
attributes.ContinuedExecutionRunId = previousRunID
attributes.Identity = common.StringPtr(common.StringDefault(request.Identity))
attributes.RetryPolicy = request.RetryPolicy
attributes.Attempt = common.Int32Ptr(startRequest.GetAttempt())
We should allow Attempt and expiration to be nil.
Yes, if RetryPolicy is nil it will just set nil on the attributes, and GetAttempt() returns 0 in the nil case.
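This leans on the thrift-generated getter convention; a sketch of the behavior with a stand-in struct:

```go
package example

type StartWorkflowRequest struct {
	Attempt *int32
}

// GetAttempt mirrors the generated-getter pattern: a nil receiver or nil field
// yields the zero value, so callers can skip explicit nil checks.
func (r *StartWorkflowRequest) GetAttempt() int32 {
	if r != nil && r.Attempt != nil {
		return *r.Attempt
	}
	return 0
}
```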
service/history/historyBuilder.go (outdated)
if request.RetryPolicy != nil && request.RetryPolicy.GetExpirationIntervalInSeconds() > 0 {
	expirationInSeconds := request.RetryPolicy.GetExpirationIntervalInSeconds()
	deadline := time.Unix(0, historyEvent.GetTimestamp()).Add(time.Second * time.Duration(expirationInSeconds))
	attributes.ExpirationTimestamp = common.Int64Ptr(deadline.Round(time.Millisecond).UnixNano())
It doesn't feel right to have this logic at the history builder layer. I think Expiration should either always be set by the client or by the history engine, and passed in with the start request to the history builder.
Will move it to the upper layer.
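A hedged sketch of the suggested refactor, with illustrative names: the history engine computes the deadline once, and the builder only copies whatever arrives on the start request.

```go
package example

import "time"

type RetryPolicy struct {
	ExpirationIntervalInSeconds int32
}

type StartRequest struct {
	RetryPolicy         *RetryPolicy
	ExpirationTimestamp *int64 // UnixNano; stays nil when no expiration applies
}

// setExpiration runs at the history-engine layer, before the history builder
// is invoked, so the builder no longer owns any deadline arithmetic.
func setExpiration(req *StartRequest, now time.Time) {
	if req.RetryPolicy == nil || req.RetryPolicy.ExpirationIntervalInSeconds <= 0 {
		return // leave nil: expiration is optional
	}
	deadline := now.Add(time.Duration(req.RetryPolicy.ExpirationIntervalInSeconds) * time.Second)
	ts := deadline.Round(time.Millisecond).UnixNano()
	req.ExpirationTimestamp = &ts
}
```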
service/history/historyBuilder.go (outdated)
		attributes.ExpirationTimestamp = common.Int64Ptr(deadline.Round(time.Millisecond).UnixNano())
	}
} else {
	attributes.ExpirationTimestamp = common.Int64Ptr(startRequest.GetExpirationTimestamp())
Just directly set attributes.ExpirationTimestamp = startRequest.ExpirationTimestamp. I think we need to support this field not being set at all.
Will do.
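A sketch of the difference: copying the pointer keeps "not set" representable as nil, whereas wrapping the getter in Int64Ptr coerces an unset field to 0.

```go
// Direct pointer copy: nil stays nil, so "no expiration" survives.
attributes.ExpirationTimestamp = startRequest.ExpirationTimestamp

// The original form forces a value even when the field was never set:
//   attributes.ExpirationTimestamp = common.Int64Ptr(startRequest.GetExpirationTimestamp()) // unset becomes 0
```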
continueAsNew.DecisionStartToCloseTimeout = di.DecisionTimeout

if newStateBuilder.GetReplicationState() != nil {
	newStateBuilder.UpdateReplicationStateLastEventID(sourceClusterName, startedEvent.GetVersion(), di.ScheduleID)
We need to update the replication state in both cases: the regular case should use di.ScheduleID, and the retry case should still set it to startedEventID.
Added.
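A hedged sketch of handling both cases; the isRetry flag and its condition are assumptions, while the other identifiers come from the snippet above.

```go
// Sketch: always update replication state when it exists. The regular
// continue-as-new path points at the newly scheduled decision; the retry
// path has no decision yet, so it falls back to the started event ID.
if newStateBuilder.GetReplicationState() != nil {
	lastEventID := di.ScheduleID
	if isRetry { // hypothetical flag distinguishing the retry-backoff path
		lastEventID = startedEvent.GetEventId()
	}
	newStateBuilder.UpdateReplicationStateLastEventID(sourceClusterName, startedEvent.GetVersion(), lastEventID)
}
```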
tBuilder := t.historyService.getTimerBuilder(&context.workflowExecution)
var transferTasks, timerTasks []persistence.Task
tranT, timerT, err := getDeleteWorkflowTasksFromShard(t.shard, domainID, tBuilder)
Can we add a test to make sure we are not creating 2 separate delete tasks on workflow timeout?
Workflow timeout does not go through this path.
Looks good. Seems like there are a few follow-up items we should file as xdc work for this feature.
This reverts commit 1e0ec6c.
…cadence-workflow#910)" This reverts commit 492faf0.
Fix continueAsNew replication issue
update doc for retry policy in idl
remove unnecessary nil check
handle WorkflowRetryTimerTask correctly on standby side
update test for cassandra tools
Server-side workflow/child_workflow retry