workflow/child_workflow retry #885

Merged: 2 commits into cadence-workflow:master from wf_retry on Jun 28, 2018

Conversation

yiminc-zz:

server side workflow/child_workflow retry

@yiminc-zz force-pushed the wf_retry branch 2 times, most recently from 32bdcb1 to c786a31 on June 26, 2018 20:26
@@ -97,6 +97,7 @@ const (
TaskTypeWorkflowTimeout
TaskTypeDeleteHistoryEvent
TaskTypeRetryTimer
Contributor:

Can we rename this to TaskTypeActivityRetryTimer?

@@ -349,6 +349,8 @@ struct ContinueAsNewWorkflowExecutionDecisionAttributes {
30: optional binary input
40: optional i32 executionStartToCloseTimeoutSeconds
50: optional i32 taskStartToCloseTimeoutSeconds
60: optional i32 backoffStartIntervalInSeconds
Contributor:

Probably we also want to put the failure reason and details on the ContinueAsNewEvent.

Author:

Task created to track this: #902.

@@ -417,6 +423,7 @@ struct WorkflowExecutionContinuedAsNewEventAttributes {
50: optional i32 executionStartToCloseTimeoutSeconds
60: optional i32 taskStartToCloseTimeoutSeconds
70: optional i64 (js.type = "Long") decisionTaskCompletedEventId
80: optional i32 backoffStartIntervalInSeconds
Contributor:

You have RetryPolicy on the decision but are not persisting it to the event.

Author:

It is persisted in the WorkflowExecutionStartedEvent of the next iteration.
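
For context, here is a minimal sketch of the flow being described, using simplified stand-in types rather than the actual Thrift-generated Cadence structs: the retry policy travels on the start request for the next run and is recorded in that run's WorkflowExecutionStartedEvent, not on the ContinuedAsNew event.

package main

import "fmt"

// Simplified stand-ins; the real Cadence types use optional (pointer) fields.
type RetryPolicy struct {
    InitialIntervalInSeconds int32
    BackoffCoefficient       float64
}

type startRequest struct {
    RetryPolicy *RetryPolicy
    Attempt     int32
}

type startedEventAttributes struct {
    RetryPolicy *RetryPolicy
    Attempt     int32
}

// The started event of the next iteration simply carries over the retry
// policy and the attempt count from the start request.
func newStartedEventAttributes(req *startRequest) *startedEventAttributes {
    return &startedEventAttributes{RetryPolicy: req.RetryPolicy, Attempt: req.Attempt}
}

func main() {
    req := &startRequest{RetryPolicy: &RetryPolicy{InitialIntervalInSeconds: 1, BackoffCoefficient: 2.0}, Attempt: 3}
    attrs := newStartedEventAttributes(req)
    fmt.Println(attrs.Attempt, attrs.RetryPolicy.InitialIntervalInSeconds) // 3 1
}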

@@ -417,6 +423,7 @@ struct WorkflowExecutionContinuedAsNewEventAttributes {
50: optional i32 executionStartToCloseTimeoutSeconds
60: optional i32 taskStartToCloseTimeoutSeconds
70: optional i64 (js.type = "Long") decisionTaskCompletedEventId
80: optional i32 backoffStartIntervalInSeconds
Contributor:

How is it different from retryPolicy.initialIntervalSeconds?

Author:

This is the backoff interval used to create the first schedule event for the next iteration (which may not be the first retry). The initialInterval is the backoff interval for the first retry.
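
To make the distinction concrete, here is a minimal sketch of the usual exponential-backoff computation (an illustration of the concept, not the actual Cadence server code): initialIntervalInSeconds seeds the formula for the first retry, while backoffStartIntervalInSeconds on the continue-as-new attributes carries the already-computed delay for whatever attempt the next run starts at.

package main

import (
    "fmt"
    "math"
    "time"
)

// nextRetryBackoff computes the delay before retry number `attempt` (1-based)
// using the standard exponential-backoff formula:
// initialInterval * backoffCoefficient^(attempt-1), capped at maximumInterval.
func nextRetryBackoff(initial time.Duration, coefficient float64, maximum time.Duration, attempt int) time.Duration {
    d := time.Duration(float64(initial) * math.Pow(coefficient, float64(attempt-1)))
    if maximum > 0 && d > maximum {
        d = maximum
    }
    return d
}

func main() {
    // initialIntervalInSeconds governs the very first retry...
    fmt.Println(nextRetryBackoff(time.Second, 2.0, 30*time.Second, 1)) // 1s

    // ...whereas backoffStartIntervalInSeconds on ContinueAsNew carries the
    // already-computed delay for whatever attempt the next run happens to be,
    // e.g. the 4th retry:
    fmt.Println(nextRetryBackoff(time.Second, 2.0, 30*time.Second, 4)) // 8s
}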

@@ -477,6 +479,18 @@ func (b *historyBuilder) newWorkflowExecutionStartedEvent(
attributes.ChildPolicy = request.ChildPolicy
attributes.ContinuedExecutionRunId = previousRunID
attributes.Identity = common.StringPtr(common.StringDefault(request.Identity))
attributes.RetryPolicy = request.RetryPolicy
attributes.Attempt = common.Int32Ptr(startRequest.GetAttempt())
Contributor:

We should allow Attempt and expiration to be nil.

Author:

Yes, if RetryPolicy is nil, it will just set nil on the attributes, and GetAttempt() returns 0 for the nil case.
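
The nil-safe behaviour described here follows the usual accessor pattern of generated Thrift code, where a getter returns the zero value when the optional field (or the receiver) is nil; a minimal sketch with a hypothetical type:

package main

import "fmt"

// Hypothetical stand-in for a generated request type with an optional
// (pointer) field.
type startRequest struct {
    Attempt *int32
}

// GetAttempt follows the generated-accessor convention: return the zero
// value when the field is unset, so callers never need a nil check.
func (r *startRequest) GetAttempt() int32 {
    if r != nil && r.Attempt != nil {
        return *r.Attempt
    }
    return 0
}

func main() {
    var req startRequest          // Attempt left nil, i.e. no retry in progress
    fmt.Println(req.GetAttempt()) // prints 0
}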

if request.RetryPolicy != nil && request.RetryPolicy.GetExpirationIntervalInSeconds() > 0 {
expirationInSeconds := request.RetryPolicy.GetExpirationIntervalInSeconds()
deadline := time.Unix(0, historyEvent.GetTimestamp()).Add(time.Second * time.Duration(expirationInSeconds))
attributes.ExpirationTimestamp = common.Int64Ptr(deadline.Round(time.Millisecond).UnixNano())
Contributor:

It doesn't feel right to have this logic at the history builder layer. I think Expiration should always be set either by the client or by the history engine, and passed to the history builder in the start request.

Author:

Will move to the upper layer.
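
As a rough sketch of what moving this to the upper layer could look like (illustrative names, not the actual Cadence code): the caller computes the absolute expiration deadline once from the retry policy and passes it down in the start request, so the history builder only copies the value.

package main

import (
    "fmt"
    "time"
)

// computeExpirationTimestamp converts a relative expiration interval from the
// retry policy into an absolute deadline in nanoseconds, anchored at the
// event timestamp. If there is no expiration interval, it returns nil so the
// field stays unset.
func computeExpirationTimestamp(eventTimestampNanos int64, expirationIntervalSeconds int32) *int64 {
    if expirationIntervalSeconds <= 0 {
        return nil
    }
    deadline := time.Unix(0, eventTimestampNanos).
        Add(time.Duration(expirationIntervalSeconds) * time.Second).
        Round(time.Millisecond).
        UnixNano()
    return &deadline
}

func main() {
    now := time.Now().UnixNano()
    if ts := computeExpirationTimestamp(now, 3600); ts != nil {
        fmt.Println("expires at", time.Unix(0, *ts))
    }
    fmt.Println(computeExpirationTimestamp(now, 0)) // <nil>: field left unset
}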

attributes.ExpirationTimestamp = common.Int64Ptr(deadline.Round(time.Millisecond).UnixNano())
}
} else {
attributes.ExpirationTimestamp = common.Int64Ptr(startRequest.GetExpirationTimestamp())
Contributor:

Just directly set attributes.ExpirationTimestamp = startRequest.ExpirationTimestamp

I think we need to support this field not being set at all.

Author:

Will do.
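
The point of the suggestion is that direct pointer assignment preserves the unset case; a tiny sketch with simplified types:

package main

import "fmt"

type request struct{ ExpirationTimestamp *int64 }
type attrs struct{ ExpirationTimestamp *int64 }

func main() {
    var req request // ExpirationTimestamp intentionally unset

    // Re-wrapping a getter's zero value loses the "unset" information:
    var wrapped int64 // what a nil-safe getter would return for the nil field
    a1 := attrs{ExpirationTimestamp: &wrapped}
    fmt.Println(a1.ExpirationTimestamp == nil) // false: looks like it was set to 0

    // Direct pointer assignment keeps nil as nil:
    a2 := attrs{ExpirationTimestamp: req.ExpirationTimestamp}
    fmt.Println(a2.ExpirationTimestamp == nil) // true: still unset
}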

continueAsNew.DecisionStartToCloseTimeout = di.DecisionTimeout

if newStateBuilder.GetReplicationState() != nil {
newStateBuilder.UpdateReplicationStateLastEventID(sourceClusterName, startedEvent.GetVersion(), di.ScheduleID)
Contributor:

We need to update the replication state in both cases. The regular case should use di.ScheduleID, and the retry case should still set it to startedEventID.

Author:

Added.
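
A minimal sketch of the agreed behaviour, using stand-in types and a hypothetical isRetry flag (the real code decides this from the continue-as-new attributes): the replication state is updated in both branches, with di.ScheduleID as the last event ID in the regular case and the started event ID in the retry case.

package main

import "fmt"

// Stand-ins for the mutable-state builder used in the surrounding code.
type replicationState struct{ LastEventID int64 }

type stateBuilder struct{ repl *replicationState }

func (b *stateBuilder) GetReplicationState() *replicationState { return b.repl }

func (b *stateBuilder) UpdateReplicationStateLastEventID(cluster string, version, lastEventID int64) {
    b.repl.LastEventID = lastEventID
}

func main() {
    newStateBuilder := &stateBuilder{repl: &replicationState{}}
    startedEventID, decisionScheduleID := int64(1), int64(2) // di.ScheduleID in the real code
    isRetry := true                                          // hypothetical flag for the retry branch

    // Regular continue-as-new replicates up to the scheduled decision; the retry
    // case has no decision scheduled yet, so it replicates up to the started event.
    lastEventID := decisionScheduleID
    if isRetry {
        lastEventID = startedEventID
    }
    if newStateBuilder.GetReplicationState() != nil {
        newStateBuilder.UpdateReplicationStateLastEventID("sourceCluster", 0, lastEventID)
    }
    fmt.Println(newStateBuilder.repl.LastEventID) // 1
}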


tBuilder := t.historyService.getTimerBuilder(&context.workflowExecution)
var transferTasks, timerTasks []persistence.Task
tranT, timerT, err := getDeleteWorkflowTasksFromShard(t.shard, domainID, tBuilder)
Contributor:

Can we add a test to make sure we are not creating two separate delete tasks on workflow timeout?

Author:

workflowTimeout does not go through this path.

Contributor @samarabbas left a comment:

Looks good. It seems like there are a few follow-up items we should file as xdc work for this feature.

@yiminc-zz yiminc-zz merged commit 1e0ec6c into cadence-workflow:master Jun 28, 2018
wxing1292 pushed a commit that referenced this pull request Jun 28, 2018
wxing1292 pushed a commit that referenced this pull request Jun 28, 2018
wxing1292 added a commit that referenced this pull request Jun 28, 2018
yiminc-zz added a commit to yiminc-zz/cadence that referenced this pull request Jun 29, 2018
yiminc-zz added a commit to yiminc-zz/cadence that referenced this pull request Aug 22, 2018
…cadence-workflow#910)"

This reverts commit 492faf0.

Fix continueAsNew replication issue

update doc for retry policy in idl

remove unnecessary nil check

handle WorkflowRetryTimerTask correctly on standby side
yiminc-zz added a commit to yiminc-zz/cadence that referenced this pull request Aug 22, 2018
…cadence-workflow#910)"

This reverts commit 492faf0.

Fix continueAsNew replication issue

update doc for retry policy in idl

remove unnecessary nil check

handle WorkflowRetryTimerTask correctly on standby side

update test for cassandra tools