Apply replication history events to passive cluster #643

Merged

Conversation

samarabbas
Contributor

Created a history replicator which is invoked by the replicator for
processing of history replication tasks. It processes the history
events from the replication task and makes mutable state updates for each
event. Once all events are processed it commits the entire update using
the workflowContext API used by the rest of the stack.

Also changed mutableStateBuilder to apply mutable state changes using
the actual event. This required some refactoring of mutableStateBuilder
to reuse as much code as possible between the replicator and the rest of the
stack.

The history service has a new API, ReplicateEvents, which is called by the
replicator to apply history events.

Currently this change only works for the happy case and is not guarded by
version updates on the domain.
The replicator does not support out-of-order processing of history
replication tasks.
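A rough sketch of the apply-then-commit flow described above (the function shape, the commit hook, and any msBuilder methods beyond the one shown in the diff excerpt just below are assumptions for illustration, not the exact code in this PR):

// Sketch: apply each replicated event to mutable state, then commit
// everything in one update via the workflowContext path.
func applyReplicationEvents(msBuilder *mutableStateBuilder, history *shared.History,
    commit func() error) error {
    for _, event := range history.Events {
        switch event.GetEventType() {
        case shared.EventTypeActivityTaskCompleted:
            if err := msBuilder.ReplicateActivityTaskCompletedEvent(event); err != nil {
                return err
            }
        // ... one Replicate*Event call per supported event type ...
        }
    }
    // Single commit once all events in the task have been applied.
    return commit()
}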

case shared.EventTypeActivityTaskCompleted:
    if err := msBuilder.ReplicateActivityTaskCompletedEvent(event); err != nil {
        return err
    }
@wxing1292 (Contributor) Mar 30, 2018

If this section is not complete, could you add the event types that are missing here in a comment?

samarabbas (Contributor, Author)

Done.
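For illustration only, one way to keep the unsupported cases visible is an explicit default branch in the switch shown above (this fragment and its error construction are assumptions, not necessarily what was actually added):

default:
    // Event types the replicator does not handle yet end up here; an
    // explicit error (or a comment listing them) keeps the gap obvious.
    return &shared.BadRequestError{
        Message: fmt.Sprintf("unsupported replication event type: %v", event.GetEventType()),
    }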

ctx context.Context,
request *h.ReplicateEventsRequest,
opts ...yarpc.CallOption) error {
client, err := c.getHostForRequest(*request.WorkflowExecution.WorkflowId)


use GetWorkflowId()

samarabbas (Contributor, Author)

Done.
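For reference, the nil-safe getter form being suggested looks roughly like this (standard Thrift-generated accessor; the surrounding code is abbreviated):

// GetWorkflowId() returns an empty string when the execution or the
// workflow ID pointer is nil, instead of panicking on a nil dereference.
client, err := c.getHostForRequest(request.WorkflowExecution.GetWorkflowId())
if err != nil {
    return err
}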

return err
}
execution := shared.WorkflowExecution{
    WorkflowId: request.WorkflowExecution.WorkflowId,


Do you need to check that request.WorkflowExecution is valid, or is it already validated? And why do you need to create a copy of the execution instead of just using request.WorkflowExecution?

samarabbas (Contributor, Author)

This is an actual event which needs to be replicated, so no validation is needed in this case. This is one of the main differences between historyEngine and historyReplicator.
As for the workflowExecution copy: request.WorkflowExecution is a pointer and the rest of the API takes a struct. This inconsistency exists in our implementation all over the place, and I don't want to change it in this PR.


But you could use *request.WorkflowExecution, right?

samarabbas (Contributor, Author)

You are right. Done.
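A minimal sketch of the agreed simplification, assuming request.WorkflowExecution has already been checked for nil upstream:

// Dereference the execution from the request instead of rebuilding it
// field by field.
execution := *request.WorkflowExecution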


createWorkflow := func(isBrandNew bool, prevRunID string) (string, error) {
    _, err = r.shard.CreateWorkflowExecution(&persistence.CreateWorkflowExecutionRequest{
        RequestID: uuid.New(),


You already have a request ID created at line 103; are they different? Why do we need two request IDs?

samarabbas (Contributor, Author)

Good catch. Fixed. Using the same requestID now.
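Roughly what the fix amounts to, assuming a single requestID variable created earlier in the function (the variable name and the elided fields are assumptions):

requestID := uuid.New() // created once, earlier in the function

// Inside the createWorkflow closure: reuse that requestID instead of
// minting a second one with another uuid.New() call.
_, err = r.shard.CreateWorkflowExecution(&persistence.CreateWorkflowExecutionRequest{
    RequestID: requestID,
    // ... remaining request fields unchanged ...
})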

@@ -230,14 +229,14 @@ func (e *mutableStateBuilder) FlushBufferedEvents() error {
return nil
}

-func (e *mutableStateBuilder) ApplyReplicationStateUpdates(failoverVersion int64) {
+func (e *mutableStateBuilder) ApplyReplicationStateUpdates(failoverVersion, lastEventID int64) {
    e.replicationState.CurrentVersion = failoverVersion


Where is the logic that prevents failover from a version that is lower than the current version?


Could you help answer this?

samarabbas (Contributor, Author)

This change just implements the happy path. I will be implementing that part of the replication protocol in a separate PR.
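Not part of this PR, but for context, the guard being asked about could look roughly like the following once the version checks land (a purely hypothetical sketch; the real follow-up may differ, including how the rejection is surfaced):

// Hypothetical guard: ignore or reject replicated updates whose failover
// version is lower than what this cluster has already applied.
if e.replicationState != nil && failoverVersion < e.replicationState.CurrentVersion {
    return fmt.Errorf("stale replication update: version %v is lower than current version %v",
        failoverVersion, e.replicationState.CurrentVersion)
}
e.replicationState.CurrentVersion = failoverVersion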

@@ -1076,7 +1112,7 @@ func (e *mutableStateBuilder) AddDecisionTaskScheduleToStartTimeoutEvent(schedul

event := e.hBuilder.AddDecisionTaskTimedOutEvent(scheduleEventID, 0, workflow.TimeoutTypeScheduleToStart)

e.DeleteDecision()


This means the client might never see the decision task with attempt==0. Maybe the server needs to return an additional explicit flag so the client can decide whether to fail the decision with an error message or to silently not respond and let it time out.

samarabbas (Contributor, Author)

Filed issue: #645 to track this.

@@ -156,6 +156,7 @@ func (p *replicatorQueueProcessorImpl) CompleteTask(taskID int64) error {

func (p *replicatorQueueProcessorImpl) getHistory(task *persistence.ReplicationTaskInfo) (*shared.History, error) {

p.logger.Warnf("Received replication task: %v", task)


This should not be a warning.

samarabbas (Contributor, Author)

removed.
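For completeness, the debug-level alternative if the line were kept rather than removed (a sketch):

// Per-task logging stays at debug level; warnings are reserved for
// unexpected conditions.
p.logger.Debugf("Received replication task: %v", task)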

@@ -178,7 +184,19 @@ func (p *replicationTaskProcessor) worker(workerWG *sync.WaitGroup) {
p.logger.Debugf("Recieved domain replication task %v.", task.DomainTaskAttributes)
err = p.domainReplicator.HandleReceivingTask(task.DomainTaskAttributes)
case replicator.ReplicationTaskTypeHistory:
p.logger.Debugf("Recieved history replication task %v.", task.HistoryTaskAttributes)
p.logger.Warn("Recieved history replication task %v.", task.HistoryTaskAttributes)


This should not be a warning.

samarabbas (Contributor, Author)

removed.

@coveralls

coveralls commented Mar 31, 2018

Coverage Status

Coverage decreased (-0.7%) to 62.433% when pulling 7da5f3f on samarabbas:xdc-replication-protocol into 3e3e2fc on uber:master.

@samarabbas samarabbas merged commit fa74c62 into cadence-workflow:master Apr 2, 2018
@samarabbas samarabbas deleted the xdc-replication-protocol branch April 2, 2018 16:21