
V0.3.12 patch #828

Merged: 9 commits, Jun 8, 2018
Conversation

wxing1292 (Contributor, author): No description provided.

@wxing1292 wxing1292 requested a review from samarabbas June 8, 2018 01:53
@@ -2272,7 +2278,7 @@ func (d *cassandraPersistence) GetTasks(request *GetTasksRequest) (*GetTasksResp
 		rowTypeTask,
 		request.ReadLevel,
 		request.MaxReadLevel,
-		request.BatchSize)
+	).PageSize(request.BatchSize)
Reviewer (Contributor):

You are using a paginated query but not returning the pageToken back to the caller.

wxing1292 (author):

This is for tasks queried by the matching side. For anything other than timers, we use the task ID for querying.

@@ -133,15 +132,21 @@ TaskFilterLoop:
return tasks, morePage, nil
}

-func (a *queueAckMgrImpl) completeTask(taskID int64) {
+func (a *queueAckMgrImpl) completeQueueTask(taskID int64) error {
+	err := a.processor.completeTask(taskID)
Reviewer (Contributor):

I'm a little confused here. I thought we were going to clean up the task in a separate background go-routine, and each individual processor would only update the ack level for the cluster.

wxing1292 (author):

This will actually be called by each processor: the replicator queue processor will actually complete the task, while the transfer queue processor will do nothing.

@@ -175,7 +175,7 @@ func (p *replicatorQueueProcessorImpl) readTasks(readLevel int64) ([]queueTaskIn
tasks[i] = response.Tasks[i]
}

-	return tasks, len(tasks) >= batchSize, nil
+	return tasks, len(response.NextPageToken) != 0, nil
Reviewer (Contributor):

Looks like we don't use the nextPageToken for the next query.

wxing1292 (author):

For anything other than timers, we use the task ID for querying.

@@ -114,8 +114,8 @@ func NewConfig(dc *dynamicconfig.Collection, numberOfShards int) *Config {
TimerProcessorGetFailureRetryCount: 5,
TimerProcessorCompleteTimerFailureRetryCount: 10,
TimerProcessorUpdateShardTaskCount: 100,
-	TimerProcessorUpdateAckInterval: 1 * time.Minute,
 	TimerProcessorCompleteTimerInterval: 1 * time.Second,
+	TimerProcessorUpdateAckInterval: 5 * time.Second,
Reviewer (Contributor):

Updating a shard is a very expensive query. I'm not sure changing the update interval for all 16K shards is a good idea.

wxing1292 (author):

We have a shard-level minimum update interval to guarantee the shard is not updated too frequently at the database level; at the application level (in memory), updates can happen as frequently as possible.

@@ -125,16 +125,16 @@ func NewConfig(dc *dynamicconfig.Collection, numberOfShards int) *Config {
TransferProcessorCompleteTransferFailureRetryCount: 10,
TransferProcessorUpdateShardTaskCount: 100,
TransferProcessorMaxPollInterval: 60 * time.Second,
-	TransferProcessorUpdateAckInterval: 1 * time.Minute,
 	TransferProcessorCompleteTransferInterval: 1 * time.Second,
+	TransferProcessorUpdateAckInterval: 5 * time.Second,
Reviewer (Contributor):

Same comment as timer update interval.

TransferProcessorStandbyTaskDelay: 0 * time.Minute,
ReplicatorTaskBatchSize: 10,
ReplicatorTaskWorkerCount: 10,
ReplicatorTaskMaxRetryCount: 100,
ReplicatorProcessorMaxPollRPS: 100,
ReplicatorProcessorUpdateShardTaskCount: 100,
ReplicatorProcessorMaxPollInterval: 60 * time.Second,
-	ReplicatorProcessorUpdateAckInterval: 1 * time.Minute,
+	ReplicatorProcessorUpdateAckInterval: 5 * time.Second,
Reviewer (Contributor):

Same as above.

@@ -182,6 +182,7 @@ func (t *timerQueueProcessorImpl) completeTimers() error {
}
}

+	t.logger.Infof("Start completing timer task from: %v, to %v.", lowerAckLevel, upperAckLevel)
Reviewer (Contributor):

This log will be very noisy.

wxing1292 (author):

I can remove this.

if upperAckLevel > ackLevel {
upperAckLevel = ackLevel
}
}
}

+	t.logger.Infof("Start completing transfer task from: %v, to %v.", lowerAckLevel, upperAckLevel)
Reviewer (Contributor):

This is going to be a very noisy log.

wxing1292 (author):

I can remove this.


func (t *transferQueueProcessorBase) readTasks(readLevel int64) ([]queueTaskInfo, bool, error) {
batchSize := t.options.BatchSize
response, err := t.executionManager.GetTransferTasks(&persistence.GetTransferTasksRequest{
Reviewer (Contributor):

It is weird that we use the pagination token in one place but not in another. I think we should make both paths consistent.

wxing1292 (author):

That would mean using the complex logic instead of the simpler existing logic. The only reason we have to use the pagination token at all is the timer task.

samarabbas (Reviewer) left a comment:

Can you address the comments I posted?

@wxing1292 wxing1292 merged commit 9a8e613 into v0.3.12-base Jun 8, 2018
@wxing1292 wxing1292 deleted the v0.3.12-patch branch June 8, 2018 19:29