Apply 5 min delay for standby task #920

Merged
merged 9 commits into from
Jul 18, 2018

wxing1292 (Contributor) commented Jul 3, 2018

fix #914

Transfer tasks will be delayed by at most 5 minutes: the delay can be shorter if the standby task can be discarded; otherwise the processor waits until 5 minutes have passed and re-checks.
Timer tasks will be delayed by 5 minutes.

wxing1292 requested a review from samarabbas on July 3, 2018 02:23
@@ -283,7 +283,13 @@ ProcessRetryLoop:
if err != nil {
if err == ErrTaskRetry {
p.metricsClient.IncCounter(p.options.MetricScope, metrics.HistoryTaskStandbyRetryCounter)
<-notificationChan
DelayLoop:

Contributor

It looks like we only delay the task after the first processing attempt. I thought the idea was to delay even the first processing.

Contributor Author

For transfer tasks, there is no timestamp associated with the task.
We could tag the queue processor, but that requires more code.

@@ -237,7 +237,7 @@ func (e *historyEngineImpl) registerDomainFailoverCallback() {
// its length > 0 and has correct timestamp, to trigger a db scan
fakeDecisionTask := []persistence.Task{&persistence.DecisionTask{}}
fakeDecisionTimeoutTask := []persistence.Task{&persistence.DecisionTimeoutTask{VisibilityTimestamp: now}}
e.txProcessor.NotifyNewTask(e.currentClusterName, now, fakeDecisionTask)
e.txProcessor.NotifyNewTask(e.currentClusterName, fakeDecisionTask)

Contributor

I think we should have exactly the same contract for moving the clock on both the timer and transfer processors. The timer notification takes the current time as an argument, while the transfer notification does not. I think we should unify these.

Contributor Author

Transfer tasks do not have a timestamp associated with them.

@@ -136,6 +136,7 @@ const (
`create_request_id: ?, ` +
`decision_version: ?, ` +
`decision_schedule_id: ?, ` +
`decision_schedule_time: ?, ` +

Contributor

We should probably just add a visibility_ts to transfer tasks, so that every kind of task uses the same delay mechanism.

@@ -47,7 +47,7 @@ func NewRealTimeSource() *RealTimeSource {

// Now return the real current time
func (ts *RealTimeSource) Now() time.Time {
return time.Now()
return time.Now().UTC()

Contributor

Usually the API is called UtcNow.

Contributor Author

There is no UtcNow() function in Go's time package.
ref: https://golang.org/pkg/time/

@@ -1347,6 +1347,28 @@ func (_m *mockMutableState) GetHistoryEvent(serializedEvent []byte) (*shared.His
return r0, r1
}

func (_m *mockMutableState) GetInFlightDecisionTask() (*decisionInfo, bool) {

Contributor

Make sure you merge my change; this was included as part of it.

Contributor Author

I did.

…ke start child workflow

use UTC for time.Now() in history service
when failover, should unblock existing standby task
@@ -357,6 +357,7 @@ func (e *historyEngineImpl) StartWorkflowExecution(startRequest *h.StartWorkflow
}
}
setTaskVersion(msBuilder.GetCurrentVersion(), transferTasks, timerTasks)
setTransferTaskTimestamp(common.NewRealTimeSource().Now(), transferTasks)

Contributor

Can we just have a single API for setting task information instead of making two separate calls?

@@ -177,7 +177,7 @@ func (notifier *historyEventNotifierImpl) dispatchHistoryEventNotification(event

func (notifier *historyEventNotifierImpl) enqueueHistoryEventNotification(event *historyEventNotification) {
// set the timestamp just before enqueuing the event
event.timestamp = time.Now()
event.timestamp = common.NewRealTimeSource().Now()

Contributor

Can you make the time source a field on historyEventNotifierImpl so that another implementation can be injected?

@@ -187,12 +187,13 @@ func (c *workflowExecutionContext) updateWorkflowExecution(transferTasks []persi
c.msBuilder.GetExecutionInfo().LastFirstEventID, c.msBuilder.GetExecutionInfo().NextEventID)
}

return c.updateHelper(nil, transferTasks, timerTasks, c.createReplicationTask, "", currentVersion, transactionID)
now := common.NewRealTimeSource().Now()

Contributor

Let's call the API UtcNow and define a time source on workflowExecutionContext.

timestamp := task.GetVisibilityTimestamp()
for {
<-notificationChan
if p.shard.GetCurrentTime(p.clusterName).After(timestamp) {

Contributor

We only want an upper bound on how long we hold on to certain tasks, such as activity and decision tasks, rather than always delaying processing for all tasks.

wxing1292 force-pushed the standby-delay branch 2 times, most recently from 7f7c270 to 440a3a5 on July 17, 2018 22:53
@@ -356,8 +356,7 @@ func (e *historyEngineImpl) StartWorkflowExecution(startRequest *h.StartWorkflow
replicationTasks = append(replicationTasks, replicationTask)
}
}
setTaskVersion(msBuilder.GetCurrentVersion(), transferTasks, timerTasks)
setTransferTaskTimestamp(common.NewRealTimeSource().Now(), transferTasks)
setTaskInfo(msBuilder.GetCurrentVersion(), time.Now().UTC(), transferTasks, timerTasks)

Contributor

Can we just get the UTC time within setTaskInfo?

func (ts *FakeTimeSource) Update(now time.Time) {
ts.now = now
func (ts *FakeTimeSource) Update(now time.Time) *FakeTimeSource {
ts.now = now.UTC()

Contributor

Don't convert to UTC here.

Contributor Author

Any reason?

@@ -835,8 +844,9 @@ func (r *historyReplicator) terminateWorkflow(ctx context.Context, domainID stri

func (r *historyReplicator) notify(clusterName string, now time.Time, transferTasks []persistence.Task,
timerTasks []persistence.Task) {
now = now.Add(-r.shard.GetConfig().StandbyClusterDelay())

Contributor

Is there no subtract method?

Contributor Author

There is func (t Time) Sub(u Time) Duration, but it does something else: it returns the duration between two times rather than subtracting a duration from a time.

setTaskVersion(newStateBuilder.GetCurrentVersion(), newTransferTasks, nil)
setTaskInfo(
newStateBuilder.GetCurrentVersion(),
common.NewFakeTimeSource().Update(time.Unix(0, startedEvent.GetTimestamp())).Now(),

Contributor

Why are we using FakeTimeSource in production code?

Contributor Author

ReplicateWorkflowExecutionContinuedAsNewEvent can be used by both the active and the standby cluster, so the only reliable approach is to use the event time from the start event.

wxing1292 merged commit 3a79fc0 into master on Jul 18, 2018
wxing1292 deleted the standby-delay branch on July 18, 2018 02:03

Successfully merging this pull request may close these issues.

Standby timer / transfer task are not correctly delayed.
2 participants