Bugfix: Recreate activity heartbeat timeout after first timer fire #658
@@ -303,7 +303,7 @@ func (tb *timerBuilder) loadActivityTimers(msBuilder *mutableStateBuilder) {
 	td := &timerDetails{
 		TimerSequenceID: TimerSequenceID{VisibilityTimestamp: heartBeatExpiry},
 		ActivityID:      v.ScheduleID,
-		EventID:         v.StartedID,
+		EventID:         v.ScheduleID,
 		TimeoutType:     w.TimeoutTypeHeartbeat,
 		TimeoutSec:      v.HeartbeatTimeout,
 		TaskCreated:     (v.TimerTaskStatus & TimerTaskStatusCreatedHeartbeat) != 0}
The problem is that after the first heartbeat timer is created, the bit for the heartbeat timer is set. From then on TaskCreated is always true, so firstActivityTimerTask() will not return the timer task and it won't be created.
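To make the concern above concrete, here is a minimal sketch of how a status bit, once set, keeps a "task created" flag true. The bit value, the `activityInfo` struct, and the `needsNewTimerTask` helper are hypothetical stand-ins for illustration; only the `TimerTaskStatus & TimerTaskStatusCreatedHeartbeat` expression comes from the diff.

```go
package main

import "fmt"

// Hypothetical bit value; the real constant is defined in the history service.
const timerTaskStatusCreatedHeartbeat = 1 << 2

// activityInfo is a simplified stand-in holding only the status bitmask.
type activityInfo struct {
	TimerTaskStatus int32
}

// taskCreated mirrors the TaskCreated expression from the diff.
func taskCreated(ai activityInfo) bool {
	return ai.TimerTaskStatus&timerTaskStatusCreatedHeartbeat != 0
}

// needsNewTimerTask is an assumed simplification of the decision to
// write a new timer task: one is only needed if none was created yet.
func needsNewTimerTask(ai activityInfo) bool {
	return !taskCreated(ai)
}

func main() {
	ai := activityInfo{}
	fmt.Println(needsNewTimerTask(ai)) // no bit set yet, a task is needed

	// Creating the first heartbeat timer sets the bit.
	ai.TimerTaskStatus |= timerTaskStatusCreatedHeartbeat
	fmt.Println(needsNewTimerTask(ai)) // bit stays set, no new task is requested
}
```

In this sketch, unless something clears the bit after the timer fires, no further heartbeat timer task is ever requested.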
@yiminc
If my understanding is correct, https://github.com/uber/cadence/blob/master/service/history/timerQueueActiveProcessor.go#L307 will take care of that?
I actually have some doubt. Here:
https://github.com/uber/cadence/pull/658/files#diff-b3d25d1c01ad6e7393fd3958caea1333R287
the start event ID is actually checked, so when the timer detail is created (by loading the activity into memory), we are sure that the start event ID is a valid event ID.
Yes, I think this fix works.
The problem is that the startedID is a buffered event ID, which changes to the real start event ID after it is flushed.
When the heartbeat timeout is created for the very first time, it uses StartEventID as the dedupe event ID. However, the start event may still be buffered, in which case no real event ID is available when the timer task is created. This prevents the recreation logic from recreating the timer. This change instead uses the activity's schedule ID as the dedupe event ID of the heartbeat timer.
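The following is a rough sketch of why a dedupe ID taken from a buffered start event can stop matching later, while the schedule ID stays stable. All names here (`activity`, `timerTask`, `matches`, `simulate`) and the placeholder value are hypothetical; the actual comparison in the timer queue processor may differ.

```go
package main

import "fmt"

// placeholderEventID is an illustrative stand-in for the ID of an event
// that is still buffered; the real placeholder value is not shown here.
const placeholderEventID int64 = -1

// activity holds the two candidate dedupe IDs.
type activity struct {
	ScheduleID int64
	StartedID  int64
}

// timerTask records the dedupe event ID captured when the task was created.
type timerTask struct {
	EventID int64
}

// matches reports whether a fired task still carries the expected dedupe ID.
func matches(task timerTask, dedupeID int64) bool {
	return task.EventID == dedupeID
}

// simulate creates one task per strategy while the start event is buffered,
// then flushes the event and checks both tasks against the current IDs.
func simulate() (startMatches, scheduleMatches bool) {
	act := activity{ScheduleID: 5, StartedID: placeholderEventID}
	byStart := timerTask{EventID: act.StartedID}
	bySchedule := timerTask{EventID: act.ScheduleID}

	act.StartedID = 7 // the flush assigns the real start event ID

	return matches(byStart, act.StartedID), matches(bySchedule, act.ScheduleID)
}

func main() {
	startMatches, scheduleMatches := simulate()
	fmt.Println("started ID still matches:", startMatches)
	fmt.Println("schedule ID still matches:", scheduleMatches)
}
```

The schedule ID is assigned when the activity is scheduled and does not change afterwards, which is why it is the safer key in this sketch.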
* bump cadence-web to 1.1.1
* bugfix: when multiple activity got timeouted, there will be at most one being actually deleted in Cassandra (#655)
* Bugfix: Recreate activity heartbeat timeout after first timer fire (#658)
  When the heartbeat timeout is created for the very first time it uses StartEventID as the dedupe event ID. But it is possible that it is a buffered event and no ID is specified when the timer task is created. This trips the recreation logic from recreating the timer. This change instead relies on the schedule ID for the event for dedupe event ID of the heartbeat timer.
* Bump cadence web to 1.2.0
PR cadence-workflow#658 fixes an issue with heartbeat timers, which requires us to use the activity scheduleID in the timer task. This change modifies the heartbeat timer creation check to account for incorrect heartbeat timer tasks created before the bugfix.