Bugfix: Recreate activity heartbeat timeout after first timer fire #658

Merged: 1 commit, Apr 5, 2018

@samarabbas (Contributor) commented on Apr 5, 2018:

When the heartbeat timeout is created for the very first time, it
uses StartEventID as the dedupe event ID. However, the started event
may still be buffered, in which case no event ID has been assigned yet
when the timer task is created. This keeps the recreation logic from
recreating the timer.
This change instead uses the activity's schedule ID as the dedupe event
ID of the heartbeat timer.
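A minimal Go sketch of this failure mode, under stated assumptions: `activityInfo`, `heartbeatTimerTask`, `isStale`, `startedKey`, `scheduleKey`, and the `bufferedEventID` sentinel are hypothetical stand-ins rather than Cadence code, and treating a fired task whose dedupe ID no longer matches as stale (skipped instead of recreated) is one reading of the description above. It contrasts a dedupe ID captured from `StartedID` while the started event is still buffered with one captured from `ScheduleID`.

```go
package main

import "fmt"

// bufferedEventID is a hypothetical sentinel for an event that is still
// buffered and has not yet been assigned its real history event ID.
// The value is arbitrary for this sketch.
const bufferedEventID int64 = -123

// activityInfo is a hypothetical, trimmed-down view of an activity's
// mutable state containing only the IDs relevant here.
type activityInfo struct {
	ScheduleID int64
	StartedID  int64
}

// heartbeatTimerTask is a hypothetical timer task carrying the dedupe
// event ID captured at the moment the task was created.
type heartbeatTimerTask struct {
	EventID int64
}

// isStale reports whether a fired task's dedupe ID no longer matches the
// activity's current ID, as selected by key. In this sketch a stale task
// is one that would be skipped rather than recreated.
func isStale(task heartbeatTimerTask, ai activityInfo, key func(activityInfo) int64) bool {
	return task.EventID != key(ai)
}

// startedKey models the pre-fix choice of dedupe ID.
func startedKey(ai activityInfo) int64 { return ai.StartedID }

// scheduleKey models the post-fix choice of dedupe ID.
func scheduleKey(ai activityInfo) int64 { return ai.ScheduleID }

func main() {
	// Hypothetical IDs: scheduled as event 5; the started event is still
	// buffered when the first heartbeat timer task is created.
	ai := activityInfo{ScheduleID: 5, StartedID: bufferedEventID}

	preFix := heartbeatTimerTask{EventID: startedKey(ai)}
	postFix := heartbeatTimerTask{EventID: scheduleKey(ai)}

	// Flushing the buffer assigns the started event its real ID.
	ai.StartedID = 7

	fmt.Println("pre-fix task stale:", isStale(preFix, ai, startedKey))
	fmt.Println("post-fix task stale:", isStale(postFix, ai, scheduleKey))
}
```

In the sketch, the task keyed on `StartedID` stops matching as soon as the flush replaces the sentinel, while the task keyed on `ScheduleID` keeps matching because that ID is not touched by the flush.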

@coveralls commented:

Coverage increased (+0.09%) to 64.93% when pulling fbc8c7f on samarabbas:heartbeat-repro into 49a7202 on uber:master.

```diff
@@ -303,7 +303,7 @@ func (tb *timerBuilder) loadActivityTimers(msBuilder *mutableStateBuilder) {
 td := &timerDetails{
     TimerSequenceID: TimerSequenceID{VisibilityTimestamp: heartBeatExpiry},
     ActivityID: v.ScheduleID,
-    EventID: v.StartedID,
+    EventID: v.ScheduleID,
     TimeoutType: w.TimeoutTypeHeartbeat,
     TimeoutSec: v.HeartbeatTimeout,
     TaskCreated: (v.TimerTaskStatus & TimerTaskStatusCreatedHeartbeat) != 0}
```

The problem is that after the first heartbeat timer is created, the heartbeat timer bit is set. From then on, TaskCreated is always true, so firstActivityTimerTask() never returns the heartbeat timer task and the timer is never created again.
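The stuck-bit behavior described in this comment can be reproduced in a few lines of Go. This is a sketch, not Cadence code: the `timerTaskStatusCreatedHeartbeat` constant stands in for the flag in the diff, and its bit position and `int32` type are assumptions. The final step that clears the bit only shows how such a flag is reset in Go; it is not a claim about how this PR handles the problem.

```go
package main

import "fmt"

// timerTaskStatusCreatedHeartbeat stands in for the
// TimerTaskStatusCreatedHeartbeat flag from the diff. The bit position
// and the int32 type are assumptions made for this sketch.
const timerTaskStatusCreatedHeartbeat int32 = 1 << 3

// taskCreated mirrors the diff's expression
// (v.TimerTaskStatus & TimerTaskStatusCreatedHeartbeat) != 0.
func taskCreated(status int32) bool {
	return status&timerTaskStatusCreatedHeartbeat != 0
}

func main() {
	var status int32 // no timer tasks created yet

	// Before the first heartbeat task exists, the check allows creating one.
	fmt.Println("before first task:", taskCreated(status))

	// Creating the first heartbeat timer task sets the bit.
	status |= timerTaskStatusCreatedHeartbeat

	// The timer fires, but nothing here clears the bit, so every later
	// check still reports the task as already created.
	fmt.Println("after timer fired:", taskCreated(status))

	// Hypothetical: clearing the bit with Go's AND NOT operator would make
	// the next check report that no heartbeat task exists.
	status &^= timerTaskStatusCreatedHeartbeat
	fmt.Println("after clearing bit:", taskCreated(status))
}
```

Because `&` binds more tightly than `!=` in Go, `status&timerTaskStatusCreatedHeartbeat != 0` tests the single bit exactly as the parenthesized expression in the diff does.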

I actually have some doubt.
Here: https://github.com/uber/cadence/pull/658/files#diff-b3d25d1c01ad6e7393fd3958caea1333R287
the start event ID is explicitly checked, so when the timer details are created (by loading the activity into memory), we can be sure that the start event ID is a valid event ID.

Yes, I think this fix works.

The problem is that StartedID holds the buffered event ID, which changes to the real started event ID after the buffer is flushed.

@samarabbas merged commit 41fcd03 into cadence-workflow:master on Apr 5, 2018
@samarabbas deleted the heartbeat-repro branch on April 5, 2018 at 17:26
meiliang86 pushed a commit that referenced this pull request Apr 10, 2018
meiliang86 pushed a commit that referenced this pull request Apr 10, 2018
nathanboktae pushed a commit that referenced this pull request Apr 12, 2018
nathanboktae added a commit that referenced this pull request Apr 12, 2018
* bump cadence-web to 1.1.1

* bugfix: when multiple activities timed out, at most one was actually deleted in Cassandra (#655)

* Bugfix: Recreate activity heartbeat timeout after first timer fire (#658)

* Bump cadence web to 1.2.0
samarabbas added a commit to samarabbas/cadence that referenced this pull request Apr 16, 2018
PR cadence-workflow#658 fixes an issue with heartbeat timers that requires using
the activity's ScheduleID in the timer task. This change modifies the
heartbeat timer creation check to account for incorrect heartbeat timer
tasks that were created before the bugfix.
samarabbas added a commit that referenced this pull request Apr 16, 2018

4 participants