This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

worker, ha: increase keepalive TTL to 1 minute, and to 30 minutes if relay enabled #1405

Merged
merged 17 commits into from
Feb 4, 2021

Conversation

lance6716
Collaborator

@lance6716 lance6716 commented Jan 27, 2021

What problem does this PR solve?

Make keepalive more robust.

What is changed and how it works?

After this PR there are two keepalive TTLs: one used when relay is enabled and one used when it is not. By default they are 30 minutes and 1 minute, respectively.
When a relay task is assigned, the worker switches to relay-keepalive-ttl; when no relay task remains, it switches back to keepalive-ttl.
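
The selection rule described above can be sketched in Go as follows. This is a minimal illustration, not the code from this PR: the `ttlConfig` struct, its field names, and `effectiveTTL` are hypothetical, and the defaults are simply the 1 minute and 30 minutes stated in the description, expressed in seconds.

```go
package main

import "fmt"

// Assumed defaults, taken from the PR description (in seconds).
const (
	defaultKeepAliveTTL      int64 = 60      // 1 minute
	defaultRelayKeepAliveTTL int64 = 30 * 60 // 30 minutes
)

// ttlConfig is a hypothetical holder for the two settings
// (keepalive-ttl and relay-keepalive-ttl).
type ttlConfig struct {
	KeepAliveTTL      int64
	RelayKeepAliveTTL int64
}

// effectiveTTL returns the TTL that applies for the current relay state.
func (c ttlConfig) effectiveTTL(relayAssigned bool) int64 {
	if relayAssigned {
		return c.RelayKeepAliveTTL
	}
	return c.KeepAliveTTL
}

func main() {
	cfg := ttlConfig{
		KeepAliveTTL:      defaultKeepAliveTTL,
		RelayKeepAliveTTL: defaultRelayKeepAliveTTL,
	}
	fmt.Println(cfg.effectiveTTL(false)) // 60
	fmt.Println(cfg.effectiveTTL(true))  // 1800
}
```

Keeping the two values as separate settings means either one can be tuned without affecting the other.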

Check List

Tests

  • Integration test

Code changes

  • Has exported function/method change

Side effects

  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to be included in the release note

// create a new lease with new TTL, and overwrite original KV
cliCtx, cancel := context.WithTimeout(ctx, etcdutil.DefaultRequestTimeout)
defer cancel()
lease, err = cli.Grant(cliCtx, newTTL)
Contributor

The revokeLease function should be called for the old lease, like line 104.

Collaborator Author

The old lease will expire on its own if it is not kept alive, so I don't think that is needed.
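
The trade-off in this thread can be illustrated with a toy model. The sketch below is not etcd: `fakeLeaseStore` is a hypothetical in-memory stand-in with a manually advanced clock, used only to show that a lease that is no longer renewed disappears once its TTL elapses, while an explicit revoke removes it immediately.

```go
package main

import "fmt"

type leaseID int64

// fakeLeaseStore is a toy lease table with a manual clock (in seconds).
type fakeLeaseStore struct {
	now    int64
	nextID leaseID
	expiry map[leaseID]int64 // lease ID -> absolute deadline
}

func newFakeLeaseStore() *fakeLeaseStore {
	return &fakeLeaseStore{nextID: 1, expiry: map[leaseID]int64{}}
}

// grant creates a lease that expires ttl seconds from now.
func (s *fakeLeaseStore) grant(ttl int64) leaseID {
	id := s.nextID
	s.nextID++
	s.expiry[id] = s.now + ttl
	return id
}

// revoke removes a lease immediately.
func (s *fakeLeaseStore) revoke(id leaseID) { delete(s.expiry, id) }

// advance moves the clock forward and drops leases whose deadline has passed.
func (s *fakeLeaseStore) advance(seconds int64) {
	s.now += seconds
	for id, deadline := range s.expiry {
		if deadline <= s.now {
			delete(s.expiry, id)
		}
	}
}

func (s *fakeLeaseStore) alive(id leaseID) bool {
	_, ok := s.expiry[id]
	return ok
}

func main() {
	s := newFakeLeaseStore()
	oldLease := s.grant(60)   // lease created with the old TTL, never renewed
	newLease := s.grant(1800) // replacement lease with the new TTL

	s.advance(59)
	fmt.Println(s.alive(oldLease), s.alive(newLease)) // true true
	s.advance(1)
	fmt.Println(s.alive(oldLease), s.alive(newLease)) // false true
}
```

Without a revoke, the old lease lingers for up to its remaining TTL, which matters more when that TTL is 30 minutes than when it is 1 minute.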

@lance6716 lance6716 changed the title [WIP] worker, ha: increase keepalive TTL to 1 minute, and 30 minutes if relay enabled worker, ha: increase keepalive TTL to 1 minute, and 30 minutes if relay enabled Jan 28, 2021
@lance6716 lance6716 added needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated status/PTAL This PR is ready for review. Add this label back after committing new changes labels Jan 28, 2021
@lance6716
Collaborator Author

PTAL @lichunzhu @3pointer

@lance6716 lance6716 changed the title worker, ha: increase keepalive TTL to 1 minute, and 30 minutes if relay enabled worker, ha: increase keepalive TTL to 1 minute, and to 30 minutes if relay enabled Jan 28, 2021
@GMHDBJD
Collaborator

GMHDBJD commented Jan 28, 2021

Our ha integration test still passed after increasing keepalive ttl... 🤔

@lance6716
Collaborator Author

> Our ha integration test still passed after increasing keepalive ttl... 🤔

I guess the worker exited gracefully (not via kill -9), so the revoke-lease function was called. I will check today.
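
The distinction drawn here can be sketched as follows. On a graceful stop (SIGINT or SIGTERM) a process can run cleanup such as revoking its lease, whereas SIGKILL (`kill -9`) cannot be caught, so no cleanup runs and the lease only goes away when its TTL elapses. The helper names `waitAndRevoke` and `simulateGracefulStop` are hypothetical, and the signal is fed in by hand so the example terminates on its own.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// waitAndRevoke blocks until a signal arrives, then runs the cleanup callback.
func waitAndRevoke(sigCh <-chan os.Signal, revoke func()) os.Signal {
	sig := <-sigCh
	revoke()
	return sig
}

// simulateGracefulStop feeds a SIGTERM into the handler and reports
// whether the cleanup callback ran.
func simulateGracefulStop() bool {
	sigCh := make(chan os.Signal, 1)
	sigCh <- syscall.SIGTERM
	revoked := false
	waitAndRevoke(sigCh, func() { revoked = true })
	return revoked
}

func main() {
	sigCh := make(chan os.Signal, 1)
	// SIGKILL cannot be caught or handled, so it is not registered here.
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
	defer signal.Stop(sigCh)

	// Simulate a graceful stop request instead of waiting for a real one.
	sigCh <- syscall.SIGTERM
	sig := waitAndRevoke(sigCh, func() { fmt.Println("lease revoked") })
	fmt.Println("exiting on", sig)
	fmt.Println("cleanup ran in simulation:", simulateGracefulStop())
}
```

A test that wants to exercise the TTL path would therefore need to kill the worker with a signal that bypasses this handler.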

@@ -588,6 +588,13 @@ func (s *Server) startWorker(cfg *config.SourceConfig) error {
return err
}
startRelay = !relayStage.IsDeleted && relayStage.Expect == pb.Stage_Running
// change keepalive TTL to 30 minutes if it's the default value
// if relayStage is not running, we choose to change keepalive here instead of at relayStage switching
if s.cfg.KeepAliveTTL == defaultKeepAliveTTL {
Contributor

Suggested change
- if s.cfg.KeepAliveTTL == defaultKeepAliveTTL {
+ if s.cfg.KeepAliveTTL < relayEnabledKeepAliveTTL {

What if I set KeepAliveTTL to 31s? 🤔

Collaborator Author

I think we should leave users a way to opt out of this feature, so we only increase the TTL when it is still the default value.

Contributor

Then I think we should add a configuration named relayEnabledKeepAliveTTL. With the current approach, I can't set keepAliveTTL and also increase relayKeepAliveTTL at the same time.
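
To make the difference between the two conditions in this thread concrete, here is a small sketch that applies each rule to a few configured values when relay is enabled. The function names are hypothetical, and the constants are assumptions based on the PR description (1 minute and 30 minutes, in seconds), not values read from the code.

```go
package main

import "fmt"

// Assumed values in seconds; the real constants may differ.
const (
	defaultKeepAliveTTL      int64 = 60
	relayEnabledKeepAliveTTL int64 = 1800
)

// raiseOnlyIfDefault mirrors the condition in the diff under review:
// the TTL is raised only when the user left it at the default.
func raiseOnlyIfDefault(configured int64) int64 {
	if configured == defaultKeepAliveTTL {
		return relayEnabledKeepAliveTTL
	}
	return configured
}

// raiseIfSmaller mirrors the suggested change:
// any TTL below the relay TTL is raised.
func raiseIfSmaller(configured int64) int64 {
	if configured < relayEnabledKeepAliveTTL {
		return relayEnabledKeepAliveTTL
	}
	return configured
}

func main() {
	for _, ttl := range []int64{defaultKeepAliveTTL, 31, 3600} {
		fmt.Printf("configured=%d onlyIfDefault=%d ifSmaller=%d\n",
			ttl, raiseOnlyIfDefault(ttl), raiseIfSmaller(ttl))
	}
}
```

With a configured value of 31, the first rule leaves the TTL at 31 while the second raises it to 1800. Neither rule lets a user choose both values independently, which is the motivation for a separate relay setting.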

@lance6716 lance6716 added the needs-update-docs Should update docs after this PR is merged. Remove this label once the docs are updated label Feb 2, 2021
Contributor

@lichunzhu lichunzhu left a comment

I still think we can try to revoke the previous lease when we "re-keepalive". It doesn't matter if the revocation fails.
Rest LGTM
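
A best-effort revoke of this kind could look like the sketch below. It is an assumption-laden illustration rather than the change made in this PR: `leaseAPI` is a hypothetical, simplified interface (the real etcd client calls also take a context), `fakeLeases` is an in-memory fake, and `regrant` is an invented helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// leaseAPI is a hypothetical, simplified lease interface.
type leaseAPI interface {
	Grant(ttl int64) (int64, error)
	Revoke(id int64) error
}

// fakeLeases is an in-memory fake that can be told to fail revocations.
type fakeLeases struct {
	nextID     int64
	active     map[int64]int64 // lease ID -> TTL in seconds
	failRevoke bool
}

func newFakeLeases(failRevoke bool) *fakeLeases {
	return &fakeLeases{nextID: 1, active: map[int64]int64{}, failRevoke: failRevoke}
}

func (f *fakeLeases) Grant(ttl int64) (int64, error) {
	id := f.nextID
	f.nextID++
	f.active[id] = ttl
	return id, nil
}

func (f *fakeLeases) Revoke(id int64) error {
	if f.failRevoke {
		return errors.New("revoke failed")
	}
	delete(f.active, id)
	return nil
}

// regrant creates a lease with newTTL, then tries to revoke oldID.
// A failed revoke is reported but not returned: the old lease will
// still expire on its own once its TTL elapses.
func regrant(api leaseAPI, oldID, newTTL int64) (int64, error) {
	newID, err := api.Grant(newTTL)
	if err != nil {
		return 0, err
	}
	if err := api.Revoke(oldID); err != nil {
		fmt.Println("ignoring revoke error for old lease:", err)
	}
	return newID, nil
}

func main() {
	api := newFakeLeases(true) // force the revoke to fail
	oldID, _ := api.Grant(60)
	newID, err := regrant(api, oldID, 1800)
	fmt.Println("new lease:", newID, "error:", err)
}
```

Granting the new lease first means a failure to revoke never leaves the worker without a lease.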

@lance6716 lance6716 added this to the v2.0.2 milestone Feb 2, 2021
@lance6716
Collaborator Author

revokeLease is added in 387d9e1, PTAL @zeminzhou @lichunzhu

Contributor

@lichunzhu lichunzhu left a comment

LGTM

@zeminzhou
Contributor

LGTM

@ti-srebot

@zeminzhou, Thanks for your review. The bot only counts LGTMs from Reviewers and higher roles, but you're still welcome to leave your comments. See the corresponding SIG page for more information. Related SIG: migrate(slack).

@lance6716 lance6716 added the status/LGT2 Two reviewers already commented LGTM, ready for merge label Feb 3, 2021
@lance6716
Collaborator Author

/run-all-tests

@lance6716 lance6716 merged commit 50876d3 into pingcap:master Feb 4, 2021
ti-srebot pushed a commit to ti-srebot/dm that referenced this pull request Feb 4, 2021
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot

cherry pick to release-2.0 in PR #1426

@ti-srebot ti-srebot added already-cherry-pick-2.0 The related PR is already cherry-picked to release-2.0. Add this label once the PR is cherry-picked and removed needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 labels Feb 4, 2021
lance6716 pushed a commit that referenced this pull request Feb 4, 2021
Labels
  • already-cherry-pick-2.0: The related PR is already cherry-picked to release-2.0. Add this label once the PR is cherry-picked
  • needs-update-docs: Should update docs after this PR is merged. Remove this label once the docs are updated
  • needs-update-release-note: This PR should be added into release notes. Remove this label once the release notes are updated
  • status/LGT2: Two reviewers already commented LGTM, ready for merge
  • status/PTAL: This PR is ready for review. Add this label back after committing new changes
5 participants