
Why does the new etcd leader restart the TTL counting when the leader switches? #13294

Closed
jiapeish opened this issue Aug 16, 2021 · 11 comments

jiapeish commented Aug 16, 2021

Issue Overview
I noticed that etcd restarts the TTL counting when the leader is switched.

Steps to reproduce

  1. Start an etcd client and put a key with a lease, e.g. TTL = 10s;
  2. When the key's remaining TTL reaches 4s, restart the etcd leader node;
  3. The key's remaining TTL goes back to 10s.

Why doesn't the TTL continue counting down after the new leader is elected? And how can we modify the code to fix this?
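For anyone trying to reproduce this, a rough etcdctl session would look something like the following; the lease ID and outputs are illustrative only, and the leader member has to be restarted between the two timetolive calls:

$ etcdctl lease grant 10
lease 694d77aa9e38260f granted with TTL(10s)
$ etcdctl put foo bar --lease=694d77aa9e38260f
OK
$ etcdctl lease timetolive 694d77aa9e38260f
lease 694d77aa9e38260f granted with TTL(10s), remaining(4s)
# restart the current leader member, wait for a new leader, then check again:
$ etcdctl lease timetolive 694d77aa9e38260f
lease 694d77aa9e38260f granted with TTL(10s), remaining(10s)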

jiapeish changed the title from "etcd new leader will restart the TTL counting when the leader switches" to "Why does the new etcd leader restart the TTL counting when the leader switches?" Aug 16, 2021
serathius (Member) commented Aug 16, 2021

This is a problem inherent to distributed systems: you cannot trust the clock, because each instance has a different local time. A workaround for this is to use a time difference (TTL) instead of a deadline. etcd persists the TTL and counts it down from the last leader change. If the leader changes, the TTL is preserved, but the deadline changes. Thus the TTL for leases is not exact, but rather the minimal time the lease will be present.

There is an experimental feature called lease checkpointing that should help with this (it checkpoints the remaining TTL every 5 minutes). A proper solution would require etcd members to agree on time, which would require more work. Contributions are welcome!
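For reference, the feature is switched on with the --experimental-enable-lease-checkpoint flag; flag names and defaults can differ between etcd releases, so check the documentation for your version:

$ etcd --experimental-enable-lease-checkpoint=true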

jiapeish (Author) commented:

> This is a problem inherent to distributed systems: you cannot trust the clock, because each instance has a different local time. A workaround for this is to use a time difference (TTL) instead of a deadline. etcd persists the TTL and counts it down from the last leader change. If the leader changes, the TTL is preserved, but the deadline changes. Thus the TTL for leases is not exact, but rather the minimal time the lease will be present.
>
> There is an experimental feature called lease checkpointing that should help with this (it checkpoints the remaining TTL every 5 minutes). A proper solution would require etcd members to agree on time, which would require more work. Contributions are welcome!

Thank you @serathius, we'll turn the checkpointing feature on and test it.

As you mentioned, the TTL is checkpointed every 5 minutes, but I didn't find a loop/timer that does this checkpointing periodically. How does etcd control the checkpoint period? Does it mean that the etcd leader node performs this checkpoint and sends the remaining TTL to the follower nodes every 5 minutes, so that the followers can continue counting down?

ardaguclu (Contributor) commented:

@jiapeish, this pushes checkpoint requests onto the heap once per checkpointInterval:

func (le *lessor) scheduleCheckpointIfNeeded(lease *Lease) {

(checkpointInterval is configurable, with a default value of 5 minutes, and is used there.)

On the consumer side, checkpointScheduledLeases() runs forever in the lessor's run loop. Basically, this line

remainingTTL := int64(math.Ceil(l.expiry.Sub(now).Seconds()))

updates remainingTTL.

If you are wondering why this does not work when experimental-enable-lease-checkpoint is not enabled: it is because

func (l *Lease) RemainingTTL() int64 {

always returns ttl instead of remainingTTL, since there is no scheduler to set that value.
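To illustrate why the countdown restarts without checkpointing, here is a small self-contained sketch of that behaviour. It is a simplification for illustration only, not the real lessor code, and the promote method here only loosely mirrors what the actual leader promotion does:

package main

import (
	"fmt"
	"time"
)

// Lease is a simplified model of an etcd lease, for illustration only.
type Lease struct {
	ttl          int64 // granted TTL in seconds, persisted
	remainingTTL int64 // only ever set when a checkpoint has been applied
	expiry       time.Time
}

// RemainingTTL mirrors the behaviour described above: without a
// checkpoint, remainingTTL stays 0 and the full ttl is returned.
func (l *Lease) RemainingTTL() int64 {
	if l.remainingTTL > 0 {
		return l.remainingTTL
	}
	return l.ttl
}

// promote models what a new leader does: it recomputes the expiry from
// RemainingTTL(), so without checkpointing the countdown starts over.
func (l *Lease) promote() {
	l.expiry = time.Now().Add(time.Duration(l.RemainingTTL()) * time.Second)
}

func main() {
	l := &Lease{ttl: 10}
	l.promote()
	fmt.Println("without checkpointing, the new leader sees:", l.RemainingTTL(), "s") // 10

	l.remainingTTL = 4 // as if a checkpoint had been applied before the leader change
	l.promote()
	fmt.Println("with a checkpoint applied, the new leader sees:", l.RemainingTTL(), "s") // 4
}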

@serathius I have one question. While I was investigating this issue, I noticed the condition

if remainingTTL >= l.ttl {

Is there any possibility that it can be true?


busgo commented Aug 22, 2021

> > This is a problem inherent to distributed systems: you cannot trust the clock, because each instance has a different local time. A workaround for this is to use a time difference (TTL) instead of a deadline. etcd persists the TTL and counts it down from the last leader change. If the leader changes, the TTL is preserved, but the deadline changes. Thus the TTL for leases is not exact, but rather the minimal time the lease will be present.
> >
> > There is an experimental feature called lease checkpointing that should help with this (it checkpoints the remaining TTL every 5 minutes). A proper solution would require etcd members to agree on time, which would require more work. Contributions are welcome!
>
> Thank you @serathius, we'll turn the checkpointing feature on and test it.
>
> As you mentioned, the TTL is checkpointed every 5 minutes, but I didn't find a loop/timer that does this checkpointing periodically. How does etcd control the checkpoint period? Does it mean that the etcd leader node performs this checkpoint and sends the remaining TTL to the follower nodes every 5 minutes, so that the followers can continue counting down?

You can check whether there is a leader at regular intervals. Code like this:

// electLoop periodically checks (and, if needed, claims) leadership,
// once per electionTTL.
func (n *PinkNode) electLoop() {
	ticker := time.Tick(time.Second * time.Duration(n.electionTTL))
	log.Printf("the pink node instance %s starts the elect loop....", n.id)
	for range ticker {
		if n.electionState == ElectionReadyState {
			n.electionState = ElectionDoingState
			log.Printf("the pink node instance %s starts trying to elect....", n.id)
			n.tryElect()
		}
	}
}

// tryElect looks up the current leader and campaigns if there is none.
func (n *PinkNode) tryElect() {
	defer func() {
		n.electionState = ElectionReadyState
	}()

	ctx, cancel := context.WithTimeout(context.TODO(), time.Second*3)
	defer cancel() // release the timeout's resources when done

	// If a leader already exists, just report our role.
	id, err := n.etcdCli.Leader(ctx, n.electionPath)
	if err == nil {
		log.Printf("the pink node instance %s sees leader %s", n.id, id)
		if id == n.id {
			n.NotifyState(protocol.Leader)
		} else {
			n.NotifyState(protocol.Follower)
		}
		return
	}

	n.NotifyState(protocol.Follower)
	log.Printf("the pink node instance %s failed to find a leader: %+v", n.id, err)
	if !errors.Is(err, concurrency.ErrElectionNoLeader) {
		log.Printf("the pink node %s failed to get the leader: %+v", n.id, err)
		return
	}

	// No leader yet: campaign for leadership ourselves.
	log.Printf("the pink node instance %s starts campaigning for leader", n.id)
	err = n.etcdCli.Campaign(ctx, n.id, n.electionPath, n.electionTTL)
	if err == nil {
		log.Printf("the pink node instance %s campaigned for leader successfully", n.id)
		n.NotifyState(protocol.Leader)
		return
	}
}

jiapeish (Author) commented:

It's nice of you to explain this so clearly, @ardaguclu. It looks like I need to read through the lessor code...

jiapeish (Author) commented:

Hey, thank you @busgo, I've just found the checkpoint feature and maybe it will help.


stale bot commented Nov 20, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Nov 20, 2021
serathius (Member) commented:

Looks like lease checkpointing was not correctly implemented; proposed fix in #13491.

stale bot removed the stale label Nov 22, 2021
jiapeish (Author) commented Jan 10, 2022

> Looks like lease checkpointing was not correctly implemented; proposed fix in #13491.

This fix is really important to me; I was wondering why lease checkpointing was not working after the leader changed...

serathius (Member) commented Jan 10, 2022

> > Looks like lease checkpointing was not correctly implemented; proposed fix in #13491.
>
> This fix is really important to me; I was wondering why lease checkpointing was not working after the leader changed...

Please read the top comment in #13491

serathius (Member) commented:

Closing, as the issue was resolved and the fix will be released in v3.5.2.
