Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancelled heartbeat causes release failures #36

Closed
Yu-Xie opened this issue Feb 18, 2023 · 4 comments
Closed

Cancelled heartbeat causes release failures #36

Yu-Xie opened this issue Feb 18, 2023 · 4 comments

Comments

@Yu-Xie
Copy link

Yu-Xie commented Feb 18, 2023

Here in Do() func we are doing this:

defer l.Close()
defer l.heartbeatCancel()

Due to LIFO, we would cancel heartbeat before releasing the lock. If the lock is heartbeating when ctx is cancelled, the lock might be marked as as Released here. This will cause that when we call l.Close() shortly after, we will hit this branch and skip actually releasing the lock.

My proposal is to revert the ordering of these 2 defer operations like this so that we release the lock before cancelling the heartbeats.

defer l.heartbeatCancel()
defer l.Close()

Thoughts?

@ucirello
Copy link
Collaborator

Can you provide a repro case proving your point? I think I need a clear example to understand your point.

@ucirello
Copy link
Collaborator

I could not come up with a repro case, but I followed your reasoning and agree with your suggestion. Thank you. Code updated.

@Yu-Xie
Copy link
Author

Yu-Xie commented Feb 20, 2023

package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"os"
	"time"

	"cirello.io/pglock"
)

func main() {
	db, err := sql.Open("postgres", fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=%s sslmode=disable",
		"localhost", 5432, "user", "password", "db-name"))
	if err != nil {
		panic(err)
	}
	client, err := pglock.New(db,
		pglock.WithHeartbeatFrequency(time.Millisecond*5),
		pglock.WithLeaseDuration(time.Second),
		pglock.WithLogger(log.New(os.Stdout, "", log.Lshortfile)))
	if err != nil {
		panic(err)
	}
	ctx, cancel := context.WithCancel(context.Background())
	time.AfterFunc(time.Millisecond*500, cancel)
	err = client.Do(ctx, "lock-id", doSomething)
	if err != nil {
		fmt.Println(err)
	}
}

func doSomething(ctx context.Context, _ *pglock.Lock) error {
	fmt.Println("Doing something forever until ctx cancelled")
	select {
	case <-ctx.Done():
		fmt.Println("Ctx cancelled so exit")
		return nil
	}
}

What I got sometimes by executing the above binary (can't repro 100% because certain race condition needs to happen but it'll be triggered after it's run a few times):

client.go:214: storeAcquire in lock-id 5588 [] 0
client.go:216: storeAcquire out lock-id 5588 [] 5588
Doing something forever until ctx cancelled
client.go:346: heartbeat started lock-id
Ctx cancelled so exit
client.go:353: heartbeat missed cannot send heartbeat (lock-id): context canceled
client.go:353: heartbeat stopped lock-id

When you see client.go:353: heartbeat missed cannot send heartbeat (lock-id): context canceled it means the lock will not be released.

It looks like you fixed already though, thanks!

@ucirello
Copy link
Collaborator

Thanks @Yu-Xie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants