Potential deadlock in tests/server/api #7375

Closed
lhy1024 opened this issue Nov 15, 2023 · 0 comments · Fixed by #7325
Labels
type/ci The issue is related to CI.

Comments

@lhy1024
Contributor

lhy1024 commented Nov 15, 2023

Flaky Test

Which jobs are failing

2023-11-15T13:20:06.2576488Z POTENTIAL DEADLOCK:
2023-11-15T13:20:06.2576719Z Previous place where the lock was grabbed
2023-11-15T13:20:06.2576901Z goroutine 73553 lock 0xc0050e6c80
2023-11-15T13:20:06.2578683Z [2023/11/15 13:20:05.252 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-98049745-1bdc-41f8-a354-a7a96af3c41a/127.0.0.1:36017] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
2023-11-15T13:20:06.2579132Z ../../../server/cluster/cluster.go:714 cluster.(*RaftCluster).Stop { c.Lock() } <<<<<
2023-11-15T13:20:06.2579704Z ../../../server/server.go:769 server.(*Server).stopRaftCluster { s.cluster.Stop() }
2023-11-15T13:20:06.2580083Z ../../../server/server.go:1720 server.(*Server).campaignLeader { return }
2023-11-15T13:20:06.2580499Z ../../../server/server.go:1633 server.(*Server).leaderLoop { s.campaignLeader() }
2023-11-15T13:20:06.2580521Z 
2023-11-15T13:20:06.2580766Z Have been trying to lock it again for more than 30s
2023-11-15T13:20:06.2580946Z goroutine 74321 lock 0xc0050e6c80
2023-11-15T13:20:06.2581533Z ../../../server/cluster/cluster.go:2237 cluster.(*RaftCluster).checkAndUpdateMinResolvedTS { c.Lock() } <<<<<
2023-11-15T13:20:06.2582432Z ../../../server/cluster/cluster.go:2279 cluster.(*RaftCluster).runMinResolvedTSJob { if current, needPersist := c.checkAndUpdateMinResolvedTS(); needPersist { }
2023-11-15T13:20:06.2582442Z 
2023-11-15T13:20:06.2582450Z 
2023-11-15T13:20:06.2584163Z [2023/11/15 13:20:05.261 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-98049745-1bdc-41f8-a354-a7a96af3c41a/127.0.0.1:36017] [attempt=53] [error="rpc error: code = Unavailable desc = etcdserver: no leader"]
2023-11-15T13:20:06.2584368Z Here is what goroutine 73553 doing now
2023-11-15T13:20:06.2584531Z goroutine 73553 [semacquire]:
2023-11-15T13:20:06.2584727Z sync.runtime_Semacquire(0xc0016c0288?)
2023-11-15T13:20:06.2585090Z 	/opt/hostedtoolcache/go/1.21.3/x64/src/runtime/sema.go:62 +0x25
2023-11-15T13:20:06.2585261Z sync.(*WaitGroup).Wait(0xc0016c0280)
2023-11-15T13:20:06.2585648Z 	/opt/hostedtoolcache/go/1.21.3/x64/src/sync/waitgroup.go:116 +0xa5
2023-11-15T13:20:06.2586159Z github.com/tikv/pd/server/cluster.(*schedulingController).stopSchedulingJobs(0xc0016c0240)
2023-11-15T13:20:06.2586631Z 	/home/runner/work/pd/pd/server/cluster/scheduling_controller.go:82 +0x20f
2023-11-15T13:20:06.2586987Z github.com/tikv/pd/server/cluster.(*RaftCluster).Stop(0xc0050e6c80)
2023-11-15T13:20:06.2587344Z 	/home/runner/work/pd/pd/server/cluster/cluster.go:721 +0x23b
2023-11-15T13:20:06.2587694Z github.com/tikv/pd/server.(*Server).stopRaftCluster(0xc004c22a00)
2023-11-15T13:20:06.2587966Z 	/home/runner/work/pd/pd/server/server.go:769 +0x138
2023-11-15T13:20:06.2588314Z github.com/tikv/pd/server.(*Server).campaignLeader(0xc004c22a00)
2023-11-15T13:20:06.2588600Z 	/home/runner/work/pd/pd/server/server.go:1720 +0x21ff
2023-11-15T13:20:06.2588923Z github.com/tikv/pd/server.(*Server).leaderLoop(0xc004c22a00)
2023-11-15T13:20:06.2589206Z 	/home/runner/work/pd/pd/server/server.go:1633 +0x1ddd
2023-11-15T13:20:06.2589662Z created by github.com/tikv/pd/server.(*Server).startServerLoop in goroutine 72698
2023-11-15T13:20:06.2589931Z 	/home/runner/work/pd/pd/server/server.go:621 +0x1a5
2023-11-15T13:20:06.2589941Z 
2023-11-15T13:20:06.2590125Z Other goroutines holding locks:
2023-11-15T13:20:06.2590291Z goroutine 72692 lock 0xc0098e4ea0
2023-11-15T13:20:06.2590614Z ../../cluster.go:146 tests.(*TestServer).Destroy { s.Lock() } <<<<<
2023-11-15T13:20:06.2590980Z ../../cluster.go:833 tests.(*TestCluster).Destroy { err := s.Destroy() }
2023-11-15T13:20:06.2592689Z [2023/11/15 13:20:05.262 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-1877a6d6-c0c8-43a8-a4d5-596c4b87bc0e/127.0.0.1:36017] [attempt=52] [error="rpc error: code = Unavailable desc = etcdserver: no leader"]
2023-11-15T13:20:06.2593209Z api_test.go:847 api.TestSendApiWhenRestartRaftCluster { } }
2023-11-15T13:20:06.2593219Z 
2023-11-15T13:20:06.2593384Z goroutine 74303 lock 0xc00ad68380
2023-11-15T13:20:06.2595082Z [2023/11/15 13:20:05.263 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-12778841-f297-4779-a28d-46ec018441e8/127.0.0.1:36017] [attempt=54] [error="rpc error: code = Unavailable desc = etcdserver: no leader"]
2023-11-15T13:20:06.2595734Z ../../../pkg/schedule/schedulers/scheduler_controller.go:198 schedulers.(*Controller).AddScheduler { c.Lock() } <<<<<
2023-11-15T13:20:06.2597155Z ../../../pkg/schedule/coordinator.go:492 schedule.(*Coordinator).InitSchedulers { if err = c.schedulers.AddScheduler(s, schedulerCfg.Args...); err != nil && !errors.ErrorEqual(err, errs.ErrSchedulerExisted.FastGenByArgs()) { }
2023-11-15T13:20:06.2597682Z ../../../pkg/schedule/coordinator.go:401 schedule.(*Coordinator).Run { c.InitSchedulers(true) }
2023-11-15T13:20:06.2600587Z ../../../pkg/schedule/coordinator.go:372 schedule.(*Coordinator).RunUntilStop { c.Run() }
2023-11-15T13:20:06.2602297Z ../../../server/cluster/scheduling_controller.go:108 cluster.(*schedulingController).runCoordinator { sc.coordinator.RunUntilStop() }
2023-11-15T13:20:06.2602320Z 
2023-11-15T13:20:06.2602328Z 

CI link

https://github.com/tikv/pd/actions/runs/6873093317/job/18706079376?pr=7283

Reason for failure (if possible)

Anything else

@lhy1024 lhy1024 added the type/ci The issue is related to CI. label Nov 15, 2023
ti-chi-bot bot pushed a commit that referenced this issue Nov 20, 2023
ref #5839, close #7375

Signed-off-by: Ryan Leung <rleungx@gmail.com>
rleungx added a commit to rleungx/pd that referenced this issue Dec 1, 2023
ref tikv#5839, close tikv#7375

Signed-off-by: Ryan Leung <rleungx@gmail.com>