Data race in TestRaftClusterMultipleRestart #8543

Closed
rleungx opened this issue Aug 16, 2024 · 3 comments · Fixed by #8686
Labels
type/ci The issue is related to CI.

Comments

@rleungx
Member

rleungx commented Aug 16, 2024

Flaky Test

Which jobs are failing

==================
WARNING: DATA RACE
Write at 0x00c0020aa1b8 by goroutine 16487:
  github.com/tikv/pd/server/cluster.(*RaftCluster).InitCluster()
      /data/nvme0n1/ryan/workspace/pd/server/cluster/cluster.go:290 +0x19e
  github.com/tikv/pd/server/cluster.(*RaftCluster).Start()
      /data/nvme0n1/ryan/workspace/pd/server/cluster/cluster.go:318 +0x258
  github.com/tikv/pd/tests/server/cluster_test.TestRaftClusterMultipleRestart()
      /data/nvme0n1/ryan/workspace/pd/tests/server/cluster/cluster_test.go:617 +0x51b
  github.com/pingcap/failpoint.parseTerm()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:149 +0x364
  github.com/pingcap/failpoint.parse()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:126 +0xa5
  github.com/pingcap/failpoint.newTerms()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:98 +0x3e
  github.com/pingcap/failpoint.(*Failpoint).Enable()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoint.go:54 +0x3e
  github.com/pingcap/failpoint.(*Failpoints).Enable()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoints.go:105 +0x276
  github.com/pingcap/failpoint.Enable()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoints.go:225 +0x4a8
  github.com/tikv/pd/tests/server/cluster_test.TestRaftClusterMultipleRestart()
      /data/nvme0n1/ryan/workspace/pd/tests/server/cluster/cluster_test.go:615 +0x4a9
  testing.tRunner()
      /data/nvme0n1/ryan/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /data/nvme0n1/ryan/go/src/testing/testing.go:1648 +0x44

Previous read at 0x00c0020aa1b8 by goroutine 30359:
  github.com/tikv/pd/server/cluster.(*RaftCluster).runMinResolvedTSJob()
      /data/nvme0n1/ryan/workspace/pd/server/cluster/cluster.go:2245 +0x1e4
  github.com/tikv/pd/server/cluster.(*RaftCluster).Start.func7()
      /data/nvme0n1/ryan/workspace/pd/server/cluster/cluster.go:367 +0x33

Goroutine 16487 (running) created at:
  testing.(*T).Run()
      /data/nvme0n1/ryan/go/src/testing/testing.go:1648 +0x82a
  testing.runTests.func1()
      /data/nvme0n1/ryan/go/src/testing/testing.go:2054 +0x84
  testing.tRunner()
      /data/nvme0n1/ryan/go/src/testing/testing.go:1595 +0x238
  testing.runTests()
      /data/nvme0n1/ryan/go/src/testing/testing.go:2052 +0x896
  testing.(*M).Run()
      /data/nvme0n1/ryan/go/src/testing/testing.go:1925 +0xb57
  main.main()
      _testmain.go:131 +0x2e4

Goroutine 30359 (finished) created at:
  github.com/tikv/pd/server/cluster.(*RaftCluster).Start()
      /data/nvme0n1/ryan/workspace/pd/server/cluster/cluster.go:367 +0xdaa
  github.com/tikv/pd/tests/server/cluster_test.TestRaftClusterMultipleRestart()
      /data/nvme0n1/ryan/workspace/pd/tests/server/cluster/cluster_test.go:617 +0x51b
  github.com/pingcap/failpoint.parseTerm()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:149 +0x364
  github.com/pingcap/failpoint.parse()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:126 +0xa5
  github.com/pingcap/failpoint.newTerms()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:98 +0x3e
  github.com/pingcap/failpoint.(*Failpoint).Enable()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoint.go:54 +0x3e
  github.com/pingcap/failpoint.(*Failpoints).Enable()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoints.go:105 +0x276
  github.com/pingcap/failpoint.Enable()
      /data/nvme0n1/ryan/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoints.go:225 +0x4a8
  github.com/tikv/pd/tests/server/cluster_test.TestRaftClusterMultipleRestart()
      /data/nvme0n1/ryan/workspace/pd/tests/server/cluster/cluster_test.go:615 +0x4a9
  testing.tRunner()
      /data/nvme0n1/ryan/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /data/nvme0n1/ryan/go/src/testing/testing.go:1648 +0x44
==================

CI link

local environment

Reason for failure (if possible)

Anything else

@rleungx added the type/ci label Aug 16, 2024
@rleungx
Member Author

rleungx commented Sep 4, 2024

Closing this for now since I cannot reproduce it.

@rleungx rleungx closed this as completed Sep 4, 2024
@rleungx
Member Author

rleungx commented Oct 9, 2024

@rleungx rleungx reopened this Oct 9, 2024
@rleungx
Member Author

rleungx commented Oct 10, 2024

The cause: when the lease expires, the raft cluster is stopped, but the lock is released before the stop has finished. If Start is called in that window, it races with the part of the stop that is still in progress.
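
The ordering described above can be illustrated with a minimal Go sketch. It is not PD's code and not the change merged for this issue; the `cluster` type, its fields, and the `job` method are hypothetical names chosen for illustration. The assumption is that a background job reads a field without holding the lock, so the only thing that keeps the next Start from racing with it is that Stop waits for the job to exit before releasing the lock.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cluster is a hypothetical stand-in for a component that owns a
// background job. It is not PD's RaftCluster.
type cluster struct {
	mu         sync.Mutex
	running    bool
	quit       chan struct{}
	wg         sync.WaitGroup
	generation int // written in Start, read by the job without holding mu
}

// Start initializes state and launches the background job.
func (c *cluster) Start() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.running {
		return
	}
	c.generation++ // would race with a job left over from a previous run
	c.quit = make(chan struct{})
	c.running = true
	c.wg.Add(1)
	go c.job(c.quit)
}

// job periodically reads shared state until quit is closed.
func (c *cluster) job(quit <-chan struct{}) {
	defer c.wg.Done()
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-quit:
			return
		case <-ticker.C:
			_ = c.generation // unlocked read, as in the sketch's assumption
		}
	}
}

// Stop signals the job and waits for it to exit while still holding mu.
// Releasing mu before wg.Wait() would reopen the window in which a
// concurrent Start writes generation while the old job may still read it.
func (c *cluster) Stop() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.running {
		return
	}
	c.running = false
	close(c.quit)
	c.wg.Wait()
}

func main() {
	c := &cluster{}
	for i := 0; i < 3; i++ {
		c.Start()
		time.Sleep(5 * time.Millisecond) // let the job tick a few times
		c.Stop()
	}
	fmt.Println("restarts:", c.generation)
}
```

Running this sketch with `go run -race` is one way to check the ordering. Moving `c.wg.Wait()` after the unlock and calling Start and Stop from separate goroutines is the variant in which the race detector would be expected to report a write/read pair like the one in the trace.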

ti-chi-bot bot added a commit that referenced this issue Oct 11, 2024
close #8543

Signed-off-by: Ryan Leung <rleungx@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>