Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test: data race found in TestGetOwnerRevision #4480

Closed
Tracked by #2246
CharlesCheung96 opened this issue Jan 26, 2022 · 1 comment
Closed
Tracked by #2246

Flaky test: data race found in TestGetOwnerRevision #4480

CharlesCheung96 opened this issue Jan 26, 2022 · 1 comment
Assignees
Labels
area/ticdc Issues or PRs related to TiCDC. component/test Unit tests and integration tests component.

Comments

@CharlesCheung96
Copy link
Contributor

Which jobs are flaking?

unit_test_in_verify_ci

Which test(s) are flaking?

pkg/etcd TestGetOwnerRevision

Jenkins logs or GitHub Actions link

[2022-01-25T02:45:17.902Z] === Failed
[2022-01-25T02:45:17.902Z] === FAIL: pkg/etcd TestGetOwnerRevision (11.99s)
[2022-01-25T02:45:17.902Z]     etcd_test.go:418: 
[2022-01-25T02:45:17.902Z]         	Error Trace:	etcd_test.go:418
[2022-01-25T02:45:17.902Z]         	            				asm_amd64.s:1371
[2022-01-25T02:45:17.902Z]         	Error:      	"etcdserver: request timed out" does not contain "context canceled"
[2022-01-25T02:45:17.902Z]         	Test:       	TestGetOwnerRevision
[2022-01-25T02:45:17.902Z]     etcd_test.go:418: 
[2022-01-25T02:45:17.902Z]         	Error Trace:	etcd_test.go:418
[2022-01-25T02:45:17.902Z]         	            				asm_amd64.s:1371
[2022-01-25T02:45:17.902Z]         	Error:      	"rpc error: code = Unknown desc = context deadline exceeded" does not contain "context canceled"
[2022-01-25T02:45:17.902Z]         	Test:       	TestGetOwnerRevision
[2022-01-25T02:45:17.902Z] ==================
[2022-01-25T02:45:17.902Z] WARNING: DATA RACE
[2022-01-25T02:45:17.902Z] Write at 0x00c000f03ea1 by goroutine 163:
[2022-01-25T02:45:17.902Z]   testing.(*common).FailNow()
[2022-01-25T02:45:17.902Z]       /usr/local/go/src/testing/testing.go:740 +0x4f
[2022-01-25T02:45:17.902Z]   testing.(*T).FailNow() 
[2022-01-25T02:45:17.902Z]       <autogenerated>:1 +0x44 
[2022-01-25T02:45:17.902Z]   github.com/stretchr/testify/require.Contains() 
[2022-01-25T02:45:17.902Z]       /go/pkg/mod/github.com/stretchr/testify@v1.7.0/require/require.go:50 +0x124 
[2022-01-25T02:45:17.902Z]   github.com/pingcap/tiflow/pkg/etcd.TestGetOwnerRevision.func1() 
[2022-01-25T02:45:17.902Z]       /home/jenkins/agent/workspace/ut-check/tiflow/pkg/etcd/etcd_test.go:418 +0x650 
[2022-01-25T02:45:17.902Z]  
[2022-01-25T02:45:17.902Z] Previous write at 0x00c000f03ea1 by goroutine 111: 
[2022-01-25T02:45:17.902Z]   testing.(*common).FailNow() 
[2022-01-25T02:45:17.902Z]       /usr/local/go/src/testing/testing.go:740 +0x4f 
[2022-01-25T02:45:17.902Z]   testing.(*T).FailNow() 
[2022-01-25T02:45:17.902Z]       <autogenerated>:1 +0x44 
[2022-01-25T02:45:17.902Z]   github.com/stretchr/testify/require.Contains() 
[2022-01-25T02:45:17.902Z]       /go/pkg/mod/github.com/stretchr/testify@v1.7.0/require/require.go:50 +0x124 
[2022-01-25T02:45:17.902Z]   github.com/pingcap/tiflow/pkg/etcd.TestGetOwnerRevision.func1() 
[2022-01-25T02:45:17.902Z]       /home/jenkins/agent/workspace/ut-check/tiflow/pkg/etcd/etcd_test.go:418 +0x650 
[2022-01-25T02:45:17.902Z]  
[2022-01-25T02:45:17.902Z] Goroutine 163 (running) created at: 
[2022-01-25T02:45:17.902Z]   github.com/pingcap/tiflow/pkg/etcd.TestGetOwnerRevision() 
[2022-01-25T02:45:17.902Z]       /home/jenkins/agent/workspace/ut-check/tiflow/pkg/etcd/etcd_test.go:407 +0x3b1 
[2022-01-25T02:45:17.902Z]   testing.tRunner() 
[2022-01-25T02:45:17.902Z]       /usr/local/go/src/testing/testing.go:1193 +0x202 
[2022-01-25T02:45:17.902Z]  
[2022-01-25T02:45:17.902Z] Goroutine 111 (finished) created at: 
[2022-01-25T02:45:17.903Z]   github.com/pingcap/tiflow/pkg/etcd.TestGetOwnerRevision() 
[2022-01-25T02:45:17.903Z]       /home/jenkins/agent/workspace/ut-check/tiflow/pkg/etcd/etcd_test.go:407 +0x3b1 
[2022-01-25T02:45:17.903Z]   testing.tRunner() 
[2022-01-25T02:45:17.903Z]       /usr/local/go/src/testing/testing.go:1193 +0x202 
[2022-01-25T02:45:17.903Z] ================== 
[2022-01-25T02:45:17.903Z]     etcd_test.go:418:  
[2022-01-25T02:45:17.903Z]         	Error Trace:	etcd_test.go:418 
[2022-01-25T02:45:17.903Z]         	            				asm_amd64.s:1371 
[2022-01-25T02:45:17.903Z]         	Error:      	"context deadline exceeded" does not contain "context canceled" 
[2022-01-25T02:45:17.903Z]         	Test:       	TestGetOwnerRevision 
[2022-01-25T02:45:17.903Z]     testing.go:1092: race detected during execution of test 
[2022-01-25T02:45:17.903Z]  
[2022-01-25T02:45:17.903Z] DONE 518 tests, 1 failure in 590.392s 
[2022-01-25T02:45:19.805Z] make: *** [unit_test_in_verify_ci] Error 1

Details: https://ci.pingcap.net/blue/organizations/jenkins/ut-check/detail/ut-check/61/pipeline/

Anything else we need to know

  • Does this test exist for other branches as well?

  • Has there been a high frequency of failure lately?

@CharlesCheung96 CharlesCheung96 added component/test Unit tests and integration tests component. area/ticdc Issues or PRs related to TiCDC. labels Jan 26, 2022
@CharlesCheung96 CharlesCheung96 self-assigned this Feb 11, 2022
@CharlesCheung96
Copy link
Contributor Author

This issue is caused by insufficient system resources:

  1. The embedded etcd service starts successfully and assigns a session to each client.
  2. During the client campaign operation, etcd server hangs up due to insufficient resources.
  3. The clients report an error after failure and call t.Failnow() in different goroutines at the same time, which leads to a data race.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ticdc Issues or PRs related to TiCDC. component/test Unit tests and integration tests component.
Projects
None yet
Development

No branches or pull requests

1 participant