
Failed in testing repairs #45

Open
yujunz opened this issue Jul 17, 2020 · 3 comments · Fixed by #46
@yujunz
Contributor

yujunz commented Jul 17, 2020

I tried to port roshi to Go 1.14, but some tests related to repair failed:

https://travis-ci.com/github/yujunz/roshi/builds/176022842#L377

=== RUN   TestAllRepairs
    repair_strategies_test.go:34: pre-repair: cluster 2: only got 0 responses
    repair_strategies_test.go:34: pre-repair: cluster 3: only got 0 responses
--- FAIL: TestAllRepairs (0.00s)
=== RUN   TestRateLimitedRepairs
    repair_strategies_test.go:87: post-repair: cluster 0: has [{Key:foo Score:2.3 Member:delta} {Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha}]
    repair_strategies_test.go:87: post-repair: cluster 1: has [{Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha} {Key:foo Score:1.3 Member:delta}]
    repair_strategies_test.go:87: post-repair: cluster 2: has [{Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha}]
    repair_strategies_test.go:92: post-repair: cluster 2: expected [{Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha} {Key:foo Score:1.3 Member:delta}], got [{Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha}]
    repair_strategies_test.go:87: post-repair: cluster 3: has [{Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha}]
    repair_strategies_test.go:92: post-repair: cluster 3: expected [{Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha} {Key:foo Score:1.3 Member:delta}], got [{Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha}]
    repair_strategies_test.go:87: post-repair: cluster 4: has [{Key:foo Score:2.2 Member:beta} {Key:foo Score:2.1 Member:alpha} {Key:foo Score:1.3 Member:delta}]
--- FAIL: TestRateLimitedRepairs (0.00s)
=== RUN   TestExplodingGoroutines
--- PASS: TestExplodingGoroutines (0.03s)
=== RUN   TestMakeSet
--- PASS: TestMakeSet (0.00s)
=== RUN   TestAddHas
--- PASS: TestAddHas (0.00s)
=== RUN   TestAddMany
--- PASS: TestAddMany (0.00s)
=== RUN   TestOrderedLimitedSlice
--- PASS: TestOrderedLimitedSlice (0.00s)
FAIL
FAIL	github.com/yujunz/roshi/farm	0.052s

However, running the tests locally did PASS. What could be the problem?

@yujunz
Contributor Author

yujunz commented Jul 17, 2020

I repeated the test several times and can reproduce the errors above. I also observed one crash:

=== RUN   TestAllRepairs
fatal error: concurrent map writes

goroutine 190 [running]:
runtime.throw(0x12eae35, 0x15)
	/usr/local/Cellar/go/1.14.4/libexec/src/runtime/panic.go:1116 +0x72 fp=0xc0001a5df8 sp=0xc0001a5dc8 pc=0x1035092
runtime.mapassign_faststr(0x12984c0, 0xc000109680, 0x12e6813, 0x3, 0xc0001eca48)
	/usr/local/Cellar/go/1.14.4/libexec/src/runtime/map_faststr.go:211 +0x3f7 fp=0xc0001a5e60 sp=0xc0001a5df8 pc=0x1014677
github.com/yujunz/roshi/farm.(*mockCluster).Insert(0xc000109650, 0xc000109830, 0x1, 0x1, 0xc00009c2a0, 0x1506b00)
	/Users/yujunz/Code/github.com/soundcloud/roshi/farm/mock_cluster_test.go:109 +0x290 fp=0xc0001a5f10 sp=0xc0001a5e60 pc=0x1250580
github.com/yujunz/roshi/farm.(*Farm).Insert.func1(0x1347d40, 0xc000109650, 0xc000109830, 0x1, 0x1, 0x152b8d0, 0x0)
	/Users/yujunz/Code/github.com/soundcloud/roshi/farm/farm.go:65 +0x4f fp=0xc0001a5f50 sp=0xc0001a5f10 pc=0x1259abf
github.com/yujunz/roshi/farm.(*Farm).write.func2(0xc000026600, 0x12f79f8, 0xc000109830, 0x1, 0x1, 0x1347d40, 0xc000109650)
	/Users/yujunz/Code/github.com/soundcloud/roshi/farm/farm.go:125 +0x62 fp=0xc0001a5fa8 sp=0xc0001a5f50 pc=0x1259c92
runtime.goexit()
	/usr/local/Cellar/go/1.14.4/libexec/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc0001a5fb0 sp=0xc0001a5fa8 pc=0x1066d71
created by github.com/yujunz/roshi/farm.(*Farm).write
	/Users/yujunz/Code/github.com/soundcloud/roshi/farm/farm.go:124 +0x214

goroutine 1 [chan receive]:
testing.(*T).Run(0xc00012c900, 0x12e8bd6, 0xe, 0x12f7a78, 0x1083301)
	/usr/local/Cellar/go/1.14.4/libexec/src/testing/testing.go:1043 +0x37e
testing.runTests.func1(0xc00012c480)
	/usr/local/Cellar/go/1.14.4/libexec/src/testing/testing.go:1284 +0x78
testing.tRunner(0xc00012c480, 0xc00013fe10)
	/usr/local/Cellar/go/1.14.4/libexec/src/testing/testing.go:991 +0xdc
testing.runTests(0xc00000e380, 0x14fda80, 0x14, 0x14, 0x0)
	/usr/local/Cellar/go/1.14.4/libexec/src/testing/testing.go:1282 +0x2a7
testing.(*M).Run(0xc000148000, 0x0)
	/usr/local/Cellar/go/1.14.4/libexec/src/testing/testing.go:1199 +0x15f
main.main()
	_testmain.go:82 +0x135

goroutine 189 [runnable]:
github.com/yujunz/roshi/farm.TestAllRepairs(0xc00012c900)
	/Users/yujunz/Code/github.com/soundcloud/roshi/farm/repair_strategies_test.go:37 +0x537
testing.tRunner(0xc00012c900, 0x12f7a78)
	/usr/local/Cellar/go/1.14.4/libexec/src/testing/testing.go:991 +0xdc
created by testing.(*T).Run
	/usr/local/Cellar/go/1.14.4/libexec/src/testing/testing.go:1042 +0x357

Could it be a problem in the mocked cluster?
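
The trace points at an unguarded map write in the mock cluster's Insert while Farm.Insert fans each write out to one goroutine per cluster. A minimal, self-contained sketch of that pattern (the type and field names below are assumptions, not the actual mock_cluster_test.go code):

// Sketch of the pattern the trace shows: a plain map shared by the writer
// goroutines that Farm.Insert spawns, with no synchronization around writes.
package main

import "sync"

type keyScoreMember struct {
	Key    string
	Score  float64
	Member string
}

type mockCluster struct {
	sets map[string][]keyScoreMember // shared state, no lock
}

// Insert writes into the shared map without synchronization; two goroutines
// inserting at once can trip the runtime's "fatal error: concurrent map writes".
func (c *mockCluster) Insert(tuples []keyScoreMember) error {
	for _, t := range tuples {
		c.sets[t.Key] = append(c.sets[t.Key], t)
	}
	return nil
}

func main() {
	c := &mockCluster{sets: map[string][]keyScoreMember{}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // one writer goroutine per cluster, like Farm.Insert
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Insert([]keyScoreMember{{Key: "foo", Score: 2.1, Member: "alpha"}})
		}()
	}
	wg.Wait()
}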

@yujunz
Contributor Author

yujunz commented Jul 17, 2020

It seems to be caused by a race condition on the map. Fixed in 9698675 by adding a mutex.
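
Roughly, the fix has the shape below, extending the sketch above with a sync.Mutex around the map writes (assumed names again, not the exact diff in 9698675):

// Serialize the writer goroutines that Farm.Insert fans out.
type lockedMockCluster struct {
	mtx  sync.Mutex
	sets map[string][]keyScoreMember
}

func (c *lockedMockCluster) Insert(tuples []keyScoreMember) error {
	c.mtx.Lock() // only one writer touches the map at a time
	defer c.mtx.Unlock()
	for _, t := range tuples {
		c.sets[t.Key] = append(c.sets[t.Key], t)
	}
	return nil
}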

@matthiasr

There is something more going on here. I enabled running the tests in GitHub Actions, and these tests kept failing (#49). I disabled them, merged, and then realized that I hadn't turned on the race detector (which was on before). Weirdly, with the race detector on, they pass (#50)! Locally, they also pass for me without race detection.
Maybe the tests are still timing-sensitive, and fail if they run too fast?
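
If that is what is happening, the assertions may simply run before the asynchronous repairs have finished, and a deadline-based poll instead of an immediate check would rule that out. A rough illustration (waitForRepair is made up here, not part of roshi's test code; it assumes the standard testing and time packages):

// Hypothetical helper: poll for the expected post-repair state until a
// deadline instead of asserting right after the repair is queued.
func waitForRepair(t *testing.T, have func() int, want int) {
	t.Helper()
	deadline := time.Now().Add(time.Second)
	for time.Now().Before(deadline) {
		if have() == want {
			return
		}
		time.Sleep(time.Millisecond)
	}
	t.Fatalf("repair never converged: have %d, want %d", have(), want)
}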

matthiasr added the bug label on Apr 9, 2021