Fix race conditions in tests #131

Closed
tiwilliam opened this issue May 6, 2014 · 9 comments
Labels
type/bug Feature does not function as expected

Comments

@tiwilliam (Contributor)

There are certainly several reasons for this, but the tests are constantly failing on Travis CI and sometimes on my local machine as well.

tiwilliam added the bug label on May 6, 2014
@tiwilliam (Contributor, Author)

My tests now run fine on cdc59aa on my MacBook Pro, although Travis is still failing because leader registration takes too long.

In the tests we use this sleep to wait for a leader:

// Wait for a leader
time.Sleep(100 * time.Millisecond)

We should try to solve this in another way.

@armon (Member) commented May 6, 2014

Travis is a hellish environment for our tests. The problem is that Serf and Raft both use lots of timers and rely specifically on randomized timing for many of their properties, which makes the timing rather unpredictable. Combined with Travis' terrible performance and scheduling issues, it is almost impossible for the test suite to pass there.

We probably need to set up a testutil package that provides things like waitForLeader(), use it throughout, and make the tests more robust. I wrote all the tests on my MacBook Air, where they pass, but they make implicit assumptions about my idle CPU and fast SSD.
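For illustration only, a helper along those lines might look like the sketch below; the WaitForResult name, signature, and timings are assumptions for this example, not taken from the repository.

// Sketch of a testutil-style polling helper: poll a check function until it
// succeeds or a deadline passes, instead of sleeping for a guessed interval.
package testutil

import (
	"testing"
	"time"
)

func WaitForResult(t *testing.T, check func() (bool, error)) {
	deadline := time.Now().Add(10 * time.Second)
	var err error
	for time.Now().Before(deadline) {
		var ok bool
		if ok, err = check(); ok {
			return
		}
		time.Sleep(25 * time.Millisecond)
	}
	t.Fatalf("timed out waiting for condition: %v", err)
}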

tiwilliam changed the title from Fix broken tests to Fix race conditions in tests on May 6, 2014
@nelhage (Contributor) commented May 26, 2014

Tests are also failing nondeterministically on my Lenovo X220, which likewise has an idle CPU and a fast SSD, so I'd love to see the tests made more robust so that I can submit PRs with more confidence that I'm not breaking them.

@tiwilliam (Contributor, Author)
We've come close to stable tests; there are about five more to fix, each marked with a TODO comment. If you have time, please help out.

@tiwilliam (Contributor, Author)
This is now resolved; Travis passes on master.

@armon (Member) commented May 27, 2014

Thanks so much for all your work on this!

@discordianfish (Contributor)
How did you come up with those numbers? I'm spinning up new Consul clusters pretty often these days because I'd like to depend on it heavily in the future, but it seems like there are quite a few race conditions when deploying (e.g. #841). Now that I've seen this ticket, I'm wondering: do the timings in the tests just work around race conditions that also happen in real deployments?

@tiwilliam (Contributor, Author)
@discordianfish Which numbers are you referring to? We switched to a solution that polls until a leader is elected and fails after a timeout; that replaced the estimated sleeps in the tests.
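As a rough sketch of that pattern in a test, assuming a hypothetical fakeCluster stand-in for a real server (nothing here is copied from the actual test suite):

// Demonstrates the shape of the fix: poll with a deadline rather than
// sleeping a guessed 100ms and hoping the election has already finished.
package consul_test

import (
	"testing"
	"time"
)

// fakeCluster stands in for a test server; HasLeader becomes true a short
// while after startup, mimicking an asynchronous leader election.
type fakeCluster struct{ started time.Time }

func (c *fakeCluster) HasLeader() bool {
	return time.Since(c.started) > 200*time.Millisecond
}

func TestWaitsForLeader(t *testing.T) {
	cluster := &fakeCluster{started: time.Now()}

	deadline := time.Now().Add(5 * time.Second)
	for !cluster.HasLeader() {
		if time.Now().After(deadline) {
			t.Fatal("timed out waiting for leader election")
		}
		time.Sleep(25 * time.Millisecond)
	}
	// At this point the "cluster" has a leader and the real assertions can run.
}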

@discordianfish (Contributor)
Okay, I see. I still experience some weird issues that look like race conditions during cluster bootstrapping, but I'll dig deeper and let you know (the tests also don't pass for me on my Ubuntu 14.04 laptop; I might open another issue for that).
