Fix race conditions in tests #131
My tests now run fine on cdc59aa on my MacBook Pro, although Travis is still failing because leader registration takes too long. In the tests we use this sleep to wait for the leader:
We should try to solve this in another way.
Travis is a hellish environment for our tests. The problem is that Serf and Raft both use lots of timers and rely specifically on random timing for many of their properties, which makes the timing rather unpredictable. Combined with Travis' terrible performance and scheduling issues, it makes it almost impossible for the test suite to pass. We probably need to set up a testutil package that provides things like "waitForLeader()", use it everywhere, and make the tests more robust. I wrote all the tests on my MacBook Air, where they pass, but they carry implicit assumptions about my idle CPU and fast SSD.
Tests are also failing nondeterministically on my Lenovo X220, where I also have an idle CPU and fast SSD, so I'd love to see the tests get more robust so I can submit PRs with more confidence that I'm not breaking them.
We've come close to more stable tests; about five tests remain to fix, all marked with a TODO comment. If you have time, please help out.
This is now resolved; Travis passes on master.
Thanks so much for all your work on this!
How did you come up with those numbers? I'm spinning up new Consul clusters pretty often these days because I plan to depend on it heavily in the future, but it seems to me there are quite a few race conditions when deploying (e.g. #841). Now that I've seen this ticket, I'm wondering: maybe the timings in the tests just work around race conditions that happen in real deployments as well?
@discordianfish Which numbers are you referring to? We switched to a solution where we poll until the leader is elected, with a timeout on failure. That replaced the estimated sleeps in the tests.
Okay, I see. I still experience some weird issues that look like race conditions during cluster bootstrapping, but I will dig deeper and let you know (although the tests don't pass for me on my Ubuntu 14.04 laptop; I might open another issue for that).
There are certainly several reasons for this, but the tests are constantly failing on Travis CI and sometimes on my local machine.