Fix race conditions in tests #131

Closed
tiwilliam opened this issue May 6, 2014 · 9 comments
Labels
type/bug Feature does not function as expected

Comments

@tiwilliam (Contributor)

There are certainly several reasons for this, but the tests are constantly failing on Travis CI and sometimes on my local machine as well.

tiwilliam added the bug label on May 6, 2014
@tiwilliam (Contributor, Author)

My tests now run fine on cdc59aa on my MacBook Pro, although Travis is still failing because leader registration takes too long.

In the tests we use this sleep to wait for a leader:

// Wait for a leader
time.Sleep(100 * time.Millisecond)

We should try to solve this in another way.

@armon (Member) commented May 6, 2014

Travis is a hellish environment for our tests. The problem is that Serf and Raft both use lots of timers and rely specifically on randomized timing for many of their properties, which makes the timing rather unpredictable. Combined with Travis' terrible performance and scheduling issues, it is almost impossible for the test suite to pass there.

We probably need to set up a testutil package that provides things like waitForLeader(), use it throughout, and make the tests more robust. I wrote all the tests on my MacBook Air, where they pass, but they make implicit assumptions about my idle CPU and fast SSD.
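For illustration only, a helper along those lines might look like the sketch below; the WaitForResult name, signature, and timings are assumptions for this example, not taken from the repository.

// Sketch of a testutil-style polling helper: poll a check function until it
// succeeds or a deadline passes, instead of sleeping for a guessed interval.
package testutil

import (
	"testing"
	"time"
)

func WaitForResult(t *testing.T, check func() (bool, error)) {
	deadline := time.Now().Add(10 * time.Second)
	var err error
	for time.Now().Before(deadline) {
		var ok bool
		if ok, err = check(); ok {
			return
		}
		time.Sleep(25 * time.Millisecond)
	}
	t.Fatalf("timed out waiting for condition: %v", err)
}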

tiwilliam changed the title from Fix broken tests to Fix race conditions in tests on May 6, 2014
@nelhage (Contributor) commented May 26, 2014

Tests are also failing nondeterministically on my Lenovo X220, which likewise has an idle CPU and a fast SSD, so I'd love to see the tests made more robust so that I can submit PRs with more confidence that I'm not breaking them.

@tiwilliam (Contributor, Author)
We've come close to stable tests; there are about five more to fix, each marked with a TODO comment. If you have time, please help out.

@tiwilliam (Contributor, Author)
This is now resolved; Travis passes on master.

@armon (Member) commented May 27, 2014

Thanks so much for all your work on this!

@discordianfish (Contributor)
How did you come up with those numbers? I'm spinning up new Consul clusters pretty often these days because I'd like to depend on it heavily in the future, but it seems like there are quite a few race conditions when deploying (e.g. #841). Now that I've seen this ticket, I'm wondering: do the timings in the tests just work around race conditions that also happen in real deployments?

@tiwilliam (Contributor, Author)
@discordianfish Which numbers are you referring to? We switched to a solution that polls until a leader is elected and fails after a timeout; that replaced the estimated sleeps in the tests.
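As a rough sketch of that pattern in a test, assuming a hypothetical fakeCluster stand-in for a real server (nothing here is copied from the actual test suite):

// Demonstrates the shape of the fix: poll with a deadline rather than
// sleeping a guessed 100ms and hoping the election has already finished.
package consul_test

import (
	"testing"
	"time"
)

// fakeCluster stands in for a test server; HasLeader becomes true a short
// while after startup, mimicking an asynchronous leader election.
type fakeCluster struct{ started time.Time }

func (c *fakeCluster) HasLeader() bool {
	return time.Since(c.started) > 200*time.Millisecond
}

func TestWaitsForLeader(t *testing.T) {
	cluster := &fakeCluster{started: time.Now()}

	deadline := time.Now().Add(5 * time.Second)
	for !cluster.HasLeader() {
		if time.Now().After(deadline) {
			t.Fatal("timed out waiting for leader election")
		}
		time.Sleep(25 * time.Millisecond)
	}
	// At this point the "cluster" has a leader and the real assertions can run.
}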

@discordianfish (Contributor)
Okay, I see. I still experience some weird issues that look like race conditions during cluster bootstrapping, but I'll dig deeper and let you know (the tests also don't pass for me on my Ubuntu 14.04 laptop; I might open another issue for that).
