
figure out why test suite is randomly failing #413

Closed
chadwhitacre opened this issue Dec 8, 2012 · 41 comments

@chadwhitacre

No description provided.

@chadwhitacre

There has been a rash of test failures at Travis in the past day:

312 https://travis-ci.org/whit537/www.gittip.com/builds/3552478 - HTTP error contacting GitHub
302 https://travis-ci.org/whit537/www.gittip.com/builds/3536557 - ???
299 https://travis-ci.org/whit537/www.gittip.com/builds/3536003 - ImportError: cannot import name get_tips_and_total
298 https://travis-ci.org/whit537/www.gittip.com/builds/3535944 - ImportError: cannot import name get_tips_and_total
297 https://travis-ci.org/whit537/www.gittip.com/builds/3535920 - ImportError: cannot import name get_tips_and_total
296 https://travis-ci.org/whit537/www.gittip.com/builds/3535831 - ImportError: No module named participant
295 https://travis-ci.org/whit537/www.gittip.com/builds/3535764 - ???

Are they related? What's going on?

@chadwhitacre

312 is a not-unanticipated consequence of including a couple of integration tests that depend on the GitHub and Twitter APIs. Adding a skip_expensive_tests option is ticketed as #414.
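
One possible shape for that option, as a sketch only (the environment variable, names, and mechanism here are assumptions, not the eventual #414 implementation):

import os
from unittest import SkipTest

def test_hitting_the_github_api():
    # Gate tests that call external APIs behind an opt-out switch.
    if os.environ.get('SKIP_EXPENSIVE_TESTS'):
        raise SkipTest("skipping tests that hit the GitHub/Twitter APIs")
    # ... the real assertions against the GitHub API would go here ...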

@chadwhitacre

296-299 are part of the #410 clusterhuck.

@chadwhitacre

Both 302 and 295 are in test_is_suspicious.py. They smell like the intermittent failures I'm seeing even now as part of the latest, uncommitted work I'm doing on #406. I suspect it's a database interaction bug.

@chadwhitacre

These failures come several check-ins and hours apart, and the failures/errors look very similar. I don't believe they're caused by any immediate code change.

@chadwhitacre

I'm getting intermittent db interaction failures with the test suite for the Participant.absorb method I'm working on, and they appear more often than not. I'm going to treat that under this ticket and debug it.

@chadwhitacre

Hypothesis: using a connection pool with more than one connection means that the state of the database is not reset after one test before a subsequent test hits the database using a different connection.
Test: Constrain the connection pool to a single connection.
Result: False. I constrained the size of the connection pool by passing maxconn=1 to PostgresManager in wireup, and I confirmed that this did constrain the pool size by inspecting context.db.pool._pool in the failing test. The test still failed.
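
For reference, the experiment roughly amounts to the following minimal sketch (the import path and DSN handling are assumptions; only maxconn=1 comes from the comment above):

import os
from gittip.postgres import PostgresManager  # assumed location

# Constrain the pool to a single connection, as in the experiment above.
db = PostgresManager(os.environ['DATABASE_URL'], maxconn=1)

# Verified in the failing test by peeking at the pool's private _pool list:
assert len(db.pool._pool) <= 1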

@chadwhitacre

Hmmm ... I inspected context.db.pool._pool after taking out the maxconn argument, and the pool still had but one connection in it. I seem to recall now that it is indeed filled dynamically.

Perhaps the failure comes from reusing the connection? Perhaps the occasional success in the current case is correlated to a second connection existing in the pool? Perhaps the failures in builds 302 and 295 above correlate with the opposite?

@chadwhitacre

More info: If I drop to pdb in the failing test, and immediately continue, the test passes. A timing bug, indeed.
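
(For reference, the drop-in is just the standard-library one-liner placed inside the failing test:)

import pdb; pdb.set_trace()  # 'c' at the prompt continues immediately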

@chadwhitacre

More info: Now it's failing consistently again. Does Postgres still have a connection open?

@chadwhitacre

I deleted all logs so that I could capture clean logs for a case. Now it's working again, but the logfile hasn't been recreated.

@chadwhitacre

I've captured logfiles of working and non-working runs, but an initial inspection is not revealing the issue to me.

@chadwhitacre

More info: If I'm not mistaken, it's always the same test that fails, when it fails, no matter what other tests are run with it. I'm testing a single test script with five tests.

@chadwhitacre

Hypothesis: The test suite starts failing a definite number of test runs after it starts working again.
Test: Whenever it starts working again, count the number of times until it starts breaking.
Result: False. See below. We've seen 1, 4, and 5 runs before failure.

@chadwhitacre

Hypothesis: Restarting the postgres server makes it start working again.
Test: Restart the postgres server. Try again.
Result: False. I restarted the server and it still fails, repeatedly.

@chadwhitacre

More info: I've been kind of watching. I really think it's the same test that fails every time.

@chadwhitacre

Hypothesis: The bug is in the new get_transaction method I added to postgres.py.
Test: Refactor Participant.absorb to use get_connection instead.
Result: False. It started working again, and then broke on the sixth run.

@chadwhitacre

More info: I trimmed the suite down to the failing test and the one after it. It started failing again on the fifth run.

@chadwhitacre

More info: I trimmed the suite down to the single failing test. It started failing again on the second run!

@chadwhitacre

More info: It seems to alternate between success and failure on successive runs. Still running only the single failing test.

@chadwhitacre

Hypothesis: When the failing test is the only one in the test suite, it alternately succeeds and fails.
Test: Run this script and inspect the output.
Result: True! :D

.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X

@chadwhitacre

Hunch: The problem is with the way data is cleared between tests.

@chadwhitacre

More info: Oh wow. I added some print statements to the __exit__ method in gittip.testing:Context, where the tables are cleared of data. Now it's working?!
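
For orientation, the table-clearing step being instrumented has roughly this shape (a paraphrased sketch; the table list and SQL are illustrative, not the actual gittip.testing:Context code):

def clear_tables(conn, tables=('tips', 'participants')):
    # Wipe the test data between tests; the print is the debugging
    # statement referred to above.
    cursor = conn.cursor()
    for table in tables:
        print("clearing %s" % table)
        cursor.execute("DELETE FROM " + table)
    conn.commit()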

@chadwhitacre

I modified hammer.py to restart the postgres server before running the tests.

@chadwhitacre

Now I've suppressed the restart. It's failing again ...

@chadwhitacre

All tests are on. It appears to be periodic again!

......X.......X.......X.......X.......X.......X...

@chadwhitacre

I turned the restart back on, and it's still failing periodically.

....X.......X.......X.......X.......X.......X.....

It's failing every eighth test run.

@chadwhitacre

I trimmed down to one test in the suite instead of six, and it just passed 50 times.

[gittip] $ ./hammer.py 50
waiting for server to shut down.... done
server stopped
server starting
..................................................
[gittip] $
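
For context, here is a rough sketch of what a hammer.py-style harness could look like, inferred only from the transcripts in this thread (run the suite N times, printing . for a pass and X for a failure); the real script and its -s flag aren't shown here, so everything below is illustrative:

#!/usr/bin/env python
import os
import subprocess
import sys

def main(nruns):
    devnull = open(os.devnull, 'w')
    for _ in range(nruns):
        # Assumes nosetests runs the suite; substitute the real test command.
        failed = subprocess.call(['nosetests', 'tests/'],
                                 stdout=devnull, stderr=subprocess.STDOUT)
        sys.stdout.write('X' if failed else '.')
        sys.stdout.flush()
    sys.stdout.write('\n')

if __name__ == '__main__':
    main(int(sys.argv[1]))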

@chadwhitacre

Okay, it looks like I can trigger this by toggling the test that fails in the middle of the transaction in Participant.absorb:

[gittip] $ ./hammer.py 50 -s
waiting for server to shut down.... done
server stopped
server starting
...X.......X.......X.......X.......X.......X......
[gittip] $ ./hammer.py 50
..................................................
[gittip] $ ./hammer.py 50
.X.......X.......X.......X.......X.......X.......X
[gittip] $ ./hammer.py 50
..................................................
[gittip] $ ./hammer.py 50
..................................................
[gittip] $ ./hammer.py 50
.......X.......X.......X.......X.......X.......X..
[gittip] $

@chadwhitacre

Restarting postgres seems to have no effect:

[gittip] $ ./hammer.py 24
.......X.......X.......X
[gittip] $ ./hammer.py 24 -s
Password:
waiting for server to shut down..... done
server stopped
server starting
.......X.......X.......X
[gittip] $ ./hammer.py 24 -s
waiting for server to shut down.... done
server stopped
server starting
.......X.......X.......X
[gittip] $ ./hammer.py 24 -s
waiting for server to shut down.... done
server stopped
server starting
.......X.......X.......X
[gittip] $

@chadwhitacre

[gittip] $ ./hammer.py 24
........................
[gittip] $ ./hammer.py 24
........................
[gittip] $ ./hammer.py 24
........................
[gittip] $ ./hammer.py 24
........................
[gittip] $ ./hammer.py 24
.......X.......X.......X
[gittip] $ ./hammer.py 24
.......X.......X.......X
[gittip] $ ./hammer.py 24
.......X.......X.......X
[gittip] $ ./hammer.py 24
.......X.......X.......X
[gittip] $

@chadwhitacre

Here are the two tests that, together, trigger this bug:

# Imports implied by the surrounding discussion (the exact paths within
# gittip may differ slightly):
from decimal import Decimal
from psycopg2 import IntegrityError
from nose.tools import assert_raises
from gittip.participant import Participant
from gittip.testing import tip_graph

def test_alice_ends_up_tipping_bob_two_dollars():
    tips = [ ('alice', 'bob', 1)
           , ('alice', 'carl', 1)
            ]
    with tip_graph(*tips) as context:
        Participant('bob').absorb('carl')
        expected = Decimal('2.00')
        actual = context.diff()['tips']['inserts'][0]['amount']
        assert actual == expected, actual

def test_absorbing_unknown_fails():
    tips = [ ('alice', 'bob', 1)
           , ('alice', 'carl', 1)
            ]
    with tip_graph(*tips) as context:
        assert_raises(IntegrityError, Participant('bob').absorb, 'jim')
        actual = context.diff()
        assert actual == {}, actual

Either of those tests by itself is fine. When they're both present, the test run fails every eighth run, and always on the two_dollars test.

(Why was it every other run above?)

@chadwhitacre

So this is good, actually. :)

I wrote the get_transaction method for this Participant.absorb method, and the unknown_fails test is in fact designed to test this kind of interaction.

@chadwhitacre

Hypothesis: The transaction is not getting rolled back as desired for the exception case.
Test: Log the commit/rollback under get_transaction and inspect the output.
Result: False. The commits and rollbacks are firing as expected.
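
To make the logging concrete, a get_transaction-style context manager generally has this shape (a minimal sketch assuming a psycopg2 connection pool; the real method lives in gittip's postgres.py and may differ in detail):

from contextlib import contextmanager

@contextmanager
def get_transaction(pool):
    conn = pool.getconn()
    try:
        cursor = conn.cursor()
        yield cursor
        conn.commit()        # log here to confirm the commit fires
    except Exception:
        conn.rollback()      # log here to confirm the rollback fires
        raise
    finally:
        pool.putconn(conn)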

@chadwhitacre

Hypothesis: The sort order of context.diff() is varying.
Test: Log the output of context.diff() in the two_dollars test and inspect it.
Result: True. I printed the keys used for sorting in gittip.testing:Context._diff. They're varying in a way that matters.

@chadwhitacre

The errors in builds 302 and 295 smell a lot like this, too.

@chadwhitacre

I'm pretty sure the periodicity is coming from Python's dict ordering, not from Postgres. I'm not sure whether Postgres sequences are absolutely guaranteed to be monotonically increasing (as this StackOverflow comment suggests), but I certainly don't expect them to vary on the eights. For Python dict ordering to vary with some period of eight sounds much more plausible, given that it depends on the hash table size and the key values involved.
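
A small CPython 2 illustration of that: a fresh dict has eight hash slots, and non-negative integer keys land in slot key % 8, so which of two rows comes back first depends on the primary-key values the sequence happened to hand out (which also makes a period of eight unsurprising):

def iteration_order(ids):
    # Build a dict keyed by (illustrative) primary keys and list its keys
    # in iteration order, as a diff built from such a dict would see them.
    return list(dict((i, None) for i in ids))

print(iteration_order([1, 2]))   # [1, 2]
print(iteration_order([7, 8]))   # [8, 7] -- same two rows, opposite order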

@chadwhitacre

Which is to say that simply sorting the dicts that we iterate over in Context._diff to produce the lists of inserts, updates, and deletes should be quite sufficient for our test infrastructure.
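
Something like the following would do (a hedged sketch; the names are illustrative rather than the actual Context._diff internals):

def ordered(rows_by_pk):
    # Emit rows in primary-key order so the diff no longer depends on
    # dict iteration order.
    return [rows_by_pk[pk] for pk in sorted(rows_by_pk)]

new_rows = {8: {'id': 8, 'amount': '2.00'}, 7: {'id': 7, 'amount': '1.00'}}
print(ordered(new_rows))  # always yields the id=7 row first, whatever the dict layout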

@chadwhitacre

The __version__ failure in 302 was a legitimate bug, due to this changeset. The rest of the failures are the same as in 295. Since Context.diff is implicated in all five errors, I'm saying those errors are due to the bug uncovered here.

@chadwhitacre

[gittip] $ ./hammer.py 24
........................
[gittip] $ ./hammer.py 128
.............................................................................................
...................................
[gittip] $ ./hammer.py 256
.............................................................................................
.............................................................................................
......................................................................
[gittip] $
