Flaky UnpicklingError in test_engine.py #3149
Comments
caught with
|
error #2:
|
#3:
|
We intended to catch all subprocess exceptions (for example, the test case EngineTest.test_multiprocess_unpickleable checks SerializationError), but some code added later wasn't protected, including line #159, which caused the #3149 test to hang. This review fixes that by moving everything into the same try block. It does not fix the root cause of #3149, but the next time it happens the subprocess won't die and will report the exception back to the engine.
Testing Done: https://travis-ci.org/peiyuwang/pants/builds/121229553 passed.
Bugs closed: 3149, 3155
Reviewed at https://rbcommons.com/s/twitter/r/3656/
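A minimal sketch of the "everything inside one try block" idea described above, not the actual pants engine code. The name `guarded_execute` and the `('ok', ...)` / `('error', ...)` result shape are hypothetical, chosen only for illustration. The point is that both the work and the serialization check sit inside the same try, so any failure is turned into a value that can be reported back instead of killing the worker.

```python
import pickle
import traceback


def guarded_execute(func, *args):
    """Run func(*args) the way a worker subprocess might, never letting an exception escape.

    Hypothetical helper: returns ('ok', result) on success, or
    ('error', description) if anything inside the try block fails.
    """
    try:
        result = func(*args)
        # Serialize inside the same try block, so a result that cannot be
        # pickled is reported like any other failure instead of crashing later.
        pickle.dumps(result)
        return ('ok', result)
    except Exception as e:
        # Carry the traceback text back to the caller; the original
        # traceback object itself is not picklable.
        detail = '{}: {}\n{}'.format(type(e).__name__, e, traceback.format_exc())
        return ('error', detail)
```

A caller on the engine side would check the first element of the returned tuple and raise or log accordingly, rather than waiting on a worker that has silently died.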
After https://rbcommons.com/s/twitter/r/3656/, https://travis-ci.org/pantsbuild/pants/jobs/121272283 simply says FAILURE
|
Happening again: https://travis-ci.org/pantsbuild/pants/jobs/121598230 This time we have an assertion failure, as expected
|
@patricklaw saw another failure from |
@peiyuwang looks like a skip for
Logs attached for posterity. I'll send up an RB now skipping `test_multiprocess_engine_multi`. |
test_engine.py
UnpicklingError
in test_engine.py
So I added a catch block that prints the unpicklable data as hex, and
Observations:
I remember trying to switch to in-memory mode in the test and still seeing the error. I will try again just to confirm this is not |
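For reference, the catch block mentioned above (dumping the unpicklable payload as hex) could look roughly like the sketch below. This is an illustration only, not the code from the actual change: `loads_or_dump_hex` is a hypothetical name, and raising `ValueError` is an assumption made to keep the example self-contained.

```python
import binascii
import pickle
import sys


def loads_or_dump_hex(payload):
    """Unpickle payload; on failure, emit the raw bytes as hex for later inspection."""
    try:
        return pickle.loads(payload)
    except Exception as e:
        # hexlify returns bytes; decode so it prints cleanly in a CI log.
        hex_payload = binascii.hexlify(payload).decode('ascii')
        message = 'Failed to unpickle {} bytes ({}): {}'.format(len(payload), e, hex_payload)
        sys.stderr.write(message + '\n')
        raise ValueError(message)
```

Having the hex dump in the CI log makes it possible to compare the bytes that were read against the bytes that were written, which helps separate storage corruption from a genuinely bad pickle.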
Confirmed, not
|
Also ruled out is Then See https://travis-ci.org/peiyuwang/pants/builds/ for
One possibility is that its memory space has been GC-ed and reallocated. The reason it hits Travis could be that 1) multiprocessing has higher memory overhead, 2) we use the CPU count to size the multiprocessing pool, and 3) Travis has 32 CPUs, more than a local Mac, so on Travis a cStringIO GC is more likely to happen. |
@kwlzn suggests to use However we have two places using
So I ended up using BytesIO for writes but StringIO for reads. There is some performance loss, but not as big as both using |
From .travis.osx.yml:
How about a similar approach for the Linux build? I personally use pyenv on my Linux machine to manage user-specific Python interpreters, which avoids the sudo requirement. This would allow installing Python 2.7.11, for example, though at a CI shard speed hit. It may be worth an experiment to determine what the CI shard setup speed hit is in practice. |
Reviewed at https://rbcommons.com/s/twitter/r/3761/ |
Placeholder issue for CI hang investigation on the test_engine.py suite.