
Different tests collected on two nodes with xdist #437

Closed
pytestbot opened this issue Jan 31, 2014 · 17 comments
Labels
plugin: xdist (related to the xdist external plugin) · type: bug (problem that needs to be addressed)

Comments

@pytestbot

Originally reported by: Florian Rathgeber (BitBucket: frathgeber, GitHub: frathgeber)


When running py.test with xdist and -n 2 on Travis we occasionally get a

AssertionError: Different tests were collected between gw1 and gw0

Here's an example failure.

We're not overriding the test collection hooks and I can't immediately think of any other way we might affect test collection.

Tested with pytest 2.5.2, xdist 1.10, execnet 1.2.0.


@pytestbot

Original comment by Jurko Gospodnetić (BitBucket: jurko, GitHub: jurko):


I have the same situation when running the jurko/suds project tests using Python 3.

Steps to reproduce:

  1. Clone jurko/suds (tested with HEAD at this commit)
  2. Build/install the project by running py3 setup.py develop from its top level folder. That should place its Python 3 compatible sources under build/lib.
  3. Run the project tests from its top level project folder using py3 -m pytest build/lib/tests -n 4.

This should cause pytest to report two of its gw# test nodes collecting different tests.

The same does not occur when using Python 2.

My version information:

Windows 7 x64 SP1
Python 3.3.3
pytest 2.5.2
xdist 1.10

Hope this helps.

Best regards,
Jurko Gospodnetić

@pytestbot

Original comment by Bruno Oliveira (BitBucket: nicoddemus, GitHub: nicoddemus):


Hey @jurko,

I have taken the steps to reproduce this in order to work on a patch since we're having the same issue at work, but found out that you can make a simple change in order to fix this.

I noticed the culprit for the differing collections was test_create_builtin_type_schema_objects, which uses parametrize to generate several test cases. The problem is that it uses dict.items() to generate the parameters; with hash randomization (opt-in since Python 2.6.8/2.7, on by default since 3.3), that iteration order can differ between interpreter processes. Since each worker performs its own collection, each one ends up generating different test ids.
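To illustrate the fix Bruno describes, here is a minimal sketch (with a hypothetical type map, not the actual suds test data): parametrizing over dict.items() ties test IDs to dict iteration order, while sorting the items pins the order regardless of hash seed.

```python
# Hypothetical type map standing in for the real suds test data.
CASES = {"xsd:int": int, "xsd:string": str, "xsd:float": float}

# Fragile on hash-randomized interpreters: iteration order follows the hash seed,
# so each xdist worker can generate differently ordered parametrized test IDs.
unstable_params = list(CASES.items())

# Stable across processes regardless of PYTHONHASHSEED: sort the items first,
# e.g. @pytest.mark.parametrize("name,typ", sorted(CASES.items()))
stable_params = sorted(CASES.items(), key=lambda kv: kv[0])
print([name for name, _ in stable_params])
```

The same caution applies to any set or dict used to build parametrize arguments.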

I created a pull request with that fix.

Cheers,

@pytestbot

Original comment by Jurko Gospodnetić (BitBucket: jurko, GitHub: jurko):


Wow, thanks for taking the time and providing a working jurko/suds fix! 😄

And as for the pytest issue - I am not well versed in how pytest's xdist plugin distributes its test runs over different processes, but my first instinct is that only one process should collect the tests and then distribute the resulting test list to the worker processes. That would both avoid replicating the collection work across processes and sidestep this whole issue.

Or, at the very least, pytest documentation should contain a BIG CAPITAL LETTER WARNING about having to construct test parametrization collections so their ordering does not vary between different test runs.

Best regards,
Jurko Gospodnetić

@pytestbot

Original comment by Jurko Gospodnetić (BitBucket: jurko, GitHub: jurko):


Another note - this behaviour is most obvious with Python 3.3+, where hash randomization is enabled by default, so different processes effectively always use different element orderings in their dictionaries and sets.

For Python interpreter versions 2.6.8+, the same feature can be enabled manually by passing the -R command-line option or by setting the PYTHONHASHSEED environment variable to random.
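The effect Jurko describes is easy to observe from the command line; a quick sketch, assuming a `python3` interpreter on PATH:

```shell
# Pinning PYTHONHASHSEED makes set iteration order reproducible across
# interpreter processes; 'random' (the 3.3+ default) does not have to be.
PYTHONHASHSEED=0 python3 -c "print(list({'a', 'b', 'c', 'd'}))"
PYTHONHASHSEED=0 python3 -c "print(list({'a', 'b', 'c', 'd'}))"       # same order as above
PYTHONHASHSEED=random python3 -c "print(list({'a', 'b', 'c', 'd'}))"  # may differ per run
```

Pinning the seed this way is a diagnostic aid, not a fix: the robust fix remains making parametrization order-independent.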

Hope this helps.

Best regards,
Jurko Gospodnetić

@pytestbot

Original comment by Bruno Oliveira (BitBucket: nicoddemus, GitHub: nicoddemus):


You are right about the documentation needing to mention this issue. I also wasn't aware that hash randomization was optional. 😄

There's a related thread here if you want to follow it. I'll have to read it more carefully before deciding whether to work on a patch for the problem or on a documentation fix.

Best Regards,

@pytestbot

Original comment by holger krekel (BitBucket: hpk42, GitHub: hpk42):


Even if one test node does the collection and creates the list of tests, the other nodes still need to perform a collection: every node must construct a tree of items so that item.runtest() can be called. We could still have one node do the "authoritative" collection used for distributing tests, I guess, if the only difference is ordering. That requires refined logic in LoadScheduling.send_runtest_some, I think: we would maintain a node-specific list of test ids and send indices accordingly.
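The index-translation idea holger sketches could look roughly like this (a hypothetical standalone sketch, not actual xdist code): the scheduler keeps each node's own ordering of collected test ids and translates canonical ids into per-node indices, so ordering differences between nodes stop mattering.

```python
# Hypothetical sketch of per-node index translation (names are illustrative).
class LoadSchedulingSketch:
    def __init__(self):
        self.node_collections = {}  # node name -> that node's list of test ids

    def add_node_collection(self, node, test_ids):
        self.node_collections[node] = list(test_ids)

    def send_runtest_some(self, node, wanted_ids):
        # Translate canonical test ids into indices in this node's own list;
        # in real xdist these indices would be sent over the wire.
        own = self.node_collections[node]
        return [own.index(tid) for tid in wanted_ids]

sched = LoadSchedulingSketch()
sched.add_node_collection("gw0", ["test_a", "test_b", "test_c"])
sched.add_node_collection("gw1", ["test_b", "test_a", "test_c"])  # reordered
print(sched.send_runtest_some("gw1", ["test_a", "test_c"]))  # [1, 2]
```

Both nodes end up running the same tests even though their local orderings differ.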

@pytestbot

Original comment by Jurko Gospodnetić (BitBucket: jurko, GitHub: jurko):


Sorry if I'm too ignorant on the topic and end up talking gibberish. 😄

But, I'm not sure I follow - if one node performs a collection, why would the others need to do it as well?

Why could it not simply split up the tests over the different nodes and send each its own test sub-collection? Then those nodes would not need to walk over all project files looking for tests; they would only look up exactly the tests they were given in their sub-collection.

Or... even better - something we do in an internal distributed build system at my company: the central location prepares a list of tasks (read: tests) and queues them. Each node is then simply given the next scheduled task (or the next N tasks, whatever ends up giving the best overall performance) as soon as it finishes its previous batch. That way, if you have nodes distributed over different computers, faster nodes process more tasks/tests than slower ones in the same time. You could then naturally build further on this and have a dedicated, persistent tester node farm shared between different test runs, with a farm manager deciding how to split the tasks over the available test nodes. Seems like a real fun project, all in all. 😄
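The pull-based scheduling Jurko describes can be sketched with a plain work queue (a toy model, not xdist's actual implementation): a central queue of tests, with each worker pulling the next item whenever it finishes, so faster workers naturally take on more work.

```python
# Toy pull-based test scheduler: workers drain a shared queue of test ids.
import queue
import threading

tests = [f"test_{i}" for i in range(10)]
q = queue.Queue()
for t in tests:
    q.put(t)

results = {}          # test id -> worker that "ran" it
lock = threading.Lock()

def worker(name):
    while True:
        try:
            t = q.get_nowait()  # pull the next scheduled test
        except queue.Empty:
            return              # no work left; worker exits
        with lock:
            results[t] = name   # stand-in for actually running the test

threads = [threading.Thread(target=worker, args=(f"gw{i}",)) for i in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(len(results))  # 10: every queued test ran exactly once
```

Batching (pulling N tests at a time) trades scheduling overhead against load balance, as Jurko notes.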

Best regards,
Jurko Gospodnetić

@pytestbot

Original comment by Florian Rathgeber (BitBucket: frathgeber, GitHub: frathgeber):


@nicoddemus Can you see a similar issue with test parametrization in the Firedrake tests? We're only using lists and tuples to parametrize tests and fixtures, so we must be facing a different issue. Any thoughts?

@pytestbot

Original comment by Bruno Oliveira (BitBucket: nicoddemus, GitHub: nicoddemus):


@fr710, I searched through the parametrize usages in firedrake and they seem to be fine. Looking at the output of the failure you posted, it is clear that one of the nodes is collecting a different number of tests than the other (gw0 [0] / gw1 [716]), which was not the case with jurko/suds...

I will study a little more of the xdist code... If I find anything I will post it here.

Cheers,

@pytestbot

Original comment by holger krekel (BitBucket: hpk42, GitHub: hpk42):


@jurko the current model for xdist is that each slave builds up the collection tree that you see with --collect-only. A more distributed way of performing collections is conceivable, but it does not directly fit the current model and is more complex to get right. I am aware that the current model has limitations and, in particular, requires repeatability of collections across nodes (for the load-scheduling mode). A more advanced distributed model is quite some work, though.

Most uses of xdist are probably with the -n option. On Unix we could just start one node, have it perform the collection, and then fork the process.

@pytestbot

Original comment by Bruno Oliveira (BitBucket: nicoddemus, GitHub: nicoddemus):


Hello everyone,

I have played with xdist's code a bit and got to the point where I can build the collection only on the master node and let the scheduler send the collected items to the slaves instead of them doing the collection themselves. But, as I suspected, I can't send pytest items through the gateways, as they can't be serialized at the moment.

@hpk42, do you think it is feasible to try to serialize those objects? I see they have several attributes that seem to be hard to replicate on the other side of the wire, but I might be wrong. Your suggestion of using fork won't work for us unfortunately because we run mostly Windows.

Another option would be to distribute the collection process as you suggested, for instance dividing the test file names between the slaves, which would then each collect only their share. I understand this would not be optimal (one file could have 1000 tests and another only 1, meaning one slave would have to collect and run 1000 tests while another runs a single test), but as a project grows so does its number of test modules, so in the end we might not lose much overall. That's the approach we implemented in-house for xUnit some years ago.
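The file-level split Bruno mentions could be as simple as dealing modules round-robin to the workers; a hypothetical sketch (names are illustrative), with the sort making the assignment stable across processes:

```python
# Round-robin file partitioning: granularity is per-file, so load can be
# uneven when one module holds far more tests than another.
def partition_files(files, num_workers):
    buckets = [[] for _ in range(num_workers)]
    for i, f in enumerate(sorted(files)):  # sort for a stable assignment
        buckets[i % num_workers].append(f)
    return buckets

files = ["test_a.py", "test_b.py", "test_c.py", "test_d.py", "test_e.py"]
print(partition_files(files, 2))
```

Each slave would then collect and run only the files in its bucket, instead of collecting the whole tree.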

Any advice is greatly appreciated.

Cheers,

@pytestbot

Original comment by holger krekel (BitBucket: hpk42, GitHub: hpk42):


Collecting tests only in the master is a very difficult path. I know because xdist used to follow this model. :) You still need to do the collection on the slaves in order for the test ids to match.

We can still improve the logic for checking whether the collections from the sub-nodes match: make it order-independent, and allow nodes to have extra tests. We just need to make sure we then send those test ids to the particular node that has them. We should emit a warning when reordering or extra tests are encountered (config.warn can be used on trunk for that).
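The order-independent comparison holger proposes might look something like this (a hypothetical sketch, not xdist's actual check): compare the collections as sets, warn rather than abort on reordering or extra tests, and error only when a node is missing tests.

```python
# Hypothetical order-independent collection check (illustrative names).
def compare_collections(reference, other):
    ref, oth = set(reference), set(other)
    if oth == ref and list(other) != list(reference):
        return "warn: same tests, different order"
    if ref <= oth:
        extra = sorted(oth - ref)
        return f"warn: extra tests on node: {extra}" if extra else "ok"
    return "error: node is missing tests"

print(compare_collections(["a", "b"], ["b", "a"]))       # reordered only
print(compare_collections(["a", "b"], ["a", "b", "c"]))  # extra test on node
```

Combined with per-node index translation, this would let a run proceed despite harmless ordering differences.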

@pytestbot

Original comment by Bruno Oliveira (BitBucket: nicoddemus, GitHub: nicoddemus):


Thanks for the response! :)

I'm guessing (and it really is just a guess) that right now the slaves don't always collect the same number of tests, beyond mere ordering differences, because some race condition during import messes up collection on some nodes. That would explain why it is not reproducible 100% of the time.

In our code base for instance, the difference in collection is always related to all the tests of some file, as if there was a problem importing it somehow:

AssertionError: Different tests were collected between gw3 and gw0. The difference is:
--- gw3

+++ gw0

@@ -579,6 +579,10 @@

 ../source/python/souring20/core/model/_tests/test_tools.py::Test::testResetSubjectList
 ../source/python/souring20/core/model/_tests/test_tools.py::Test::testThresholdRange
 ../source/python/souring20/core/model/_tests/test_tools.py::Test::testThresholdRangeModified
+../source/python/souring20/core/model/_tests/test_water_type.py::Test::testColor
+../source/python/souring20/core/model/_tests/test_water_type.py::Test::testReset
+../source/python/souring20/core/model/_tests/test_water_type.py::Test::testStatus
+../source/python/souring20/core/model/_tests/test_water_type.py::Test::testWaterTypes
 ../source/python/souring20/core/model/_tests/test_well_properties.py::Test::testEclipseWellCurves
 ../source/python/souring20/core/model/_tests/test_well_properties.py::Test::testIMEXWellCurves
 ../source/python/souring20/core/model/_tests/test_well_properties.py::Test::testMissingWellCurves

As you can see, none of the tests from test_water_type.py were collected by gw3. If this is the case, making each slave collect the tests one at a time (as opposed to doing the collection in parallel) would solve the problem. I will investigate this further.

Back to your suggestion: I think it seems doable, I will give it a shot. :)

Cheers,

@pytestbot

Original comment by holger krekel (BitBucket: hpk42, GitHub: hpk42):


Race conditions between slaves can be an issue, I guess. In theory they should not be a problem, but across Python 2.6-3.x there have been interpreter bugs with respect to import races. We could introduce an ini option, "xdist_serialize_imports" or so, that would make xdist perform a full collection on one host first and only then trigger all the other hosts at once (presuming that all imports and pyc-file or pyx-file writing have been done at that point, so imports can then happen concurrently). This would also go half the way toward being able to fork the first node after a full collection, to avoid re-doing the collection multiple times.

@pytestbot

Original comment by Bruno Oliveira (BitBucket: nicoddemus, GitHub: nicoddemus):


Hey @fr710,

I have worked on a proof of concept that works for us. Can you try my fork at https://bitbucket.org/nicoddemus/pytest-xdist and let me know if it works for you as well? If so, I will work on a PR; I would like some more real-world testing before preparing a proper patch.

Cheers,

@pytestbot

Original comment by Bruno Oliveira (BitBucket: nicoddemus, GitHub: nicoddemus):


To those interested: it seems to be a problem with assertion rewriting when multiple slaves are involved (there's a patch proposal for this here). Meanwhile, a workaround is to pass --assert=plain to py.test.

Cheers,

@pytestbot

Original comment by holger krekel (BitBucket: hpk42, GitHub: hpk42):


updated CHANGELOG and trace error message as requested in review

fixes issue #437
