
Parallelizing pytest #206

Merged

Conversation

byte-sculptor (Contributor) commented Dec 10, 2020

This PR:

  1. Invokes pytest with a -n auto argument to make multiple tests run in parallel (number of parallel jobs = number of cores)
  2. Installs pytest-xdist in all our pipelines to make the above possible
  3. Shortens the duration of TestOptimizerEvaluator
  4. Modifies the tests that use gRPC to attempt to start the service on up to 100 different ports before giving up. This is the easiest way to make sure that all tests requiring gRPC can run in parallel. The alternative would be to have them all talk to a single instance, but that would turn an easy parallelization problem into a hard one, as we'd need to manage the lifetime of that single instance.
  5. Relaxes the check in TestSmartCacheWithRemoteOptimizerV3.py, as the current one was a bit too ambitious and led to some flakiness.
  6. Puts a band-aid on test_optimization_with_context. Issue #207 (Turn Gaussian Blob With Context into an ObjectiveFunction) hints at a long-term fix.

```python
self.bayesian_optimizer_factory = BayesianOptimizerFactory(grpc_channel=self.optimizer_service_grpc_channel, logger=self.logger)
max_num_tries = 100
num_tries = 0
for port in range(50051, 50051 + max_num_tries):
```
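The loop body is truncated in the excerpt above. The retry pattern it describes can be sketched roughly like this (a minimal, hypothetical sketch using stdlib sockets in place of the real gRPC optimizer service; try_start_service and its defaults are illustrative names, not the actual code in this PR):

```python
import socket

def try_start_service(base_port=50051, max_num_tries=100):
    """Try successive ports until one is free, mirroring the retry loop
    added in this PR. Returns the listening socket and the chosen port.
    Sketch only: the real tests start the gRPC optimizer service here,
    not a bare socket."""
    for port in range(base_port, base_port + max_num_tries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            sock.listen(1)
            return sock, port  # this port was free; the service is up
        except OSError:
            # Port taken, likely by a test running in parallel; try the next.
            sock.close()
    raise RuntimeError(f"no free port in [{base_port}, {base_port + max_num_tries})")
```

Under pytest -n auto, each worker simply walks the range until it finds an unclaimed port, which sidesteps having to manage the lifetime of a single shared service instance.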
Contributor:
Hmm, questions ...
Is this only necessary when running tests in parallel?
Or is it not cleaning up after itself? Or maybe some delay after the teardown is necessary?
Can a single remote optimizer not handle requests from multiple clients?

Contributor:

In fd8ef08 I think we had to add some additional code to the teardown_method:
e29697b

Contributor Author:

It's so that multiple services can run in parallel. Having one handle multiple requests would work too.

Contributor Author:

*one service - each test instantiates its own optimizer.

Contributor:

Thanks, makes sense.
The teardown changes (specifically closing the client channel explicitly) may still be necessary in the other one. I think I only made the change in the one that was throwing errors at the time.

Contributor:

If I recall correctly it leaves a socket open that causes conflicts later on when it goes to reuse it.
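The explicit-close pattern being discussed might look roughly like this in a test class (a hedged sketch; the real teardown_method closes an actual grpc.Channel, whose close() method releases the underlying socket, while FakeChannel and TestOptimizerGrpcClient here are stand-in names so the sketch is self-contained):

```python
class FakeChannel:
    """Stand-in for grpc.Channel; real gRPC channels expose close() too."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class TestOptimizerGrpcClient:
    """Sketch of the setup/teardown pairing discussed in this thread."""

    def setup_method(self, method):
        # Real test: self.channel = grpc.insecure_channel(f"localhost:{port}")
        self.channel = FakeChannel()

    def teardown_method(self, method):
        # Close the client channel explicitly so no socket is left open
        # to conflict with the next test that reuses the port.
        self.channel.close()
```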

Contributor Author:

I thought it would only happen on the server side. Is there any repro that would tell us when we've fixed the bug?

bpkroth (Contributor) commented Dec 11, 2020:

Hmm, the command to execute them got dropped in the commit message. (-k test_echo is optional; it's there just to keep the repro as simple as possible, and it works mostly because I added a second test_echo2. The bug only shows up after the second test run, since there needs to be cruft left over from the first.)

```shell
# pytest -svxl -k test_echo source/Mlos.Python/mlos/unit_tests/TestBayesianOptimizerGrpcClient.py
```

Contributor Author:

Awesome, thanks.

@byte-sculptor byte-sculptor changed the title DRAFT: Parallelizing pytest Parallelizing pytest Dec 11, 2020
bpkroth (Contributor) left a comment:

Couple minor suggestions.

byte-sculptor and others added 2 commits December 10, 2020 21:02
…mizer.py

Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>
…mizer.py

Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>
amueller (Contributor) left a comment:

didn't look into the issue @bpkroth pointed out in detail but otherwise looks good.

bpkroth (Contributor) commented Dec 11, 2020:

> didn't look into the issue @bpkroth pointed out in detail but otherwise looks good.

@amueller it was just another place to apply the gRPC client close issue you found and that I helped debug when converting to pytest (exact references are in the thread).

@byte-sculptor byte-sculptor merged commit 0ef1a92 into microsoft:main Dec 11, 2020
bpkroth added a commit to bpkroth/MLOS that referenced this pull request Feb 25, 2021