
Add Test strategy #688

Closed · wants to merge 23 commits into from

Conversation

@twsl (Contributor) commented Dec 19, 2021

What does this PR do?

Fixes #675

This is just a proposal, as I don't have multiple GPUs for testing.
This increases the number of tests a lot!

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@codecov (bot) commented Dec 19, 2021

Codecov Report

Merging #688 (df21357) into master (c89a740) will decrease coverage by 22%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #688     +/-   ##
=======================================
- Coverage      95%    73%    -22%     
=======================================
  Files         166    166             
  Lines        6413   6413             
=======================================
- Hits         6105   4684   -1421     
- Misses        308   1729   +1421     

@Borda Borda added enhancement New feature or request test / CI testing or CI labels Dec 21, 2021
@Borda (Member) commented Dec 21, 2021

Well, it seems that some tests at the beginning are hanging. Have you tried running it locally?

@@ -55,22 +55,29 @@ def sk_auc(x, y, reorder=False):

 @pytest.mark.parametrize("x, y", _examples)
 class TestAUC(MetricTester):
-    @pytest.mark.parametrize("ddp", [False])
+    @pytest.mark.parametrize(MetricTesterDDPCases.name_strategy(), MetricTesterDDPCases.cases_strategy())
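(For context, a guess at what the MetricTesterDDPCases helper used in this parametrize might look like; the PR's actual implementation is not shown in this excerpt, so the case values and parameter names below are assumptions.)

# Hypothetical sketch only -- not the PR's actual code, inferred from how the
# helper is called: name_strategy() supplies the parameter names for
# @pytest.mark.parametrize and cases_strategy() the matching case values.
class MetricTesterDDPCases:
    @staticmethod
    def name_strategy():
        return "ddp, device"

    @staticmethod
    def cases_strategy():
        # one entry per (ddp, device) combination a test should cover
        return [(False, "cpu"), (False, "cuda"), (True, "cuda")]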
twsl (Contributor Author):

DDP apparently fails and was therefore probably excluded from the tests, yet there is no mention in the docs that DDP is not supported.

twsl (Contributor Author):

@Borda @SkafteNicki any idea how to handle this? Removing the full ddp/device strategy and opening an issue to fix AUC would be my recommendation.

Member:

It's hard for me to remember, but basically AUC needs ordered input to work, and while we are testing on ordered input, when we move to DDP it somehow gets unordered. I wonder if we should just set the reorder flag to True whenever ddp=True.
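As a rough illustration of that reorder suggestion (not from the PR; it uses torchmetrics.functional.auc as it existed around this release):

import torch
from torchmetrics.functional import auc

# DDP gathering can interleave per-process chunks, so x may arrive unsorted;
# reorder=True sorts by x before integrating, which is what forcing the flag
# whenever ddp=True would amount to.
x = torch.tensor([0.0, 2.0, 1.0, 3.0])  # unsorted, as if gathered from two processes
y = torch.tensor([0.0, 2.0, 1.0, 3.0])
print(auc(x, y, reorder=True))  # sorts by x first -> tensor(4.5000)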

@twsl twsl marked this pull request as ready for review December 27, 2021 21:28
@mergify mergify bot added the has conflicts label Jan 4, 2022
@Borda (Member) left a comment:

Hi, thx for this PR, I'm just not very sure if it is really needed... What you are trying to set up here is running CPU and GPU in a single test call, but even now we are trying to optimize the tests to run as efficiently as possible, so on a CPU machine run just the CPU version and on a GPU machine run the CUDA version (and skip the CPU version, as it was already run elsewhere).
That said, I think this also matches user expectation: test on the best available resources, so on a GPU machine run the CUDA tests and on a CPU machine the rest... So if you have a GPU but want to run the CPU tests, you can simply disable GPU visibility for that particular run, for example: CUDA_VISIBLE_DEVICES=-1 pytest ...

@@ -75,25 +75,28 @@ def average_metric(preds, target, metric_func):
 class TestPESQ(MetricTester):
     atol = 1e-2

-    @pytest.mark.parametrize("ddp", [True, False])
+    @pytest.mark.parametrize(MetricTesterDDPCases.name_strategy(), MetricTesterDDPCases.cases_strategy())
Member:

I would say it is cleaner if it were a constant or a function returning a tuple:

def ddp_name_options():
    return "ddp", (True, False)

and later:

@pytest.mark.parametrize(*ddp_name_options())

Member:

But anyway, why is this needed?

twsl (Contributor Author):

The reason I added this strategy was that now all available test cases are generated and you can explicitly run them locally. This way you can guarantee full coverage, you don't depend on the CI machines or some launch flags, and each test states on which device it runs. You also don't have to check the machine stats to find out, e.g., whether a test failed on a single GPU or on multiple GPUs.
In order to optimize the tests, an additional skip condition with an env var could achieve the same. Especially for test optimization, you could then run ddp=False on a machine with a single GPU and ddp=True on one with multiple GPUs.
I'm fine with putting it into a single function; my idea was to stick with the two-parameter style of parametrize.
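(A minimal sketch of the env-var skip condition mentioned above; hypothetical, the variable name TEST_DDP and the test body are illustrative, not from the PR.)

import os

import pytest

# A single-GPU runner exports TEST_DDP=0 so only the ddp=False cases execute there,
# while a multi-GPU runner keeps the default and also runs the ddp=True cases.
RUN_DDP = os.getenv("TEST_DDP", "1") == "1"


@pytest.mark.parametrize(
    "ddp",
    [
        False,
        pytest.param(True, marks=pytest.mark.skipif(not RUN_DDP, reason="DDP cases disabled via TEST_DDP")),
    ],
)
def test_metric_case(ddp):
    # placeholder body; the real tests would go through MetricTester.run_class_metric_test
    assert isinstance(ddp, bool)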

Member:

I agree with @Borda that it would be better to return a tuple there.

On the general approach: I think we should split this between arguments that apply to every test (like devices and maybe also DDP strategies), for which we should use pytest fixtures (that's what they are there for), so that you could call pytest --devices=XX tests. For test-specific arguments this approach is fine IMO.
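(A minimal sketch of that fixture/CLI-option idea; hypothetical, only the --devices option name comes from the comment above, the rest is illustrative.)

# conftest.py
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--devices",
        action="store",
        default="cpu",
        help="comma-separated list of devices to run the metric tests on, e.g. 'cpu,cuda'",
    )


def pytest_generate_tests(metafunc):
    # Any test that takes a `device` argument gets parametrized with the
    # devices passed on the command line.
    if "device" in metafunc.fixturenames:
        metafunc.parametrize("device", metafunc.config.getoption("--devices").split(","))

It would then be invoked as, e.g., pytest --devices=cpu,cuda tests/.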

twsl (Contributor Author):

I'm unsure how to continue with this. Any suggestions?

@Borda Borda assigned justusschock and unassigned SkafteNicki Jan 6, 2022
@Borda Borda self-requested a review January 6, 2022 15:06
@Borda Borda force-pushed the master branch 3 times, most recently from 3a0f7dc to 317182c Compare January 10, 2022 13:05
@Borda Borda added this to the v0.8 milestone Jan 12, 2022
@maximsch2 (Contributor):

@SkafteNicki , I think it makes sense to merge this first before merging #867 to make sure things continue working correctly on GPUs.

@Borda Borda modified the milestones: v0.8, v0.9 Mar 22, 2022
@Borda Borda removed this from the v0.9 milestone Apr 20, 2022
@Borda (Member) commented May 5, 2022

Thank you for your suggestion, but atm we are in the process of additional test optimization 😇

@Borda Borda closed this May 5, 2022
Labels: enhancement (New feature or request), has conflicts, test / CI (testing or CI)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Test all metrics on GPU
5 participants