Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tighten test_process_pixels_gray bound #5690

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

austereantelope
Copy link

@austereantelope austereantelope commented Jan 24, 2022

Proposed change(s)

The test test_process_pixels_gray in test_rpc_utils.py has an assertion bound (assert (np.mean((in_array.mean(axis=2, keepdims=True) - out_array)) < 0.01)) that is too loose. This means potential bug in the code could still pass the original test.

To quantify this I conducted some experiments where I generated multiple mutations of the source code under test and ran each mutant and the original code 100 times to build a distribution of their outputs. Each mutant is generated using simple mutation operators (e.g. > can become < ) on source code covered by the test. I used KS-test to find mutants that produced a different distribution from the original and use those mutants as a proxy for bugs that could be introduced. In the graph below I show the distribution of both the original code and also the mutants with a different distribution.

Here we see that the bound of 0.01 is too loose since the original distribution (in orange) is less than 0.01. Furthermore in this graph we can observe that there are many mutants (proxy for bugs) that are below the bound as well and that is undesirable since the test should aim to catch potential bugs in code. I quantify the "bug detection" of this assertion by varying the bound in a trade-off graph below.

In this graph, I plot the mutant catch rate (ratio of mutant outputs that fail the test) and the original pass rate (ratio of original output that pass the test). The original bound of 0.01 (red dotted line) has a catch rate of 0.33.

To improve this test, I propose to tighten the bound to 0.002 (the blue dotted line). The new bound has a catch rate of 0.67 (+0.34 increase compare to original) while still has >99 % pass rate (test is not flaky, I ran the updated test 500 times and observed 100 % pass rate). I think this is a good balance between improving the bug-detection ability of the test while keep the flakiness of the test low.

Do you guys think this makes sense? Please let me know if this looks good or if you have any other suggestions or questions.

My Environment:

python=3.6.13
pytorch=1.8.1

my ml-agents Experiment SHA:
d2ee1e2569aa68d3ac0e1dd8d91f1bf812b9df27

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

@CLAassistant
Copy link

CLAassistant commented Jan 24, 2022

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants