tighten test_xlm bound #4261

austereantelope · 2021-12-19T16:34:15Z

Patch description
Hi,

The test test_xlm in test_transformers.py has an assertion bound (self.assertLessEqual(test['ppl'], 1.3)) that is too loose. This means a potential bug in the code could still pass the test.

To quantify this, I ran some experiments: I generated multiple mutations of the source code under test and ran each mutant and the original code 100 times to build a distribution of their outputs. I used a KS test to find mutants that produced a different distribution from the original, and used those mutants as a proxy for bugs that could be introduced. In the graph below I show the distribution of the original code alongside the mutants whose distribution differs.
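As a rough illustration of the comparison step, here is a minimal pure-Python sketch of a two-sample KS check. It is not the script used for the experiments: the function names, the 0.05 significance level, the large-sample critical value, and the synthetic perplexity samples are all assumptions made for the example.

```python
import math
import random


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def distributions_differ(a, b, alpha=0.05):
    """Large-sample two-sample KS decision: True if D exceeds the critical value."""
    n, m = len(a), len(b)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


# Synthetic stand-ins for 100 runs each (made-up numbers, not measured data).
rng = random.Random(0)
original = [rng.gauss(1.005, 0.003) for _ in range(100)]
mutant_shifted = [rng.gauss(1.05, 0.003) for _ in range(100)]

print("shifted mutant differs:", distributions_differ(original, mutant_shifted))
```

In practice a library routine such as `scipy.stats.ks_2samp` would likely be the more convenient choice, since it also returns a p-value.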

Here we see that the bound of 1.3 is too loose, since the original distribution (in orange) lies well below 1.3. Furthermore, the graph shows many mutants (a proxy for bugs) that also fall below the bound, which is undesirable since the test should aim to catch potential bugs in the code. I quantify the "bug detection" of this assertion by varying the bound in the trade-off graph below.

In this graph, I plot the mutant catch rate (ratio of mutant outputs that fail the test) and the original pass rate (ratio of original outputs that pass the test). The original bound of 1.3 (red dotted line) has a catch rate of 0.
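The two rates defined above can be sketched as follows. This is an illustrative example only: the helper names and the sample values are made up, and they are not the outputs collected in the experiments.

```python
def pass_rate(original_outputs, bound):
    """Fraction of original-code outputs that satisfy `value <= bound`."""
    return sum(v <= bound for v in original_outputs) / len(original_outputs)


def catch_rate(mutant_outputs, bound):
    """Fraction of mutant outputs that violate `value <= bound` (test fails)."""
    return sum(v > bound for v in mutant_outputs) / len(mutant_outputs)


# Made-up perplexity values, purely for illustration.
original = [1.001, 1.004, 1.008, 1.012]
mutants = [1.003, 1.015, 1.04, 1.10, 1.25]

# Sweep candidate bounds to see the trade-off between the two rates.
for bound in (1.3, 1.1, 1.02, 1.005):
    print(bound, pass_rate(original, bound), catch_rate(mutants, bound))
```

Lowering the bound raises the catch rate but eventually starts failing runs of the unmodified code, which is the flakiness cost the trade-off graph is meant to expose.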

To improve this test, I propose tightening the bound to 1.02 (the blue dotted line). The new bound has a catch rate of 0.17 (a 0.17 increase over the original) while still keeping a >99% pass rate (the test is not flaky: I ran the updated test 500 times and observed a >99% pass rate). I think this is a good balance between improving the bug-detection ability of the test and keeping its flakiness low.

Furthermore, I observed that the value checked by this assertion never goes below 1. Would it be helpful to include an additional assertion, for example assertGreaterEqual(test['ppl'], 1)? Some mutants can push the value below 1, and the current assertion setup would miss those mutants/bugs. Adding the greater-than-or-equal-to-1 assertion improves the catch rate further, to 0.23.
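A lower bound of 1 is also what one would expect if `ppl` is computed as the exponential of an average cross-entropy loss, since that loss cannot be negative (this is an assumption about how the metric is derived, not something checked against the code here). A minimal sketch of the two-sided check, with a hypothetical `fake_eval_report` standing in for the real model evaluation:

```python
import math
import unittest


def fake_eval_report():
    """Hypothetical stand-in for the real evaluation; returns a dict with 'ppl'."""
    token_losses = [0.004, 0.006, 0.005]  # made-up per-token cross-entropy values
    return {'ppl': math.exp(sum(token_losses) / len(token_losses))}


class TestXlmBounds(unittest.TestCase):
    def test_ppl_within_bounds(self):
        test = fake_eval_report()
        # Upper bound proposed in this PR.
        self.assertLessEqual(test['ppl'], 1.02)
        # Suggested additional lower bound.
        self.assertGreaterEqual(test['ppl'], 1)
```

Saved to a file, the sketch can be run with `python -m unittest <file>`.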

Do you guys think this makes sense? Please let me know if this looks good or if you have any other suggestions or questions.

My Environment:

python=3.7.11
pytorch=1.10.0

My ParlAI experiment SHA:
4b1d07d0eeb14f849ad930eeb001327f9bfc2db1

facebook-github-bot · 2021-12-19T16:34:19Z

Hi @austereantelope!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

stephenroller · 2021-12-19T16:47:20Z

Super cool! Awesome analysis. Very happy to try this for a bit and see how it goes.

Please just sign the CLA and we'll land.

facebook-github-bot · 2021-12-19T17:26:00Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

austereantelope · 2021-12-21T16:51:55Z

@stephenroller I've signed the CLA, is it good to merge now?

stephenroller

Yup. Sorry for slow response, busy on my end.

tighten test_xlm bound

bd3632a

facebook-github-bot added the CLA Signed label Dec 19, 2021

stephenroller approved these changes Jan 6, 2022

stephenroller merged commit 0470430 into facebookresearch:main Jan 6, 2022
