
Convert tests from dict of dicts to list of tuples to avoid key overwrites #167

Merged
merged 10 commits into lmfit:master from bugfix/avoid_test_case_overwrites
Aug 9, 2024

Conversation

jagerber48
Contributor

@jagerber48 jagerber48 commented Nov 20, 2022

This PR

  • Moves formatting tests into a test_formatting.py module
  • Refactors test_format to use pytest.mark.parametrize. This solves #162 (Bug: repeated dict keys in formatting testing results in some keys going untested) since cases are no longer stored as dict keys (which were accidentally repeated); each case is now a 4-tuple. Previously, test cases with the same value/uncertainty could be grouped together within a dict under that value/uncertainty tuple. That grouping is lost since I don't know how to do the equivalent in pytest. I've done it with unittest.subTest in the past, and I think there might be a pytest extension that does something similar.
  • Removes checks and special-casing for Jython. Let me know if these need to be added back in.
  • Corrections to some tests that were previously not running but are now failing.

There is a lot of cruft in comments, and possibly in tests, about subtleties between old versions of Python. Hopefully this can be removed soon.


OLD DESCRIPTION:

See #162.
The tests object containing the test cases for ufloat string formatting is currently a dict whose keys are tuples of value/uncertainty pairs and whose values are dicts of format string/expected output pairs. Similar tests are chunked together under the same value/uncertainty pair. However, the same value/uncertainty pair appears multiple times in the top-level dict, so only one group of tests is kept for that pair and all the others are overwritten.

To avoid this overwriting we can convert the dict of dicts into a list of tuples. Each element of the list is then a tuple whose first element is a value/uncertainty pair tuple and whose second element is a dict of format string/expected output pairs, as sketched below.
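Schematically, with a couple of entries borrowed from the existing test data (illustrative only, not the full set):

    # Old structure: dict of dicts. Duplicate (value, uncertainty) keys are legal
    # Python, but the later entry silently replaces the earlier one.
    tests_old = {
        (3.1415, 0.0001): {'*^+7.2f': '*+3.14*+/-*0.00**'},
        (3.1415, 0.0001): {'+07.2f': '+003.14+/-0000.00'},  # overwrites the group above
    }

    # New structure: list of tuples. Repeated (value, uncertainty) pairs are all kept.
    tests_new = [
        ((3.1415, 0.0001), {'*^+7.2f': '*+3.14*+/-*0.00**'}),
        ((3.1415, 0.0001), {'+07.2f': '+003.14+/-0000.00'}),
    ]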

@jagerber48
Contributor Author

Note: on master commit 804adc, running nosetests test_uncertainties.py reports OK, but with this change 1 failure is detected:

AssertionError: Incorrect representation '   3.14150+/-   0.00010' for format '>10f' of 3.1415+/-0.0001: '  3.141500+/-  0.000100' expected.

This corresponds to the >10f format in these tests at line 1568:

    tests = [  # (Nominal value, uncertainty): {format: result,...}

        # Usual float formatting, and individual widths, etc.:
        ((3.1415, 0.0001), {
            '*^+7.2f': '*+3.14*+/-*0.00**',
            '+07.2f': '+003.14+/-0000.00',  # 0 fill
            '>10f': '  3.141500+/-  0.000100',  # Width and align
            '11.3e': '  3.142e+00+/-  0.000e+00',  # Duplicated exponent
            '0.4e': '3.1415e+00+/-0.0000e+00'  # Forced double exponent
        }),
        ...

My expected output for that line is actually 3.1415+/- 0.0001, which is different from both the expected and actual output above. I guess the issue is the precision rather than the width. Need to work out what's going on here...

@jagerber48
Contributor Author

jagerber48 commented Nov 20, 2022

I see: the f format defaults to a precision of 6 digits after the decimal point, but for some reason ufloat formatting gives 5 digits after the decimal.
That is:

from uncertainties import ufloat
u = ufloat(3.1415, 0.0001)
print(f'{u:f}')
print(f'{3.1415:f}')
print(f'{0.0001:f}')

gives me

3.14150+/-0.00010
3.141500
0.000100

So something is funny with the f format and the default precision.
This is on Python 3.9.13.

Not sure how this should be addressed within this PR, since the PR is about changing unit test formatting. I could comment out the failing unit test so that the PR commit builds. Otherwise I could change the expected unit test result so that it passes, but that feels like cheating. The other alternative is to wait for a fix to whatever is causing the unit test to fail before letting this PR go through.

@jagerber48
Contributor Author

jagerber48 commented Nov 20, 2022

Oh, sorry for the spam. I see what's going on. If no precision is supplied in the format string, then PDG rules for sig figs on the uncertainty set the precision, rather than the Python defaults. So I think the output from the program is correct, and the expected unit test result is wrong.
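For reference, a rough sketch of the PDG rule as I understand it (illustrative only, not the actual uncertainties implementation; the helper name is made up):

    from math import floor, log10

    def pdg_sig_figs(uncertainty):
        """Number of significant digits to keep on the uncertainty (rough PDG rule)."""
        # Three most significant digits of the uncertainty, as an integer in 100-999.
        top_three = int(uncertainty / 10 ** (floor(log10(abs(uncertainty))) - 2))
        if top_three <= 354:
            return 2  # e.g. 0.0001 -> 100 -> keep 0.00010 (5 digits past the decimal)
        elif top_three <= 949:
            return 1  # e.g. 456.78 -> 456 -> keep 5e+02
        else:
            return 2  # 950-999 rounds up to 1000, shown with two digits

By this rule 0.0001 keeps two significant digits (0.00010), which is why the bare f format above prints 3.14150+/-0.00010.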

But even if I fix this one, I think there are more unit test failures after it that will appear one by one. @lebigot what should this PR, which unburies more unit tests, do with respect to the unburied unit tests failing? Just go through, or wait for resolutions?

@andrewgsavage
Contributor

Could you get this running again? I ended up setting this to noqa to get the linter to pass.

It would be good to set it up as a pytest-style test with the different tests parameterised.

@wshanks
Collaborator

wshanks commented Jul 24, 2024

When reviewing #248, I did the same thing as @jagerber48, deduplicating the keys locally and finding at least one test failure. I had wanted to get back to that as part of removing the noqas. At the time, my thought was to defer to @jagerber48's judgment because the failing test was related to string formatting.

@jagerber48
Contributor Author

Yes, I can revisit this. And yes, in doing so I'll upgrade to the pytest parametrized tests.

Here I enumerate the failures I found. Some of the failures look like mistakes in the expected test results, but some look like they reveal either bugs, or at least ambiguities, in the format specification.

Before the pytest.parametrize refactor, a bug involving repeated keys in the dictionary meant that many tests were not actually running. Moving to pytest.parametrize re-exposed these tests, but unfortunately some of them are failing.

codecov bot commented Jul 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.46%. Comparing base (f097015) to head (24d5920).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
+ Coverage   96.33%   96.46%   +0.13%     
==========================================
  Files          15       16       +1     
  Lines        1909     1898      -11     
==========================================
- Hits         1839     1831       -8     
+ Misses         70       67       -3     


@jagerber48
Contributor Author

Ok, I refactored the tests to use pytest.parametrize. In the old format, a value/uncertainty pair tuple was used as the key to a dictionary of fmt_spec: expected_str pairs. I wanted each case to run in pytest with its own infrastructure, so I moved to a (value, uncertainty, fmt_spec, expected_str) tuple for EACH test, as sketched below. I don't know how to accomplish the nested tests in pytest; it can be done in unittest with subTest, and it looks like it could be done with the pytest-subtests package.
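As a minimal sketch of the new shape (a couple of illustrative cases, not the actual parametrization from the PR):

    import pytest
    from uncertainties import ufloat

    formatting_cases = [
        # (value, uncertainty, fmt_spec, expected_str)
        (3.1415, 0.0001, '+07.2f', '+003.14+/-0000.00'),
        (3.1415, 0.0001, '>10f', '   3.14150+/-   0.00010'),
    ]

    @pytest.mark.parametrize('val, std_dev, fmt_spec, expected_str', formatting_cases)
    def test_format(val, std_dev, fmt_spec, expected_str):
        assert format(ufloat(val, std_dev), fmt_spec) == expected_str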

I manually copied the tests from the old structure to the new one. I tried hard not to miss anything, but I'm only human; I couldn't think of an easy way to automate it.

After doing this I found 8 tests that were failing. These can be seen in this diff. In a following post I'll try to enumerate why I think each test is failing. Some of them look like typos in the expected output, some look like typos in the intended format specification string or forgetting how edge cases in the formatting work. A few of them test cases where I don't actually know what to expect.

For the cases where I truly don't know how to resolve the test, I'm going to leave them as expected failures and open a new issue to try to figure out what's going on.

Going through these tests reminds me that there are some pretty confusing rules when it comes to uncertainties formatting. These rules are likely not documented in the uncertainties documentation, and they may or may not be documented fully in the source code comments. It re-inspires me to think about implementing sciform as a formatting backend for uncertainties. I think a lot of the complexity of uncertainties formatting comes from it trying to "stay true" to the Python built-in format specification language for floats. sciform is inspired by both the Python built-in FSML and the uncertainties FSML, but where ambiguity or confusion arises, sciform chooses simplicity for the scientist over adherence to the possibly convoluted prior conventions.

@jagerber48
Contributor Author

jagerber48 commented Jul 27, 2024

Failing tests:

# val, std_dev, fmt_spec

# 1
3.1415, 0.0001, ">10f"
Expected :'  3.141500+/-  0.000100'
Actual   :'   3.14150+/-   0.00010'
Discussion: It looks like this test was written assuming `3.1415` would get formatted as `3.141500`, with 6 digits past the decimal, which is the default for float formatting. But when no precision is specified, uncertainties defaults to PDG sig-fig rounding on the uncertainty. See https://github.com/lmfit/uncertainties/blob/master/uncertainties/formatting.py#L850. PDG rules indicate this should get rounded to two digits of uncertainty, i.e. the fifth digit past the decimal point.
Resolution: Change expected string to `"   3.14150+/-   0.00010"`.

# 2
3.1415, 0.0001, "0.4e-3"
Expected :'3.1415e+00+/-0.0000e+00'
Actual   :'(3.1415+/-0.0001)e+00'
Discussion: This test comes with a comment saying "forced double exponent". Here the double exponent is intended to be forced by the explicit width specification of `0`, but the width actually has to be explicitly set to `1` to force the double exponent. See https://github.com/lmfit/uncertainties/blob/master/uncertainties/formatting.py#L687. There is also a simple rounding typo.
Resolution: Change the fmt_spec to `1.4e-3` and change the expected string to '3.1415e+00+/-0.0001e+00'

# 3
12.3, 456.78, ""
Expected :'12+/-457'
Actual   :'(0+/-5)e+02'
Discussion: I'm not sure what logic went into the expected string here. The rules that apply are (1) with no precision specified, the precision is determined by PDG sig-fig rules on the uncertainty, in this case one significant figure, and (2) with no presentation type specified, the `g` type is used. `457` formatted with `.1g` gives `5e+02` because the `g` type switches to `e` rather than display trailing insignificant zeros before the decimal point.
Resolution: Change the expected string to `"(0+/-5)e+02"`.
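For example, with a plain float:

    >>> format(456.78, '.1g')   # exponent (2) >= precision (1), so 'g' falls back to 'e'
    '5e+02'
    >>> format(456.78, '.3g')   # enough precision to stay in fixed notation
    '457'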

# 4
12.3, 456.78, ".1ueS"
Expected :'0.123(4.568)e+02'
Actual   :'0(5)e+02'
Discussion: This one got mangled in my reformatting. Originally it was `.4ueS`, which should be consistent with the expected output. I realized this because I noticed that my previous post didn't flag a test like this one.
Resolution: Change the fmt_spec back to `.4ueS`. 

# 5
1234.56789, 0.1, ".0f"
Expected :'(1234+/-0.)'
Actual   :'1235+/-0'
Discussion: This test is a bit confusing. First, there is a simple rounding error. Second, the expected string has a trailing decimal symbol but the actual string does not. The uncertainties convention is that an uncertainty that got rounded to zero should have a trailing decimal, while an uncertainty that is exactly zero should have no trailing decimal symbol. This test conflicts with that convention. The comment on the test is `Approximate error indicated with "."`. I think the conflict arises from an interaction with edge cases in how Python formatting handles trailing decimal symbols. Finally, for some reason the expected string includes enclosing parentheses, but those should only appear for `f` formatting when the `p` option is selected.
Resolution: I think the rounding error and the enclosing parentheses are a typo so I will remove those from the expected string. However, I think this is a real failure of the trailing decimal handling. I will continue to mark it as an expected failure and open an issue.

# 6
1234.56789, 0.1, "e"
Expected :'(1.23456+/-0.00010)e+03'
Actual   :'(1.23457+/-0.00010)e+03'
Discussion: This is a simple rounding typo
Resolution: Set expected string to '(1.23457+/-0.00010)e+03'

# 7
1234.56789, 0.1, "E"
Expected :'(1.23456+/-0.00010)E+03'
Actual   :'(1.23457+/-0.00010)E+03'
Discussion: This is a simple rounding typo
Resolution: Set expected string to '(1.23457+/-0.00010)E+03'


# 8
1234.56789, 0.1, "%"
Expected :'123457+/-10%'
Actual   :'(123457+/-10)%'
Discussion: Percent mode always has parentheses enclosing the value/uncertainty part
Resolution: Set expected string to '(123457+/-10)%'

I'll resolve all of the failing tests in the next commit. Failing test 5 will continue to fail after these corrections. I think it is a real failure. I'm going to mark it as an expected failure and open an issue.

@jagerber48
Contributor Author

Opened #259

@jagerber48
Contributor Author

TLDR: This PR is ready for review.

  • Formatting tests moved into test_formatting.py
  • Tests reformatted to use pytest.
  • A few tests were not being run. When the un-run tests were revealed and run, there were 7 failures. Most of them were benign typos in the expected results, but one reveals a real bug, or at least an inconsistency, in the formatting code. See Trailing decimal symbol test failing #259.
  • There are other tests in test_formatting.py that were just copied and pasted. Eventually these can be cleaned up and put into pytest format as well.

@andrewgsavage
Contributor

that looks good, thanks for getting to it

@jagerber48 jagerber48 merged commit ce16269 into lmfit:master Aug 9, 2024
20 checks passed
@jagerber48 jagerber48 deleted the bugfix/avoid_test_case_overwrites branch August 9, 2024 04:45