make crs bigger for new pars file for jump unit tests #1356

braingram · 2024-08-08T19:08:29Z

Make CRs bigger than the threshold used for the new pars files.

Fixes the unit test failures seen in:
#1355
due to the new pars file:
https://roman-crds.stsci.edu/browse/roman_wfi_pars-jumpstep_0001.asdf

Regression tests: https://github.com/spacetelescope/RegressionTests/actions/runs/10308257135

show only expected differences due to the new crds context:

E         {'values_changed': {"root['roman']['meta']['ref_file']['crds']['context_used']": {'new_value': 'roman_0063.pmap',
E                                                                                           'old_value': 'roman_0061.pmap'}}}

Checklist

added entry in CHANGES.rst under the corresponding subsection
updated relevant tests
updated relevant documentation
updated relevant milestone(s)
added relevant label(s)
ran regression tests, post a link to the Jenkins job below. How to run regression tests on a PR

codecov · 2024-08-08T19:40:59Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.57%. Comparing base (4086056) to head (0c13448).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1356   +/-   ##
=======================================
  Coverage   78.57%   78.57%           
=======================================
  Files         117      117           
  Lines        7861     7861           
=======================================
  Hits         6177     6177           
  Misses       1684     1684

Flag	Coverage Δ		*Carryforward flag
nightly	`62.26% <ø> (ø)`		Carriedforward from 4086056

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

schlafly · 2024-08-08T20:45:48Z

Do we think that we if we just add CRDS_CONTEXT to the list of parameters that this will go away? I think we don't want every time a new context comes along for us to have to update all of the regression test files?

braingram · 2024-08-08T20:52:21Z

Do we think that we if we just add CRDS_CONTEXT to the list of parameters that this will go away? I think we don't want every time a new context comes along for us to have to update all of the regression test files?

Do you mean define CRDS_CONTEXT to a specific context for the regression tests or add crds_context to the list of items ignored for compare_asdf?

For the latter, It does appear that jwst ignores the crds context for fitsdiff comparisons of generated to truth files:
https://github.com/spacetelescope/jwst/blob/4ecb40c57aaf1ae99fd6d8d9be66b7530277cf40/jwst/regtest/conftest.py#L236
That should allow the regtests to pass with this PR. Would you like me to add that to this PR and see (this PR seems like a good test)? EDIT: I added this change in edef4f6 so we can test it out

braingram · 2024-08-08T21:55:03Z

I have no clue at the moment why one regtest failed with the following 🤷

FAILED romancal/regtest/test_wfi_pipeline.py::test_level2_image_processing_pipeline - AssertionError: 
  Diff report for:
      result file: /runner/_work/_temp/pytest_basetemp/popen-gw3/test_level2_image_processing_p0/r0000101001001001001_01101_0001_WFI01_cal_repoint.asdf
          model type: ImageModel
      truth file: /runner/_work/_temp/pytest_basetemp/popen-gw3/test_level2_image_processing_p0/truth/r0000101001001001001_01101_0001_WFI01_cal_repoint.asdf
          model type: ImageModel
  
  {'values_changed': {"root['roman']['meta']['wcs_fit_results']['<rot>']": {'new_value': -5.9008763525942403e-05,
                                                                            'old_value': -7.897752747117105e-06},
                      "root['roman']['meta']['wcs_fit_results']['center'][0]": {'new_value': -249.45421319021025,
                                                                                'old_value': -250.89413395689374},
                      "root['roman']['meta']['wcs_fit_results']['center'][1]": {'new_value': 109.61122052869052,
                                                                                'old_value': 108.42239876672924},
                      "root['roman']['meta']['wcs_fit_results']['mae']": {'new_value': 0.002684014626406904,
                                                                          'old_value': 0.0028079485287068238},
                      "root['roman']['meta']['wcs_fit_results']['matrix'][0][1]": {'new_value': -1.0298972110582511e-06,
                                                                                   'old_value': -1.3784178894562014e-07},
                      "root['roman']['meta']['wcs_fit_results']['matrix'][1][0]": {'new_value': 1.0298972110582511e-06,
                                                                                   'old_value': 1.3784178894562014e-07},
                      "root['roman']['meta']['wcs_fit_results']['nmatches']": {'new_value': 109,
                                                                               'old_value': 110},
                      "root['roman']['meta']['wcs_fit_results']['proper_rot']": {'new_value': -5.9008763525942403e-05,
                                                                                 'old_value': -7.897752747117105e-06},
                      "root['roman']['meta']['wcs_fit_results']['rmse']": {'new_value': 0.003440309699826702,
                                                                           'old_value': 0.0037193506957389423},
                      "root['roman']['meta']['wcs_fit_results']['rot'][0]": {'new_value': -5.9008763525942403e-05,
                                                                             'old_value': -7.897752747117105e-06},
                      "root['roman']['meta']['wcs_fit_results']['rot'][1]": {'new_value': -5.9008763525942403e-05,
                                                                             'old_value': -7.897752747117105e-06},
                      "root['roman']['meta']['wcs_fit_results']['shift'][0]": {'new_value': -0.008499128453755975,
                                                                               'old_value': -0.00867141855839393},
                      "root['roman']['meta']['wcs_fit_results']['shift'][1]": {'new_value': 0.014101158752967921,
                                                                               'old_value': 0.014000953961758062}}}

schlafly · 2024-08-09T00:19:26Z

This is the one where I need the second okify. The change to tweakwcs defaults led a different star to be included in the match, making for directly different fit results. Sorry, I can okay that tomorrow.

schlafly · 2024-08-09T13:51:17Z

I okified the wcs_fit_results issue. I was suggesting adding CRDS_CONTEXT=roman_0061.pmap, as in this run: https://github.com/spacetelescope/RegressionTests/actions/runs/10319437845

I am just skeptical in general that for the regression tests we can use whatever context is the latest; it seems to me like that guarantees that regtests will fail if they use the new context, and if they don't use the new context then having the latest context isn't buying us anything?

braingram · 2024-08-09T15:48:17Z

Thanks! I reran the regtests and they all pass with this PR:
https://github.com/spacetelescope/RegressionTests/actions/runs/10308257135
It sounds like the open questions are:

should we ignore roman.meta.ref_file.crds.context_used in the compare_asdf diffs? This PR currently ignores it.
should the romcancal regtests pin the CRDS_CONTEXT? I think this is worth some discussion. If we keep the ignore for roman.meta.ref_file.crds.context_used we don't need to come to a conclusion on this. Even if we don't ignore it we could re-okify the regtests with the new context.

What do you think is a good forum for discussing pinning the CRDS_CONTEXT for the regtests? I'm happy to put stuff in this PR, a github issue, a jira ticket, etc.

I am just skeptical in general that for the regression tests we can use whatever context is the latest; it seems to me like that guarantees that regtests will fail if they use the new context, and if they don't use the new context then having the latest context isn't buying us anything?

Once things have been established (reference file formats are finalized, etc) I would expect that a new context should only produce expected differences in some data arrays (so if a new dark is added and put in use the regtests that compare the output from the dark currant step will fail until the new results are oked). I believe these are expected differences that benefit users since presumably the new results are more accurate. This does make the regression tests serve at least 2 purposes (and can lead to confusion):

does a code change impact the resulting data
does a crds context change impact the resulting data

And the second one sort of randomly pops up whenever a new context happens to be published. Is there any other place where a new context is tested (by running the romancal regtests) before it's made "live"? As @nden mentioned this morning, some of this will change when there is a crds ops server for roman but that sounds like it's a ways off.

schlafly · 2024-08-09T16:21:44Z

I agree with you that the "2 purposes" are the problem. It's good to know that we are sensitive to changes to reference files, but in practice, at least I'm pretty removed from those updates and I'm not sure I want to be sensitive to every time a new reference file goes in. Were we to move in that direction we'd want to coordinate better with CRDS about the deliveries.

I worry that at present it's going to be more like "tests are failing again, let's just update them, maybe CRDS changed," which would not be good.

I could imagine another set of tests (like the Mac tests, or something) where we use the latest context and observe failures but which doesn't drive development. But those would have to live in a separate artifactory directory if we ever wanted to okify them.

@ddavis-stsci , do you have a preference here? I think my sense is that as a temporary expedient let's go ahead and ignore context_used as you do here, but longer term I think I would lean towards our old model (fixing a context for the regression tests) and then changing from that later once we work out a new approach. We can take this up again at next week's romancal tag up.

ddavis-stsci · 2024-08-09T18:25:53Z

I would not ignore the CRDS context. We tie each release to a specific context and should test against that.
I assume that by "latest" you mean the operational context? We should always be testing again the reference files that have been deemed official.
If we don't check the crds_context I worry that it is too easy to run tests using one context and the code published but does
not work with the operational context or has unexpected results.

Do the regression tests have a variety of purposes - yes. Should you okify tests because you "think" you have a crds change, No.

If the regression tests fail we need to understand why. If the code or the reference files have changed and the new results are more accurate the running okify is a convenient way to update the files. If we do not understand why the results do not match the truth files the truth files should not be updated just the make them match.

schlafly · 2024-08-09T18:56:21Z

Okay, I take your message as saying that we should find a way to force the regression tests to use a specific context. Right now they are using the latest context, which brings in context changes that lead to regression test failures. I don't know how to force GA to do that in the short term, so I'd propose using this PR of Brett's to ignore the context in the short term, while planning to revert that change after we get the GA to always use a particular context.

ddavis-stsci · 2024-08-12T14:45:56Z

I'd suggest that we keep the crs changes and punt on the crds context issue. I think the crds context check needs more discussion.

braingram · 2024-08-12T16:24:31Z

I'd suggest that we keep the crs changes and punt on the crds context issue. I think the crds context check needs more discussion.

I dropped the crds context ignore. The regression tests will continue to fail due to the new crds context.

ddavis-stsci

LGTM

braingram added the no-changelog-entry-needed label Aug 8, 2024

github-actions bot added jump testing labels Aug 8, 2024

braingram changed the title ~~make crs bigger for new pars file~~ make crs bigger for new pars file for jump unit tests Aug 8, 2024

braingram marked this pull request as ready for review August 8, 2024 19:13

braingram requested a review from a team as a code owner August 8, 2024 19:13

braingram requested review from schlafly and ddavis-stsci August 8, 2024 20:14

github-actions bot added the regression_testing label Aug 8, 2024

braingram changed the title ~~make crs bigger for new pars file for jump unit tests~~ make crs bigger for new pars file for jump unit tests, add roman.meta.ref_file.crds.context_used to list of ignored paths for compare_asdf Aug 9, 2024

braingram force-pushed the bigger_crs branch from edef4f6 to 0d204b6 Compare August 12, 2024 16:20

braingram changed the title ~~make crs bigger for new pars file for jump unit tests, add roman.meta.ref_file.crds.context_used to list of ignored paths for compare_asdf~~ make crs bigger for new pars file for jump unit tests Aug 12, 2024

make crs bigger for new pars file

0c13448

braingram force-pushed the bigger_crs branch from 0d204b6 to 0c13448 Compare August 12, 2024 16:23

ddavis-stsci approved these changes Aug 12, 2024

View reviewed changes

ddavis-stsci merged commit 5e7e4fd into spacetelescope:main Aug 12, 2024
30 checks passed

braingram deleted the bigger_crs branch August 12, 2024 18:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make crs bigger for new pars file for jump unit tests #1356

make crs bigger for new pars file for jump unit tests #1356

braingram commented Aug 8, 2024 •

edited

Loading

codecov bot commented Aug 8, 2024 •

edited

Loading

schlafly commented Aug 8, 2024

braingram commented Aug 8, 2024 •

edited

Loading

braingram commented Aug 8, 2024

schlafly commented Aug 9, 2024

schlafly commented Aug 9, 2024

braingram commented Aug 9, 2024 •

edited

Loading

schlafly commented Aug 9, 2024

ddavis-stsci commented Aug 9, 2024

schlafly commented Aug 9, 2024

ddavis-stsci commented Aug 12, 2024

braingram commented Aug 12, 2024 •

edited

Loading

ddavis-stsci left a comment

make crs bigger for new pars file for jump unit tests #1356

make crs bigger for new pars file for jump unit tests #1356

Conversation

braingram commented Aug 8, 2024 • edited Loading

codecov bot commented Aug 8, 2024 • edited Loading

Codecov Report

schlafly commented Aug 8, 2024

braingram commented Aug 8, 2024 • edited Loading

braingram commented Aug 8, 2024

schlafly commented Aug 9, 2024

schlafly commented Aug 9, 2024

braingram commented Aug 9, 2024 • edited Loading

schlafly commented Aug 9, 2024

ddavis-stsci commented Aug 9, 2024

schlafly commented Aug 9, 2024

ddavis-stsci commented Aug 12, 2024

braingram commented Aug 12, 2024 • edited Loading

ddavis-stsci left a comment

Choose a reason for hiding this comment

braingram commented Aug 8, 2024 •

edited

Loading

codecov bot commented Aug 8, 2024 •

edited

Loading

braingram commented Aug 8, 2024 •

edited

Loading

braingram commented Aug 9, 2024 •

edited

Loading

braingram commented Aug 12, 2024 •

edited

Loading