Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make JRA55 default forcing dataset #533

Merged
merged 7 commits into from
Dec 23, 2020

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented Nov 24, 2020

This is a work in progress
@eclare108213 edited to add list of to-do items below.

PR checklist

  • Short (1 sentence) summary of your PR:
    Update CICE to make JRA55 default forcing set

  • Developer(s):
    rallard77, apcraig

  • Suggest PR reviewers from list in the column to the right.

  • Please copy the PR test results link or provide a summary of testing completed below.
    First step was adding all the new jra55 tests and making sure everything passed. See below for more details,
    https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#510221b8e9e59b497a5a92bdd4872cb02aae396b

  • How much do the PR code changes differ from the unmodified code?

    • bit for bit
    • different at roundoff level
    • more substantial, JRA55 new forcing default
  • Does this PR create or have dependencies on Icepack or any other models?

    • Yes
    • No
  • Does this PR add any new test cases?

    • Yes, test cases are being redefined and will change, some new tests added
    • No
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)

    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please provide any additional information or relevant details below:

See also #530

First Step, add all the new jra55 tests, 510221b

jra55 tests added as follows for different grids
gx3 uses jra55_gx3
gx1 uses jra55_gx1
tx1 is already using jra55 by default (no changes to tests required)
gbox* uses internal forcing, no jra55 data (no changes to tests required)

remove ocn_data_dir='default' and bgc_data_dir='default' in set_nml.jra55*
files because it causes the bgc tests to fail. the bgc tests work with WOA bgc
forcing data with jra55 physics forcing data.

iobinary tests must use the old gx3 binary forcing, the jra55_gx3 netcdf data
causes an abort because iobinary is tested with netcdf turned off. the old
gx3 data is NCAR_bulk in binary format.

alt01 with jra55 uses the alt01a setting. the difference is that
use_leap_years is set to false in alt01a. It's set to true by default
in the jra55 datasets.

To-do (copied from conversation below):

  • Change default emissivity to 0.985
  • Change default nblyr to 1

Notes from global/ (these are hemispheric values produced by print_global=T):

  • We should turn on the brine tracer for these tests. When it's off, maybe we shouldn't print this diagnostic.
  • Should there be rain water in the Arctic in Jan/Feb/Mar?
  • The arwt tot mass, energy, salt diagnostics are wrong.

Notes from point/ (these are values for particular lat/lon points produced by print_point=T):

  • Arctic SST is wrong: large spikes, 2 lines, y axis values are wrong magnitude. This might just be a diagnostic issue.
  • Why is the Antarctic shortwave radiation sum = 0 in mid summer?
  • Why is the snow change always 0?

The following issue may be more about the physical representation than diagnostic calculations. From the point plots:

  • Why does gx1 Antarctic heat used go to an approximately constant (barely decreasing), nonzero value in mid-Feb? The area fraction ~0; bottom, top & lateral melt stops but thickness continues to decrease, and sensible, latent heat, outward longwave flux are all nonzero. The gx3 behavior is the opposite - Antarctic heat used is larger after mid-Feb, and the various terms remain nonzero. It could just be that the grid cells we are looking at are experiencing different conditions and hit the ice/no-ice threshold differently, but they should be fairly close to each other (lat/lon) so I wouldn't expect the conditions to be that different.

@eclare108213
Copy link
Contributor

cice.setup: ERROR, /turquoise/usr/projects/climate/eclare/CICE.v6.1/apcraig-jra55_update/configuration/scripts/options/set_[nml,env].alt01a not found

@apcraig
Copy link
Contributor Author

apcraig commented Nov 25, 2020

cice.setup: ERROR, /turquoise/usr/projects/climate/eclare/CICE.v6.1/apcraig-jra55_update/configuration/scripts/options/set_[nml,env].alt01a not found

sorry about that. I have added that file, it should be fixed now.

@eclare108213
Copy link
Contributor

That was quick! Thanks.

@eclare108213
Copy link
Contributor

I'm starting to compare the time series plots for the base_suite tests. I'll also look through the log files for inconsistent settings. Should I look at other test suites?

@apcraig
Copy link
Contributor Author

apcraig commented Nov 25, 2020

base_suite is the main one. The other suites are largely either subsets of the base_suite or technical testing that use mainly the default namelist for science.

@eclare108213
Copy link
Contributor

I would like to suggest that we change the default value of emissivity from 0.95 to 0.985, when we switch to JRA forcing. It's more realistic. Any reason not to do this (other than making comparing the output more painful)?

@apcraig
Copy link
Contributor Author

apcraig commented Nov 25, 2020

We can certainly change the value of emissivity. I will make that change a separate step and test it.

@eclare108213
Copy link
Contributor

I think we should also set nblyr=1 as the default (it appears to be 7 now)

@apcraig
Copy link
Contributor Author

apcraig commented Nov 26, 2020

7bf885b makes jra55 the defaults by updating the gx1 and gx3 defaults, getting rid of the set_nml.jra55* options, and adding gx3ncarbulk and gx1coreii options. The test names were changed to reflect the new options, and all this was compared to the prior baseline using links to address the test renaming, and everything is confirmed to be bit-for-bit, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#7bf885bb2acfbaa6dd6d1d915d730bd0f0368910. There are 4 new tests added in the new test lists that did not have baselines.

This commit leverages "zz" option names to make renaming easier. The next step will be to remove the zz and rerun the test suite to establish new baselines.

@eclare108213
Copy link
Contributor

Base_suite results on badger:

328 measured results of 328 total results
321 of 328 tests PASSED
1 of 328 tests PENDING
0 of 328 tests MISSING data
6 of 328 tests FAILED

The failing tests are the usual ones calling for 40 pes. The pending test also failed initially:

PEND badger_intel_restart_gbox128_2x2_boxadv_short run

I tried to rerun it and killed it after waiting a while - I don't understand what's wrong. It seems to be hanging and eventually timing out, although it initially did start to run and write data to the diagnostic output file.

@eclare108213
Copy link
Contributor

I have made time series plots for all of the global and point diagnostics in all of the base_suite runs, for both the current forcing and JRA55. They mostly look okay to me, although there are a few things that need to be looked at more closely and possible fixed, particularly for the alt0* runs. I'll come back to those.

Meanwhile, I'm hoping that the testing/analysis team can answer or address the items below. Attached here are the plots for the 90-day JRA55 runs for both gx3 and gx1.
90day.tar.gz

Notes from global/ (these are hemispheric values produced by print_global=T):

  • We should turn on the brine tracer for these tests. When it's off, maybe we shouldn't print this diagnostic.
  • Should there be rain water in the Arctic in Jan/Feb/Mar?
  • The arwt tot mass, energy, salt diagnostics are wrong.
  • Spikes in the error plots could help us locate conservation leaks.

Notes from point/ (these are values for particular lat/lon points produced by print_point=T):

  • Arctic SST is wrong: large spikes, 2 lines, y axis values are wrong magnitude. This might just be a diagnostic issue.
  • Why is the Antarctic shortwave radiation sum = 0 in mid summer?
  • Why is the snow change always 0?

The following issue may be more about the physical representation than diagnostic calculations. From the point plots:

  • Why does gx1 Antarctic heat used go to an approximately constant (barely decreasing), nonzero value in mid-Feb? The area fraction ~0; bottom, top & lateral melt stops but thickness continues to decrease, and sensible, latent heat, outward longwave flux are all nonzero. The gx3 behavior is the opposite - Antarctic heat used is larger after mid-Feb, and the various terms remain nonzero. It could just be that the grid cells we are looking at are experiencing different conditions and hit the ice/no-ice threshold differently, but they should be fairly close to each other (lat/lon) so I wouldn't expect the conditions to be that different.

@apcraig
Copy link
Contributor Author

apcraig commented Nov 30, 2020

@eclare108213, if the 40 pe cases fail, would it be helpful to limit the number of pes used in testing? What should that number be? 32? 24? 16?

Regarding the diagnostics, let me know if there is anything in particular you want me to look at. I hope we can fix the diagnostics as well as the arwt computation asap?

@eclare108213
Copy link
Contributor

It's okay with me to keep using the 40-pe tests, just knowing they'll fail on badger. It seems like we added something to the scripts for this kind of machine limitation, but if so, obviously I haven't been using it.

Have you run into any issues with the intel_restart_gbox128_2x2_boxadv_short test? Since it's failed repeatedly for me on this PR, I don't think it's a fluke, but it could be a machine-specific problem.

If you (or anyone) would like to tackle the diagnostic issues, go right ahead... They're a bit lower on the totem pole for me, personally, but I agree we ought to fix them ASAP. I'll take a look at the alt0* tests next, then I plan to revert the upwind code (unless someone else wants to do that...) :)

@apcraig
Copy link
Contributor Author

apcraig commented Nov 30, 2020

I'll take care of the unwind code, #508.

@rallard77
Copy link
Contributor

rallard77 commented Nov 30, 2020 via email

@apcraig
Copy link
Contributor Author

apcraig commented Nov 30, 2020

I have updated the PR again only removing the "zz" naming option (08d6f70) to generate a new suite of baselines to continue the process moving forward. Those baselines are https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#08d6f70c6f4ee4994bf811552affc2992a74616e. All tests pass but many of the regression tests are missing because the names have changed.

From this point, we could merge the PR and then as a separate PR update some of the tests. Or we could continue to allow this PR to evolve and update tests here before merging.

@apcraig
Copy link
Contributor Author

apcraig commented Nov 30, 2020

Actually, at least one more step is required which is to remove the "extra" old ncarbulk and coreii tests. But I think that could be the last step before we merge.

@apcraig
Copy link
Contributor Author

apcraig commented Nov 30, 2020

Again, so far, all I've done is carefully changed the default forcing to JRA55, add optional gx1 coreii and gx3 ncarbulk options, and run full test suites on 3 compilers on cheyenne to make sure it's all working properly.

@eclare108213
Copy link
Contributor

@eclare108213
Copy link
Contributor

arwt_tot_energy_badger_intel_smoke_gx1_24x1_jra55_gx1_2008_medium_run90day jrabase3

@eclare108213
Copy link
Contributor

arwt_swdn_error_badger_intel_smoke_gx1_24x1_jra55_gx1_2008_medium_run90day jrabase3

@eclare108213
Copy link
Contributor

shortwave_radiation_sum_badger_intel_smoke_gx1_24x1_jra55_gx1_2008_medium_run90day jrabase3

@eclare108213
Copy link
Contributor

eclare108213 commented Dec 7, 2020

I added issue #538 describing some of the plotting problems.
Edit with more information: I ran the 90-day tests with the original data but otherwise as in the JRA55 90-day tests, and they all have the same plotting problems, so that is not data-specific. It appears to be a problem with the script - possibly with my changes to the script to plot variables that aren't usually included.

@eclare108213
Copy link
Contributor

I checked the sst values in the alt02 and alt05 tests, which looked like they were exactly 0. They aren't - there's a spike at the end of the time series which is making the y axis very large. However, the values appear to be constant (-1.90458264992426463) which still didn't make sense, since the mixed layer calculation is turned on. It's right, though - the mixed layer routine is producing frazil ice of different amounts through the run; sst is pegged at a constant value because the salinity is constant, and the run isn't long enough to see warmer sst in this grid cell. So I think the alt02 and alt05 tests are behaving as expected, at least for the mixed layer calculation.

@rallard77
Copy link
Contributor

cice.runlog.201210-072851.gz
I ran the alt05 test for a full year. The SST results appear reasonable. The SST values change during the warmer months...

@eclare108213
Copy link
Contributor

Thanks @rallard77, that's good. I've added the list of to-do items above to the top of the PR, as a checklist.

jra55 tests for different grids
  gx3 uses jra55_gx3
  gx1 uses jra55_gx1
  tx1 is already using jra55 by default (no changes to tests required)
  gbox* uses internal forcing, no jra55 data (no changes to tests required)

remove ocn_data_dir='default' and bgc_data_dir='default' in set_nml.jra55*
files because it causes the bgc tests to fail.  the bgc tests work with WOA bgc
forcing data with jra55 physics forcing data.

iobinary tests must use the old gx3 binary forcing, the jra55_gx3 netcdf data
causes an abort because iobinary is tested with netcdf turned off.  the old
gx3 data is NCAR_bulk in binary format.

alt01 with jra55 uses the alt01a setting.  the difference is that
use_leap_years is set to false in alt01a.  It's set to true by default
in the jra55 datasets.
set_nml.gx3 and set_nml.gx1 are now JRA55 by default
get rid of set_nml.jra55* files
add set_nml.gx3ncarbulk and set_nml.gx1coreii which invokes prior forcing datasets
modify the test suites to do identical tests but with new options, compare
  to prior implementation by creating a renamed regression baseline via
  a script and links to generate the new names.  Use zz* for the new set_nml
  options temporarily just to make it easy to generate the new links
  (the zz ensures the new options are last in the list by alphabet)
chmod -x all set_* files for consistency
this is done as a clean separate step so new baselines can be established to
continue to evolve the test cases and allow regression testing
@apcraig
Copy link
Contributor Author

apcraig commented Dec 22, 2020

I have merged in the latest master to this branch and retested, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#97b11eac6673ee2fc03f893fbb03e28847bb5f37. Comparison is to the prior set of results for this branch. There are lots of non bit-for-bit results because Icepack changed answers in the 1.2.4 release. But I think the results are all as expected. The transition from the old version to the new version was done carefully.

I would like to propose we consider merging this soon and then creating a separate issue to continue to "fix" coverage and settings that have been identified. I think it would be helpful to start testing the new test suites and allow continued development to work from the JRA55 default. Even #548 is using the old forcing data to test the zsal feature which is probably not what we want in the longer term.

The one things we will do (either in this PR or in another PR soon) is remove a bunch of the old forcing tests. At the current time, we are basically doing double the testing, all with old forcing and all with new forcing. Should I remove most of those tests now?

@apcraig apcraig marked this pull request as ready for review December 22, 2020 17:31
@apcraig
Copy link
Contributor Author

apcraig commented Dec 22, 2020

I just committed a change to remove most of the old forcing test cases. So we're back to more typical size test suite (150 from 274), https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#97b11eac6673ee2fc03f893fbb03e28847bb5f37

@apcraig
Copy link
Contributor Author

apcraig commented Dec 23, 2020

@eclare108213, are you open to merging this PR now/soon? The focus here is on refactoring the test suites and making JRA55 the default. Once this is merged, I feel comfortable that we can carry out concurrent development on multiple tasks including tasks associated with further validating the JRA55 output. I worry that if this PR is delayed, that concurrent development will happen anyway and we'll end up with a bit of a divergence in this branch and other tasks. I think the only reasons to delay a merge is if we question whether JRA55 should be the default forcing in the short term.

Copy link
Contributor

@eclare108213 eclare108213 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why set_env.box2001, set_nml.box2001 and set_nml.gbox80 say "Empty file" in the Files Changed tab. Is that just a github bookkeeping weirdness?

What's the reason for having alt01a now? I know that it uses leap days now and didn't before - is that the only difference?

I'm generally okay with merging this PR soon and continuing the work elsewhere.

@apcraig
Copy link
Contributor Author

apcraig commented Dec 23, 2020

The empty files just change permissions on the file to make them consistent with the rest of the files. There are no internal changes.

The alt01a is a good catch. That was needed when both old and new forcing was being tested. Let me rename that file to alt01, update the test suites, rerun the suite, and then commit the change. I'll let you know when I've fixed that.

@apcraig
Copy link
Contributor Author

apcraig commented Dec 23, 2020

@apcraig apcraig mentioned this pull request Dec 23, 2020
@apcraig
Copy link
Contributor Author

apcraig commented Dec 23, 2020

I moved the checklist to #249

@apcraig apcraig merged commit 2b10c11 into CICE-Consortium:master Dec 23, 2020
@apcraig apcraig deleted the jra55_update branch August 17, 2022 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants