CONUS RRM stops working: domain and initial condition files mismatch #1899

Closed
wlin7 opened this issue Nov 15, 2017 · 12 comments · Fixed by #2163
Comments

@wlin7
Contributor

wlin7 commented Nov 15, 2017

CONUS RRM (conusx4v1 grid) stops working with the current master.

During land initialization, the run stops with the following error message:

check_dim ERROR: mismatch of input dimension 39172 with expected value 39189 for variable gridcell

This was first found when trying to use previously created CAPT ICs for short hindcast runs. I have verified that the same problem exists with the standard conusx4v1 configuration for free runs, which uses finidat='clmi.ICRUCLM45.conusx4v1.74e105b.clm2.r.0021-01-01-00000.nc'.

Investigation so far suggests that the mismatch is between the domain file and the finidat file. It looks like the current default finidat was generated from an older domain file (i.e., domain.lnd.conusx4v1_tx0.1v2.141022.nc for ATM_DOMAIN_FILE and LND_DOMAIN_FILE), while the current default is domain.lnd.conusx4v1_tx0.1v2.161129.nc.
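To make the failure concrete, here is a minimal Python sketch of the kind of consistency check that produces the message above. It is not the CLM source: the name `check_dim` is borrowed from the error text, its signature is hypothetical, and reading 39172 as the gridcell count stored in the finidat file and 39189 as the count expected from the current domain file is an interpretation of the message.

```python
def check_dim(name, found, expected):
    """Raise if a dimension read from an input file has an unexpected length.

    Hypothetical signature; only the message format follows the report.
    """
    if found != expected:
        raise ValueError(
            f"check_dim ERROR: mismatch of input dimension {found} "
            f"with expected value {expected} for variable {name}"
        )


# Values from the report. Assumption: 39172 comes from the finidat file and
# 39189 from the land grid built with the current domain file.
FINIDAT_GRIDCELLS = 39172
EXPECTED_GRIDCELLS = 39189

if __name__ == "__main__":
    try:
        check_dim("gridcell", FINIDAT_GRIDCELLS, EXPECTED_GRIDCELLS)
    except ValueError as err:
        print(err)
```

Under that reading, the two files disagree by 17 land gridcells, which is enough to make the restart data unusable on the newer grid.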

I have lost track of when these domain files were updated. If there are good reasons to use the newer domain files (for OCN and ICE as well), then the finidat files have to be regenerated. This would be needed for both F2000 and F1850.

Note that the current code can still run if finidat is reset to empty, i.e., with arbitrary initialization. I am testing resetting the domain files in env_run.xml so that the existing IC (restart) files can be used. I think it will work; the job is pending on cori-knl.

Attention: @brhillman, @bishtgautam, @tangq, @eroesler, @mt5555

@oksanaguba
Contributor

This is probably related to PR #1581.

@mt5555
Contributor

mt5555 commented Nov 15, 2017

PR #1581 did change the domain files and regenerated the finidat files:

5000116

for both F2000 and F1850.

Based on the filenames, these look like CLM4.5 files, so is it possible they would work with an ACME v1 configuration, but not with F2000 and F1850 (which use CLM4.0)?

@mt5555
Contributor

mt5555 commented Nov 15, 2017

this PR has more relevant information:

#1612

@oksanaguba
Contributor

oksanaguba commented Nov 15, 2017

It seems that PR #1269 reset the finidat values that were set in PR #1612. Should I create a new PR? Note that a file for 1850 was never generated; is it needed by anyone now?

21ca187#diff-03d6e4f19e3daae8577f0de332a0c136

@wlin7
Contributor Author

wlin7 commented Nov 15, 2017

Thanks @oksanaguba , @mt5555 . A few comments below.

  1. F2000 and F1850 with ACME v1 also use CLM4.5.
  2. My repo includes PR #1612. However, as @oksanaguba also found, the new finidat has disappeared from namelist_defaults_clm4_5.xml. It would work if the changes from PR #1612 had remained intact.
  3. Indeed, the area correction (PR #1581) improves the consistency between atm and lnd very substantially. With the previous version, the max area difference is 0.363e-5. After the correction, it is 0.828e-13.
  4. For tracking purposes: if the older domain files are ever used, as in my case with existing CAPT ICs, EPS_AAREA also needs to be reset to 9e-6. The current value is 9e-7.
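The arithmetic behind points 3 and 4 can be sketched in a few lines of Python. This assumes the area check is a plain threshold on the maximum atm/lnd area difference; the actual coupler check may be formulated differently, and `area_check_passes` is a hypothetical helper, not a function from the model.

```python
def area_check_passes(max_area_diff, eps_aarea):
    """Return True if the largest atm/lnd area difference is within tolerance.

    Assumption: a simple absolute threshold, used here only for illustration.
    """
    return max_area_diff <= eps_aarea


# Numbers quoted in this thread.
OLD_DOMAIN_MAX_DIFF = 0.363e-5   # before the area correction
NEW_DOMAIN_MAX_DIFF = 0.828e-13  # after the area correction
CURRENT_EPS_AAREA = 9e-7
RELAXED_EPS_AAREA = 9e-6

if __name__ == "__main__":
    cases = [
        ("old domain", OLD_DOMAIN_MAX_DIFF),
        ("new domain", NEW_DOMAIN_MAX_DIFF),
    ]
    for label, diff in cases:
        for eps in (CURRENT_EPS_AAREA, RELAXED_EPS_AAREA):
            verdict = "passes" if area_check_passes(diff, eps) else "fails"
            print(f"{label}: max diff {diff:.3e} vs EPS_AAREA {eps:.0e} -> {verdict}")
```

Under this assumption, the old domain files exceed the current tolerance of 9e-7 but fit within 9e-6, while the corrected domain files pass either setting with a wide margin.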

@wlin7
Contributor Author

wlin7 commented Nov 15, 2017

BTW, @oksanaguba, a new PR is definitely needed. Please go ahead. You may assign it to me to review and integrate. A new finidat for F1850 will eventually be needed, but not at the moment.

@rljacob
Member

rljacob commented Nov 15, 2017

In the PR to fix this, add a test that uses the conusx4v1 grid.

@mt5555
Contributor

mt5555 commented Nov 16, 2017

Fixing this (and adding a test) will have to be done by someone on the E3SM or CMDV-A projects. As it was broken by PR #1269, reassigning to @brhillman.

update: the commit log claims "PR #1269", but that appears to be a typo, as that PR doesn't seem to contain the commit of interest.

update2: I can't find the actual PR.

update3: The conus grid is large, 20K elements. A conus test should be short, and should it go in the new suite for high-res tests?

@mt5555 mt5555 assigned brhillman and unassigned oksanaguba Nov 16, 2017
@oksanaguba
Contributor

I used the GitHub history of the file components/clm/bld/namelist_files/namelist_defaults_clm4_5.xml to see where the old values were brought back. Following this link, it was the merge of that PR, but maybe something else went wrong? Sorry if I pointed to the wrong PR or the wrong person.

21ca187#diff-03d6e4f19e3daae8577f0de332a0c136

@mt5555
Contributor

mt5555 commented Nov 16, 2017

update:

There was a typo in the PR number; the correct number is PR #1629.

@brhillman's branch was OK, but it looks like there was a conflict and the integrator (@rljacob) resolved it incorrectly, reverting to the old finidat file.

Should be fixed by either @rljacob or @brhillman ?

@rljacob
Member

rljacob commented Nov 16, 2017

What's a test to verify the fix?

@mt5555
Contributor

mt5555 commented Nov 17, 2017

this might work:

SMS_Ln1_P64x2.conusx4v1_conusx4v1.FC5AV1C-L

it's a big problem, so you might need to use more nodes.
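For readers unfamiliar with the test name, the Python sketch below splits it into its parts and rebuilds it with a larger PE layout. It assumes the usual CIME pattern TESTTYPE[_MODIFIERS].GRID.COMPSET, and reads `SMS` as a smoke test, `Ln1` as a one-step run, and `P64x2` as 64 tasks with 2 threads each; that reading is my understanding of the convention, not something stated in this thread. Both helper functions are hypothetical and are not part of CIME.

```python
def parse_test_name(name):
    """Split a test name of the form TESTTYPE[_MODS].GRID.COMPSET[.EXTRA]."""
    fields = name.split(".")
    if len(fields) < 3:
        raise ValueError(f"expected TESTTYPE[_MODS].GRID.COMPSET, got {name!r}")
    test_type, *modifiers = fields[0].split("_")
    return {
        "test_type": test_type,
        "modifiers": modifiers,
        "grid": fields[1],
        "compset": fields[2],
        "extra": fields[3:],
    }


def with_pe_layout(name, tasks, threads):
    """Return the test name with its P<tasks>x<threads> modifier replaced."""
    info = parse_test_name(name)
    # Drop any existing PE-layout modifier, then append the requested one.
    mods = [m for m in info["modifiers"] if not m.startswith("P")]
    mods.append(f"P{tasks}x{threads}")
    head = "_".join([info["test_type"]] + mods)
    return ".".join([head, info["grid"], info["compset"]] + info["extra"])


if __name__ == "__main__":
    suggested = "SMS_Ln1_P64x2.conusx4v1_conusx4v1.FC5AV1C-L"
    print(parse_test_name(suggested))
    print(with_pe_layout(suggested, 128, 2))
```

Doubling the task count this way is only an example of "using more nodes"; the layout that actually fits the CONUS grid depends on the machine.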

brhillman added a commit that referenced this issue Mar 13, 2018
Fix finidat setting for CONUS RRM. finidat was pointing to the wrong
land initial condition for CLM4.5 for CONUS sim_year 2000, causing CONUS
configurations to fail. Fixes #1899 for sim_year 2000, but an updated
initial condition does not exist for sim_year 1850, so F1850 compsets
will probably still fail with CONUS.
wlin7 added a commit that referenced this issue Mar 14, 2018
Fix CONUS RRM configuration

Fix two issues with the CONUS RRM configuration that prevented configuring a case with CONUS. First, the domain files were not specified for the CONUS grid in config_grids.xml, so hgrid ended up remaining UNSET, causing set_horiz_grid to fail with all components having grid values UNSET. This PR puts the domain file specification back into config_grids.xml for CONUS, and fixes #2147. Second, an older land initial condition (finidat) file was being used for sim year 2000 runs, which was incompatible with recent changes, causing runs to fail. This PR replaces finidat for sim year 2000 with the updated version.  Fixes #1899

This PR also adds a test suite for RRM grids.

[BFB]
jgfouca pushed a commit that referenced this issue Apr 2, 2018
Fix CONUS RRM configuration
