Need to extend gcm_regress.j to test NUM_READERS & NUM_WRITERS #571

Closed
wmputman opened this issue Feb 8, 2024 · 2 comments · Fixed by #573

wmputman commented Feb 8, 2024

For my c24 L72 regression tests, I extended the layout test to also change NUM_READERS and NUM_WRITERS:

./strip AGCM.rc
set oldstring = `cat AGCM.rc | grep "^ *NX:"`
set newstring = "NX: ${test_NX}"
/bin/mv AGCM.rc AGCM.tmp
cat AGCM.tmp | sed -e "s?$oldstring?$newstring?g" > AGCM.rc

set oldstring = `cat AGCM.rc | grep "^ *NY:"`
set newstring = "NY: ${test_NY}"
/bin/mv AGCM.rc AGCM.tmp
cat AGCM.tmp | sed -e "s?$oldstring?$newstring?g" > AGCM.rc

set oldstring = `cat AGCM.rc | grep "^ *NUM_WRITERS:"`
set newstring = "NUM_WRITERS: 6"
/bin/mv AGCM.rc AGCM.tmp
cat AGCM.tmp | sed -e "s?$oldstring?$newstring?g" > AGCM.rc

set oldstring = `cat AGCM.rc | grep "^ *NUM_READERS:"`
set newstring = "NUM_READERS: 6"
/bin/mv AGCM.rc AGCM.tmp
cat AGCM.tmp | sed -e "s?$oldstring?$newstring?g" > AGCM.rc
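
The four edits above repeat one move-then-sed pattern, so they could in principle be folded into a single helper. The following is only a sketch of that idea, written in POSIX sh rather than the csh used by gcm_regress.j so that it can be tried in isolation; the function name `set_rc_key` is hypothetical and not part of the script. It also matches the line with a regular expression instead of capturing it with grep first, which is a variation on the approach shown above.

```sh
#!/bin/sh
# set_rc_key FILE KEY VALUE   (hypothetical helper, not part of gcm_regress.j)
# Rewrites the line of FILE that starts with optional spaces followed by
# "KEY:" so that it reads "KEY: VALUE". All other lines are left untouched.
# Assumption: KEY and VALUE contain no '?', '&', or regex metacharacters,
# which holds for simple keys such as NUM_WRITERS and integer values.
set_rc_key() {
    file=$1
    key=$2
    value=$3
    tmp="${file}.tmp"
    # Same move-then-sed pattern as in the snippet above; the .tmp copy
    # is left behind, as AGCM.tmp is there.
    mv "$file" "$tmp" &&
    sed -e "s?^ *${key}:.*?${key}: ${value}?" "$tmp" > "$file"
}

# Example calls (commented out so that sourcing this file has no side effects):
#   set_rc_key AGCM.rc NUM_WRITERS 6
#   set_rc_key AGCM.rc NUM_READERS 6
```

A csh port would need its own handling of quoting and of the `:` variable modifiers, so this should be read as an illustration of the pattern rather than a drop-in replacement.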

This leads to restarts that fail the layout regression because the file sizes differ. However, running the following from a login shell does not show any data differences:

KM_v11.4.0/regress> $BASEDIR/Linux/bin/nccmp -dmfgBq moist_internal_checkpoint.20000415_030000.2 moist_internal_checkpoint.20000415_030000.4

KM_v11.4.0/regress> $BASEDIR/Linux/bin/nccmp -dmfgBq fvcore_internal_checkpoint.20000415_030000.2 fvcore_internal_checkpoint.20000415_030000.4

Even though the file sizes differ:

KM_v11.4.0/regress> ls -l moist_internal_checkpoint.20000415_030000.2 moist_internal_checkpoint.20000415_030000.4
-rw-r--r-- 1 wputman s1062 9986760 Feb 8 17:03 moist_internal_checkpoint.20000415_030000.2
-rw-r--r-- 1 wputman s1062 9986897 Feb 8 17:08 moist_internal_checkpoint.20000415_030000.4
KM_v11.4.0/regress> ls -l fvcore_internal_checkpoint.20000415_030000.2 fvcore_internal_checkpoint.20000415_030000.4
-rw-r--r-- 1 wputman s1062 13985375 Feb 8 17:03 fvcore_internal_checkpoint.20000415_030000.2
-rw-r--r-- 1 wputman s1062 13985053 Feb 8 17:08 fvcore_internal_checkpoint.20000415_030000.4

Aside from the oddity of nccmp failing in the gcm_regress.j script but not on the login node, we should include these tests somehow in the regression testing.

@mathomp4 mathomp4 self-assigned this Feb 9, 2024

bena-nasa commented Feb 12, 2024

@wmputman
I would not worry if the files differ in size or if "cmp" says two files are different. If a tool like cdo or nccmp says the data is the same, that's all that matters. Indeed, I've had several instances over the years where users have come to me claiming that cmp says the files are different, but cdo or nccmp detects no difference.

I've tried to reproduce your nccmp issue, so far with no luck. I've been using the latest model release, v11.5.1, and running a c24 experiment multiple times, using a different number of writers in each execution.
While cmp says the files are different, nccmp (and cdo) says they are the same on sles15 login nodes, sles15 compute nodes, and sles12 login nodes.
Could you perhaps give more details: what tag you were using, what stack, etc.? I've been unable to reproduce any sort of issue so far at c24.
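
The distinction drawn above, byte-level identity versus data-level identity, can be sketched as a small check that tries cmp first and falls back to nccmp only when the bytes differ. This is a minimal illustration in POSIX sh, not code from gcm_regress.j: the function name `compare_checkpoints` and the `NCCMP` override variable are hypothetical, the nccmp flags are copied from the commands earlier in this thread, and it assumes nccmp exits with status 0 when it finds no differences.

```sh
#!/bin/sh
# compare_checkpoints FILE_A FILE_B   (hypothetical helper)
# Prints one of:
#   identical   the files are byte-for-byte equal (cmp agrees)
#   data-equal  the bytes differ, but the data-level comparison passes
#   different   the data-level comparison also reports a difference
# NCCMP is an assumed override hook for the comparison command; it
# defaults to "nccmp" found on PATH.
compare_checkpoints() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    elif ${NCCMP:-nccmp} -dmfgBq "$1" "$2"; then
        echo "data-equal"
    else
        echo "different"
    fi
}

# Example call (commented out; requires nccmp and real checkpoint files):
#   compare_checkpoints moist_internal_checkpoint.20000415_030000.2 \
#                       moist_internal_checkpoint.20000415_030000.4
```

Under this scheme a regression test would treat both "identical" and "data-equal" as a pass, which matches the advice that only the data-level result matters.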

mathomp4 commented

@wmputman I've added the functionality in #573
