
Skip oceanConservation task in MPAS-Analysis #659

Merged 1 commit into E3SM-Project:main from xylar:fix-conservation-check on Jan 17, 2025

Conversation

xylar (Contributor) commented on Jan 16, 2025

The conservation check is turned off in E3SM by default so we don't want to run the analysis by default. (Doing so would ultimately be harmless but there would be a confusing error message on setup.)
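
For context, MPAS-Analysis selects tasks from a generate list that combines 'all' with 'no_<tag>' exclusions, which is the pattern the diff below extends. A minimal sketch of that selection logic, with hypothetical task names, tags, and function names (this is not the MPAS-Analysis implementation):

# Hypothetical sketch: how an 'all' + 'no_<tag>' generate list picks tasks.
# Task names and tags here are illustrative, not MPAS-Analysis internals.
def select_tasks(task_tags, generate):
    selected = set()
    for item in generate:
        if item == 'all':
            selected.update(task_tags)
        elif item.startswith('no_'):
            tag = item[len('no_'):]
            selected = {name for name in selected if tag not in task_tags[name]}
    return selected

task_tags = {
    'climatologyMapSST': {'climatology', 'sst'},
    'oceanConservation': {'oceanConservation', 'timeSeries'},
}
print(select_tasks(task_tags, ['all', 'no_oceanConservation']))
# {'climatologyMapSST'}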

Issue resolution

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

Small Change

  • To merge, I will use "Squash and merge". That is, this change should be a single commit.

1. Does this do what we want it to do?

Objectives:

  • Prevent confusing error messages from MPAS-Analysis when the conservation check is not enabled in E3SM runs

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have added or modified at least one "min-case" configuration file to test this change. Every objective above is represented in at least one cfg.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

3. Is this well documented?

Required:

  • Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commit checks have passed.


Commit message: The conservation check is turned off in E3SM by default so we don't want to run the analysis by default. (Doing so would ultimately be harmless but there would be a confusing error message on setup.)
@@ -2,4 +2,4 @@
 sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","enso_diags","qbo","diurnal_cycle","annual_cycle_zonal_mean","streamflow","zonal_mean_2d_stratosphere","aerosol_aeronet","aerosol_budget"

 [mpas_analysis]
-generate = 'all', 'no_landIceCavities', 'no_BGC', 'no_icebergs', 'no_min', 'no_max', 'no_sose', 'no_waves', 'no_eke', 'no_climatologyMapAntarcticMelt', 'no_regionalTSDiagrams', 'no_timeSeriesAntarcticMelt', 'no_timeSeriesOceanRegions', 'no_climatologyMapSose', 'no_woceTransects', 'no_soseTransects', 'no_geojsonTransects', 'no_oceanRegionalProfiles', 'no_hovmollerOceanRegions'
+generate = 'all', 'no_landIceCavities', 'no_BGC', 'no_icebergs', 'no_min', 'no_max', 'no_sose', 'no_waves', 'no_eke', 'no_climatologyMapAntarcticMelt', 'no_regionalTSDiagrams', 'no_timeSeriesAntarcticMelt', 'no_timeSeriesOceanRegions', 'no_climatologyMapSose', 'no_woceTransects', 'no_soseTransects', 'no_geojsonTransects', 'no_oceanRegionalProfiles', 'no_hovmollerOceanRegions', 'no_oceanConservation'
xylar (Contributor, Author) commented on the diff:

@forsyth2, normal Python config files allow multi-line config values: continuation lines just need to be indented more than the option name. Does your config parser allow that? These very long, single-line config options would certainly benefit from it if so.
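
For reference, a minimal runnable example of the multi-line continuation that Python's standard configparser supports (whether zppy's parser accepts the same syntax was the open question):

import configparser

# Continuation lines indented past the option name are folded into the value,
# joined with newlines. The section and option below mirror the diff above.
text = """
[mpas_analysis]
generate = 'all', 'no_landIceCavities', 'no_BGC',
    'no_icebergs', 'no_oceanConservation'
"""

parser = configparser.ConfigParser()
parser.read_string(text)
print(parser.get('mpas_analysis', 'generate'))
# 'all', 'no_landIceCavities', 'no_BGC',
# 'no_icebergs', 'no_oceanConservation'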

forsyth2 (Collaborator) replied:

Hmm, I'm not sure, I haven't tried that before, but I guess we can see.

xylar (Contributor, Author) commented on Jan 16, 2025

@forsyth2, I would rather not run separate testing for such a trivial change. Can you cherry-pick this onto whatever other testing you will be running soon and make sure the MPAS-Analysis error during setup disappears?

forsyth2 (Collaborator) commented:

Testing setup
cd ~/ez/zppy
git checkout test_zppy_weekly_20250114

This is the branch I used in #634 (comment)

conda activate zppy_dev_weekly_20250114
git fetch xylar

Go to https://github.com/E3SM-Project/zppy/pull/659/commits

# xylar:fix-conservation-check
git cherry-pick 629289e8f532fc2f66d6c9ccb3637fb43179c8c9
git log
# Now shows:
# Xylar's commit for #659
# My testing commit
# Update PR template (#658)
pip install .

Need to match tests/integration/utils.py as of #634 (reply in thread):

# UNIQUE_ID = "test_zppy_weekly_20250115"
#        "diags_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate e3sm_diags_pre_cdat_migration_plus_constraint",

# Run zppy to produce actual results to compare in the integration tests
python tests/integration/utils.py
# That prints:
# CFG FILES HAVE BEEN GENERATED FROM TEMPLATES WITH THESE SETTINGS:
# UNIQUE_ID=test_zppy_weekly_20250115
# unified_testing=False
# diags_environment_commands=source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate e3sm_diags_pre_cdat_migration_plus_constraint
# global_time_series_environment_commands=source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate zi_dev_weekly_20250114
# environment_commands=
# Reminder: `environment_commands=''` => the latest E3SM Unified environment will be used

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_weekly_20250115/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status
# No errors
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_weekly_20250115/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
# global_time_series_1980-1990.status:WAITING 666748
# mpas_analysis_ts_1980-1984_climo_1980-1984.status:ERROR (1)
# mpas_analysis_ts_1980-1990_climo_1985-1990.status:WAITING 666747
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test_zppy_weekly_20250115/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status
# No errors

# So, we just need to rerun v2
cd ~/ez/zppy
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
sq
# USER     ACCOUNT NODE PARTITION JOBID   ST     REASON       TIME TIME_LIMIT NAME
# ac.forsy    e3sm    1   compute 667335  PD   Priority       0:00      30:00 mpas_analysis_ts_1980-1984_climo_1980-1984
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_weekly_20250115/v2.LR.historical_0201/post/scripts
rm global_time_series_1980-1990.status mpas_analysis_ts_1980-1990_climo_1985-1990.status
cd -
sq
# USER     ACCOUNT NODE PARTITION JOBID   ST     REASON       TIME TIME_LIMIT NAME
# ac.forsy    e3sm    1   compute 667335   R       None       1:11      30:00 mpas_analysis_ts_1980-1984_climo_1980-1984
# ac.forsy    e3sm    1   compute 667336  PD Dependency       0:00      30:00 mpas_analysis_ts_1980-1990_climo_1985-1990
# ac.forsy    e3sm    1     debug 667337  PD Dependency       0:00      30:00 global_time_series_1980-1990
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_weekly_20250115/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
# global_time_series_1980-1990.status:WAITING 667337
# mpas_analysis_ts_1980-1984_climo_1980-1984.status:ERROR (2)
# mpas_analysis_ts_1980-1990_climo_1985-1990.status:WAITING 667336

emacs mpas_analysis_ts_1980-1984_climo_1980-1984.o667335 
# KeyError: 'config_am_conservationcheck_enable'
# ...
# Done.

# Seems like it finishes....
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_weekly_20250115/v2.LR.historical_0201/post/scripts/../analysis/mpas_analysis/ts_1980-1984_climo_1980-1984/logs
grep -in error *
# climatologyMapOHCAnomaly_plotANN_latlon_2000-10000m.log:35:MemoryError: std::bad_alloc
# complete.v2.LR.historical_0201.cfg:173:# Large datasets can encounter a memory error.  Specification of a maximum
# complete.v2.LR.historical_0201.cfg:174:# chunk size `maxChunkSize` can be helpful to prevent the memory error.  The
# complete.v2.LR.historical_0201.cfg:434:#    absolute_energy_error : Energy error
# complete.v2.LR.historical_0201.cfg:436:#    absolute_salt_error : Salt conservation error
# complete.v2.LR.historical_0201.cfg:443:plottypes = ['absolute_energy_error', 'absolute_salt_error', 'total_mass_change']
# complete.v2.LR.historical_0201.cfg:3696:# an error will be raised).  Note that this option is not usually practical for
# streamfunctionMOC_plotMOCClimatology.log:27:    raise ValueError('z array must not contain non-finite values '
# streamfunctionMOC_plotMOCClimatology.log:28:ValueError: z array must not contain non-finite values within the triangulation
# taskProgress.log:375:ERROR in task streamfunctionMOC: plotMOCClimatology.  See log file /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_weekly_20250115/v2.LR.historical_0201/post/scripts/../analysis/mpas_analysis/ts_1980-1984_climo_1980-1984/logs/streamfunctionMOC_plotMOCClimatology.log for details
# taskProgress.log:1368:ERROR in task climatologyMapOHCAnomaly: plotANN_latlon_2000-10000m.  See log file /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_weekly_20250115/v2.LR.historical_0201/post/scripts/../analysis/mpas_analysis/ts_1980-1984_climo_1980-1984/logs/climatologyMapOHCAnomaly_plotANN_latlon_2000-10000m.log for details
ls -l
# Everything was updated today, 2025-01-16

@xylar, it seems like we're still running into issues.
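
For context on the KeyError in the log above: the option config_am_conservationcheck_enable comes from the run's namelist, and a plain dict-style lookup of a missing option raises KeyError. A hypothetical minimal illustration (not MPAS-Analysis code):

# Hypothetical run whose namelist lacks the conservation-check option entirely.
namelist = {}

# A direct lookup reproduces the error seen in the log:
#   namelist['config_am_conservationcheck_enable']  -> KeyError

# A defensive lookup with a default avoids it:
enabled = namelist.get('config_am_conservationcheck_enable', '.false.') == '.true.'
print(enabled)  # False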

xylar (Contributor, Author) commented on Jan 16, 2025

@forsyth2, thanks for testing. That's discouraging on both fronts. I see that your test directory has yesterday's date. Are you rerunning analysis in the same spot? If so, the log files get appended to, so you may be seeing yesterday's errors.

forsyth2 (Collaborator) replied:

> I see that your test directory has yesterday's date. Are you rerunning analysis in the same spot?

Yeah, in the "Testing setup" section above, I cherry-pick this commit, re-install, then re-run zppy in the same directory (meaning not all the other tasks have to re-run, and if it worked, we could immediately proceed to the integration tests). It's not too costly, time-wise, to start completely from scratch; the main thing is that the E3SM Diags runs take a little over an hour, which is enough to be annoying.

> If so, the log files get appended to, so you may be seeing yesterday's errors.

Oh, then that may be the case. As noted, we do still have mpas_analysis_ts_1980-1984_climo_1980-1984.status:ERROR (2) though. But it looks like that's the memory allocation problem, which isn't addressed in this PR.

xylar (Contributor, Author) commented on Jan 16, 2025

@forsyth2, can you at least verify that the zppy config file generated with your test now includes the no_oceanConservation item in the generate list? And that the KeyError: 'config_am_conservationcheck_enable' was not from today's test but left over from yesterday's?

xylar (Contributor, Author) commented on Jan 16, 2025

I don't see it in mpas_analysis_ts_1980-1984_climo_1980-1984.settings so I think the test you ran unfortunately didn't actually test this code change. I think you have to rerun zppy itself to see the change.
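
A hypothetical quick check along these lines would confirm whether the new exclusion made it into both the generated cfg and the job's settings (paths are illustrative, taken from the testing transcript above):

# Hypothetical sanity check: confirm the regenerated cfg and the job's
# .settings file both mention the new exclusion. Paths are illustrative.
from pathlib import Path

paths = [
    'tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg',
    'mpas_analysis_ts_1980-1984_climo_1980-1984.settings',
]
for p in paths:
    text = Path(p).read_text()
    status = 'found' if 'no_oceanConservation' in text else 'MISSING'
    print(f'{p}: no_oceanConservation {status}')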

xylar (Contributor, Author) commented on Jan 16, 2025

I am somewhat suspicious that the memory issue may have to do with the MPAS-Analysis output having ended up in a bad state because of several reruns. I think the best idea would be to just do a clean rerun of MPAS-Analysis after rerunning zppy itself.

forsyth2 (Collaborator) replied:

Yeah, I'm thinking that too. I'll do a fresh run.

forsyth2 (Collaborator) commented:

Latest testing is working, see #634 (reply in thread). I think this is good to merge.

forsyth2 merged commit ab5b75a into E3SM-Project:main on Jan 17, 2025 (4 checks passed) and deleted the fix-conservation-check branch on January 17, 2025 at 01:15.