Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable CDAT-migrated E3SM Diags #651

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

forsyth2
Copy link
Collaborator

Issue resolution

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

1. Does this do what we want it to do?

Objectives:

  • Enable zppy to call the CDAT-migrated E3SM Diags

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have added or modified at least one "min-case" configuration file to test this change. Every objective above is represented in at least one cfg.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

  • Testing: this pull request introduces an important feature or bug fix that we must test often. I have updated the weekly-test configuration files, not just a "min-case" one.
  • Testing: this pull request adds at least one new possible parameter to the cfg. I have tested using this parameter with and without any other parameter that may interact with it.

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

  • Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zppy/conda, not just an import statement.

3. Is this well documented?

Required:

  • Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commits checks have passed.

If applicable:

  • Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Dec 10, 2024

To do:

  • (currently running) test just e3sm diags on all available sets: zppy -c tests/integration/generated/test_min_case_e3sm_diags_comprehensive_v3_chrysalis.cfg
  • Run the full weekly test suite
  • Merge this PR

Comment on lines 12 to 16
output = "#expand user_output#zppy_weekly_comprehensive_v3_output/#expand unique_id#/#expand case_name#"
partition = "#expand partition_short#"
qos = "#expand qos_short#"
www = "#expand user_www#zppy_weekly_comprehensive_v3_www/#expand unique_id#"
years = "1985:1989:2",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output and www need updated names!

@forsyth2
Copy link
Collaborator Author

Currently running full weekly test suite (doing this pre-merge because the CDAT migration was a large change, so there's a higher chance of breaking main).

$ python tests/integration/utils.py 
CFG FILES HAVE BEEN GENERATED FROM TEMPLATES WITH THESE SETTINGS:
UNIQUE_ID=test-346-20241210
unified_testing=False
diags_environment_commands=source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate e3sm_diags_main_20241209
global_time_series_environment_commands=source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate zi_dev_weekly_20241122
e3sm_to_cmip_environment_commands=
environment_commands=
Reminder: `e3sm_to_cmip_environment_commands=''` => the environment of the `ts` task will be used
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Runs 1st part of bundles cfg

@forsyth2
Copy link
Collaborator Author

Continuing testing from previous comment

Continuing testing gives:

zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Runs 2nd part of bundles cfg

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-346-20241210/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-346-20241210/v2.LR.historical_0201/post/scripts
grep -v "OK" *status

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test-346-20241210/v3.LR.historical_0051/post/scripts
grep -v "OK" *status

# Run integration tests
cd ~/ez/zppy
pytest tests/integration/test_*.py

That gives:

======================================================================= short test summary info ========================================================================
FAILED tests/integration/test_bash_generation.py::test_bash_generation - assert 256 == 0
FAILED tests/integration/test_weekly.py::test_comprehensive_v2_images - AssertionError
FAILED tests/integration/test_weekly.py::test_comprehensive_v3_images - AssertionError
FAILED tests/integration/test_weekly.py::test_bundles_images - AssertionError
============================================================== 4 failed, 10 passed in 1505.03s (0:25:05) ===============================================================

Bash generation test

The FAILED tests/integration/test_bash_generation.py::test_bash_generation - assert 256 == 0 line is benign. The changes are specifically because the e3sm_diags.bash template gets updated in this PR; i.e., these diffs are expected.

Image check failures

There's a large number of image check diffs in subdirectories of:

But as far as I can tell, the errors are mostly benign. Example errors:

  • Many involve actual having "RMSE" and "CORR" further to the left than in expected.
  • Slight differences in text shape/size -- e.g., actual and expected
  • Slight differences in contours like actual and expected
  • Differences in lat/lon labels -- actual and expected

Some errors seem more concerning:

Conclusions

@chengzhuzhang @tomvothecoder What's the best path forward here? None of the grep -v "OK" *status checks showed errors. That said, there are simply too many diffs in the image check tests to manually check them all. Most of the errors seem benign and likely arise from variations in matplotlib, but I found at least one instance of a more concerning change (noted above).

Do we accept these diffs, merge this PR, and update the expected images? Or do these need to be investigated more thoroughly?

@forsyth2 forsyth2 marked this pull request as ready for review December 10, 2024 19:58
@tomvothecoder
Copy link
Collaborator

@chengzhuzhang @tomvothecoder What's the best path forward here? None of the grep -v "OK" *status checks showed errors. That said, there are simply too many diffs in the image check tests to manually check them all. Most of the errors seem benign and likely arise from variations in matplotlib, but I found at least one instance of a more concerning change (noted above).

Do we accept these diffs, merge this PR, and update the expected images? Or do these need to be investigated more thoroughly?

I think we should expect that most of the differences will be benign. The underlying results from the dev branch won't exactly match the CDAT codebase, which can produce diffs.

Completely different stats and colors: actual and expected

However, any formatting differences (e.g., labels for axes) or major differences like the one you mentioned above should be investigated. I will open a separate GitHub issue in e3sm_diags to investigate the diffs and address the ones that should be fixed.

Afterwards, we can update the expected results and you can re-run the zppy test again.

@forsyth2
Copy link
Collaborator Author

Thanks @tomvothecoder!

@forsyth2
Copy link
Collaborator Author

zppy & post-CDAT-migration e3sm_diags to-do list, for the 1/15 Unified release-candidate deadline:

@tomvothecoder @chengzhuzhang ^FYI for timeline planning

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace CDAT
2 participants