Refactor significant portions of the codebase and refactor some land-atm handlers #213
Moved the section below out of the description to make the description shorter.

Technical Debt in this Repo

I've noticed many Python anti-patterns and poor coding practices in this repository. They have created a lot of technical debt for new developers of this repo.
1. Refactor `lib.py` (now deleted)
   - Closes #190
   - Replace `run_parallel()` and `run_serial()` with `__main__.py` -> `E3SMtoCMIP._run_parallel()` and `E3SMtoCMIP._run_serial()`
   - Rename `filepaths` to `vars_to_filepaths` for clarity (it's a dict mapping each variable key to a list of string paths)
   - Update the args passed to the handler method and remove unnecessary args
   - Add a `try` and `except` statement for submitting `pool` jobs to maintain compatibility with MPAS variable handlers, which use different handler method arguments
   - Replace `handle_variables()`, `get_dimension_data()` and `load_axis()` with `VarHandler.cmorize()`
   - Delete `handle_simple()` (_will be re-implemented from scratch in #130_)
2. Update `handler.py`
   - Update `VarHandler.cmorize()`
     - Refactored significantly: extracted logic into smaller, maintainable private methods
     - Replaced the `data` dictionary storing `xr.DataArrays` with `ds` (an `xr.Dataset` object)
     - Distinguish CMOR usage with a "cmor" string in function names and Python variables
     - Add support for hybrid sigma variables: `pfull`, `phalf`
   - Add `VarHandler._cmor_write()`
     - Add support for CMORizing fixed-time variables (separate PR for #217)
     - If a time dimension exists, CMORize the variable with all time and time bound values in a single call to `cmor.write()` instead of looping over each time index and CMORizing each slice. **This should improve performance and removes the `tqdm` progress bar.**
3. Update `handlers.yaml`
   - Add `phalf` and `pfull` entries
     - Closes #115
     - Delete `phalf.py` and `pfull.py`
   - Add `clcalipso` entry
     - Closes #218
     - Delete `clcalipso.py`
4. Update `_formulas.py`
   - Update all function argument types and return types from `np.ndarray` to `xr.DataArray`
   - Add formula functions for `pfull` and `phalf`
   - Add a `convert_units()` function, which handles 1-to-1 unit conversions and replaces `default.default_handlers.write_data()`
5. Refactor `default.py` (now deleted)
   - Replaced by `VarHandler.cmorize()` and `_formulas.py.convert_units()`
6. Remove `cdms2` and `cdutil` from dependencies in `dev.yml` and `ci.yml`
7. Clean up legacy code in `clisccp.py`
   - Separate PR for #218
8. Add a `Makefile` for easy access to commonly used commands (e.g., building and installing the package)
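The list above mentions a `try`/`except` around `pool` job submission because MPAS handlers take different arguments than `VarHandler.cmorize()`. The exact arguments aren't shown here, so the following is only a sketch of one way to express that compatibility check, not the PR's code: the handler functions, `build_handler_kwargs()`, and option names such as `tables_path` and `metadata_path` are all hypothetical. It checks the signature with `inspect.signature(...).bind()`, which raises `TypeError` on a mismatch, *before* submitting, because an argument error raised inside a worker would only surface later when the future's result is retrieved. It uses `ThreadPoolExecutor` to stay self-contained.

```python
import inspect
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the two handler styles; real signatures differ.
def var_handler_cmorize(vars_to_filepaths, tables_path, table):
    """New-style handler (stand-in for something like VarHandler.cmorize)."""
    return f"cmorized {sorted(vars_to_filepaths)} using {table}"


def mpas_handler(vars_to_filepaths, tables_path, metadata_path, cmor_log_dir):
    """MPAS-style handler that expects a different set of arguments."""
    return f"mpas {sorted(vars_to_filepaths)} using {metadata_path}"


def build_handler_kwargs(method, vars_to_filepaths, options):
    """Return the keyword arguments that match ``method``'s signature."""
    new_style = {
        "vars_to_filepaths": vars_to_filepaths,
        "tables_path": options["tables_path"],
        "table": options["table"],
    }
    try:
        # Signature.bind() raises TypeError when the arguments don't fit.
        inspect.signature(method).bind(**new_style)
        return new_style
    except TypeError:
        # Fall back to the argument set the MPAS-style handlers expect.
        return {
            "vars_to_filepaths": vars_to_filepaths,
            "tables_path": options["tables_path"],
            "metadata_path": options["metadata_path"],
            "cmor_log_dir": options["cmor_log_dir"],
        }


def run_parallel(handlers, vars_to_filepaths, options, max_workers=2):
    """Submit every handler to a pool and collect results in handler order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(method, **build_handler_kwargs(method, vars_to_filepaths, options))
            for method in handlers
        ]
        return [future.result() for future in futures]
```

Here `vars_to_filepaths` would be a dict such as `{"T": ["T_0001.nc"], "PS": ["PS_0001.nc"]}` (hypothetical paths), matching the "var key to list of string paths" description. Checking the signature up front keeps the fallback logic in one place instead of scattering handler-type checks through the submission loop.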
Hi @chengzhuzhang, this PR is finally ready for review! I performed thorough regression testing and everything checks out. The PR description includes the core changes that you might want to focus on. Thanks!
```python
def _reshape_single_time_bnd(self, ds):
    # TODO: Reshape time bounds if they exist and have a length of 1: https://github.com/E3SM-Project/e3sm_to_cmip/blob/8c818fcb6d51cc0555f62b8058eee539b69a9579/e3sm_to_cmip/lib.py#L674-L678C42
    pass
```
I added this TODO just in case it is needed. I didn't run into any issues where time bounds have a length of 1 and need to be reshaped like the old CDAT code linked in the comment.
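If this TODO is ever needed, a NumPy sketch of the reshape might look like the following. It assumes (unverified against the linked CDAT code) that the problem case is a single time step whose bounds arrive flattened to shape `(2,)` and need their leading time axis restored to get a consistent `(ntime, 2)` layout; `reshape_single_time_bnd()` is a hypothetical helper, not part of the codebase.

```python
import numpy as np


def reshape_single_time_bnd(time_bnds):
    """Return time bounds with shape (ntime, 2).

    Hypothetical helper: assumes a single time step's bounds may arrive
    flattened to shape (2,) and need an explicit time axis added back.
    """
    bnds = np.asarray(time_bnds)
    if bnds.ndim == 1:
        if bnds.size != 2:
            raise ValueError(f"expected 2 bound values, got {bnds.size}")
        # Restore the leading time axis: (2,) -> (1, 2).
        return bnds.reshape(1, 2)
    if bnds.ndim != 2 or bnds.shape[1] != 2:
        raise ValueError(f"unexpected time bounds shape {bnds.shape}")
    # Already (ntime, 2): return unchanged.
    return bnds
```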
Some last self-review
```python
name = "ps"

# NOTE: This maintains the legacy name from pfull.py and phalf.py. I'm
# not sure why this is done and what the difference is with "ps".
```
Thanks for the note. I don't have a good idea of this either.
@tomvothecoder The PR looks good to me. Thanks for bringing in these nice changes and for carefully testing them. The code base is much cleaner now.
Thanks @chengzhuzhang! I will merge this now.
Overview
This PR was originally supposed to address only #115, but I found the codebase difficult to work with, so I decided to refactor it.
Regression Testing
TLDR: All of the datasets produced by this branch align with the `master` branch. The max relative differences in 5/109 datasets are insignificant and can be attributed to floating-point rounding differences between Xarray and CDAT, which is fine.

Setup:

- `scripts/branch-regression-testing/115-cdat-refactor-test/115-end-to-end-script.sh`
- To compare against `master` branch datasets: `scripts/branch-regression/115-comparison-notebook.ipynb`

Results:

- `rtol=1e-7` (`cl`, `clw`, `cli`, `pfull`)
- `mrso` is fine (max relative difference of 6.726486e-07)

Summary of Changes
Core Changes
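- Refactor `lib.py` (now deleted)
  - Closes #190 (refactor `lib.py` utilities to reduce code duplication and excessive nesting of function calls)
  - Replace `run_parallel()` and `run_serial()` with `__main__.py` -> `E3SMtoCMIP._run_parallel()` and `E3SMtoCMIP._run_serial()`
  - Rename `filepaths` to `vars_to_filepaths` for clarity (it's a dict mapping each variable key to a list of string paths)
  - Update the args passed to the handler method and remove unnecessary args
  - Add a `try` and `except` statement for submitting `pool` jobs to maintain compatibility with MPAS variable handlers, which use different handler method arguments
  - Replace `handle_variables()`, `get_dimension_data()` and `load_axis()` with `VarHandler.cmorize()`
  - Delete `handle_simple()` (_will be re-implemented from scratch in #130, fix `--simple` mode not working_)
- Update `handler.py`
  - Update `VarHandler.cmorize()`
    - Refactored significantly: extracted logic into smaller, maintainable private methods
    - Replaced the `data` dictionary storing `xr.DataArrays` with `ds` (an `xr.Dataset` object)
    - Distinguish CMOR usage with a "cmor" string in function names and Python variables
    - Add support for hybrid sigma variables: `pfull`, `phalf`
  - Add `VarHandler._cmor_write()`
    - Add support for CMORizing fixed-time variables (`areacella.py`, `orog.py`, and `sftlf.py`); separate PR for #217
    - If a time dimension exists, CMORize the variable with all time and time bound values in a single call to `cmor.write()` instead of looping over each time index and CMORizing each slice. **This should improve performance and removes the `tqdm` progress bar.**
- Update `handlers.yaml`
  - Add `phalf` and `pfull` entries
    - Closes #115 (move `phalf.py` and `pfull.py` to `handlers.yaml`)
    - Delete `phalf.py` and `pfull.py`
  - Add `clcalipso` entry
    - Closes #218 (replace `clcalipso.py` with `handlers.yaml`)
    - Delete `clcalipso.py`
- Update `_formulas.py`
  - Update all function argument types and return types from `np.ndarray` to `xr.DataArray`
  - Add formula functions for `pfull` and `phalf`
  - Add a `convert_units()` function, which handles 1-to-1 unit conversions and replaces `default.default_handlers.write_data()`
- Refactor `default.py` (now deleted)
  - Replaced by `VarHandler.cmorize()` and `_formulas.py.convert_units()`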
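The `_formulas.py` changes above add `pfull`/`phalf` formulas and a 1-to-1 `convert_units()` helper. The sketch below is not the PR's code. It assumes the standard hybrid sigma-pressure relation p = a * p0 + b * ps with E3SM-style coefficients (`hyam`/`hybm` at mid levels for `pfull`, `hyai`/`hybi` at interfaces for `phalf`) and p0 = 100000 Pa, and it uses a hypothetical conversion table keyed by labels like `"g-to-kg"`. Plain NumPy arrays stand in for the `xr.DataArray` arguments the PR's functions actually take, so the example runs without xarray.

```python
import numpy as np

# Reference pressure commonly used with E3SM/CAM hybrid coefficients (Pa).
P0 = 100_000.0


def _hybrid_pressure(a, b, ps, p0=P0):
    """Compute p = a * p0 + b * ps on every level.

    ``a`` and ``b`` are 1-D coefficient arrays over levels; ``ps`` has shape
    (time, *spatial). The result has shape (time, lev, *spatial).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ps = np.asarray(ps, dtype=float)
    # Coefficients become (lev, 1, ..., 1) so they broadcast over spatial dims.
    coeff_shape = (a.size,) + (1,) * (ps.ndim - 1)
    # Surface pressure becomes (time, 1, *spatial) to broadcast over levels.
    ps_with_lev = np.expand_dims(ps, axis=1)
    return a.reshape(coeff_shape) * p0 + b.reshape(coeff_shape) * ps_with_lev


def pfull(hyam, hybm, ps, p0=P0):
    """Pressure at model mid levels (full levels)."""
    return _hybrid_pressure(hyam, hybm, ps, p0)


def phalf(hyai, hybi, ps, p0=P0):
    """Pressure at model interfaces (half levels)."""
    return _hybrid_pressure(hyai, hybi, ps, p0)


# Hypothetical 1-to-1 conversions keyed by a "<from>-to-<to>" label.
_CONVERSIONS = {
    "g-to-kg": lambda x: x / 1000.0,
    "Pa-to-hPa": lambda x: x / 100.0,
    "1-to-%": lambda x: x * 100.0,
}


def convert_units(values, conversion):
    """Apply a single named unit conversion to ``values``."""
    try:
        func = _CONVERSIONS[conversion]
    except KeyError:
        raise ValueError(f"unsupported unit conversion: {conversion!r}") from None
    return func(np.asarray(values, dtype=float))
```

For example, with `ps` of shape `(1, 2)` and two mid-level coefficients, `pfull()` returns an array of shape `(1, 2, 2)` ordered as (time, lev, column). Keeping the conversions in one lookup table means adding a new 1-to-1 conversion is a one-line change.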
Clean Up Changes
- Remove `cdms2` and `cdutil` from dependencies in `dev.yml` and `ci.yml`
- Clean up legacy code in `clisccp.py`
  - Separate PR for #218 (replace `clcalipso.py` with `handlers.yaml`)
- Add a `Makefile` for easy access to commonly used commands (e.g., building and installing the package)