Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor 654 zonal mean xy #752

Merged
merged 12 commits into from
Feb 15, 2024

Conversation

chengzhuzhang
Copy link
Contributor

@chengzhuzhang chengzhuzhang commented Oct 27, 2023

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings: there is warning from xgcm
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder changed the base branch from main to cdat-migration-fy24 October 30, 2023 21:46
@chengzhuzhang
Copy link
Contributor Author

chengzhuzhang commented Nov 9, 2023

The components to refactor include the driver and the plotter.
File 1, zonal_mean_xy_driver.py

  • refactor regrid_to_lower_res_1d function that regrids 1-D transient variable toward lower resolution of two variables . Question can I reuse or extend align_grids_to_lower_res.

File 2, zonal_mean_xy_plot.py

  • use xarray object instead of cdms transient variable. No major function to refactor.

@tomvothecoder tomvothecoder force-pushed the cdat-migration-fy24 branch 3 times, most recently from 0dfc01d to d1aec82 Compare November 29, 2023 00:58
@chengzhuzhang
Copy link
Contributor Author

chengzhuzhang commented Dec 2, 2023

With the new commit 341ac3c, there are two problems:

  1. an error:
 2023-12-01 16:17:54,212 [ERROR]: core_parameter.py(_run_diag:326) >> Error in e3sm_diags.driver.zonal_mean_xy_driver
Traceback (most recent call last):
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/parameter/core_parameter.py", line 323, in _run_diag
    single_result = module.run_diag(self)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/driver/zonal_mean_xy_driver.py", line 88, in run_diag
    _run_diags_3d(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/driver/zonal_mean_xy_driver.py", line 200, in _run_diags_3d
    ds_test_rg = regrid_z_axis_to_plevs(ds_test, var_key, parameter.plevs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/driver/utils/regrid.py", line 384, in regrid_z_axis_to_plevs
    ds_plevs = _hybrid_to_plevs(ds, var_key, plevs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/driver/utils/regrid.py", line 443, in _hybrid_to_plevs
    result = ds.regridder.vertical(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xcdat/regridder/accessor.py", line 392, in vertical
    input_grid = _get_input_grid(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xcdat/regridder/accessor.py", line 445, in _get_input_grid
    grid = input_grid.regridder.grid
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xcdat/regridder/accessor.py", line 91, in grid
    data, bnds = self._get_axis_data(axis)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xcdat/regridder/accessor.py", line 111, in _get_axis_data
    _validate_grid_has_single_axis_dim(name, coord_var)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xcdat/regridder/grid.py", line 744, in _validate_grid_has_single_axis_dim
    raise ValueError(
ValueError: Multiple 'X' axis dims were found in this dataset, ['lon', 'slon']. Please drop the unused dimension(s) before performing grid operations.
  1. pre-commiting hooks isort and black conflicts. And updating .pre-committing yaml didn't help.

@tomvothecoder tomvothecoder force-pushed the refactor/654-zonal_mean_xy branch 2 times, most recently from 2808fb1 to 25747e6 Compare December 4, 2023 18:42
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @chengzhuzhang, I addressed your recent issues in this comment.

I also rebased this branch on the latest version of cdat-migration-fy24, which I squashed down into less commits. If you have any un-committed changes, can you stash then, check out the latest version of this branch, then pop them back on top? Thanks

e3sm_diags/driver/zonal_mean_xy_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/zonal_mean_xy_driver.py Outdated Show resolved Hide resolved
@tomvothecoder tomvothecoder force-pushed the refactor/654-zonal_mean_xy branch 2 times, most recently from e101244 to 6bbc864 Compare December 4, 2023 18:56
@chengzhuzhang
Copy link
Contributor Author

@tomvothecoder thank you for fixing this branch!

For the excessive slat, slon, it turned out to be they are included in the model climatology files, the relevant texts in the header as shown below:

        double slat(slat) ;
                slat:long_name = "Latitude for staggered FV grid" ;
                slat:units = "degrees_north" ;
        double w_stag(slat) ;
                w_stag:long_name = "Latitude weights for staggered FV grid" ;
        double slon(slon) ;
                slon:long_name = "Longitude for staggered FV grid" ;
                slon:units = "degrees_east" ;

I think I will try dropping the slat and slon dims, as a workaround for now.

@chengzhuzhang
Copy link
Contributor Author

Hey @chengzhuzhang, I addressed your recent issues in this comment.

I also rebased this branch on the latest version of cdat-migration-fy24, which I squashed down into less commits. If you have any un-committed changes, can you stash then, check out the latest version of this branch, then pop them back on top? Thanks

yep, I have checked out the latest version. Thanks for the heads-up!

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some updates to your test script to pass diags.cfg using sys.argv.extend. This allows you to perform Python debugging with parameters defined in diags.cfg.

examples/test_refactor/ex5.py Outdated Show resolved Hide resolved
examples/test_refactor/ex5.py Outdated Show resolved Hide resolved
@chengzhuzhang
Copy link
Contributor Author

@tomvothecoder I made plotting work, but will try consolidate the codes a little better.
But it seems my pre-commit isort still has errorsthat I can't sort out..

@chengzhuzhang
Copy link
Contributor Author

Other than mypy complaints. There are still several problems causing some variables fails to generate plots. The log for a full run.
The result can be found here

@forsyth2 forsyth2 mentioned this pull request Dec 18, 2023
9 tasks
@tomvothecoder tomvothecoder force-pushed the refactor/654-zonal_mean_xy branch 2 times, most recently from 08b3ab0 to 553fea7 Compare December 18, 2023 23:18
@tomvothecoder tomvothecoder force-pushed the refactor/654-zonal_mean_xy branch from 553fea7 to 09fbe13 Compare January 3, 2024 22:24
@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Jan 5, 2024

Other than mypy complaints. There are still several problems causing some variables fails to generate plots. The log for a full run. The result can be found here

I listed the errors from the log below:

Issue 1

  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/driver/zonal_mean_xy_driver.py", line 322, in regrid_to_lower_res_1d
    if len(ds_test_1d.lat) > len(ds_ref_1d.lat):
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/common.py", line 278, in __getattr__
    raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'lat'
  • The name of the latitude coordinates is not lat on the DataArray. You should use xc.get_dim_keys(DataArray, axis="Y") to get the name dynamically.

Issue 2

packages/e3sm_diags/plot/zonal_mean_xy_plot.py:115: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
  fig = plt.figure(figsize=parameter.figsize, dpi=parameter.dpi)
  • The warning message is self-explanatory.

Issue 3

  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/plot/zonal_mean_xy_plot.py", line 130, in plot
    ax1.set_ylabel(da_test.long_name + " (" + da_test.units + ")")
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/common.py", line 278, in __getattr__
    raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'long_name'
  • The long_name attribute looks like it is dropped at some point. Maybe when calculating zonal mean?

Issue 4

Traceback (most recent call last):
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/coding/times.py", line 213, in _decode_cf_datetime_dtype
    result = decode_cf_datetime(example_value, units, calendar, use_cftime)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/coding/times.py", line 321, in decode_cf_datetime
    dates = _decode_datetime_with_cftime(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/coding/times.py", line 237, in _decode_datetime_with_cftime
    cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
  File "src/cftime/_cftime.pyx", line 580, in cftime._cftime.num2date
  File "src/cftime/_cftime.pyx", line 98, in cftime._cftime._dateparse
ValueError: 'months since' units only allowed for '360_day' calendar
  • Self-explanatory. The cftime library can't decode "months since" units with other calendar types. xc.open_dataset()/xc.open_mf_dataset() can decode non-CF time units. If one of these functions is used, maybe remove use_cftime=True to avoid using Xarray's cftime decoder.

Issue 5

Traceback (most recent call last):
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/conventions.py", line 432, in decode_cf_variables
    new_vars[k] = decode_cf_variable(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/conventions.py", line 283, in decode_cf_variable
    var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/coding/times.py", line 832, in decode
    dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/coding/times.py", line 223, in _decode_cf_datetime_dtype
    raise ValueError(msg)
ValueError: unable to decode time units 'months since 1983-06' with 'the default calendar'. Try opening your dataset with decode_times=False or installing cftime if it is not installed.
  • Self-explanatory. The cftime library can't decode months since" units with other calendar types. xc.open_dataset()/xc.open_mf_dataset()can decode non-CF time units. If one of these functions is used, maybe removeuse_cftime=Trueto avoid using Xarray'scftime` decoder.

Issue 6

 File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/driver/utils/dataset_xr.py", line 254, in _get_global_attr_from_climo_dataset
    ds = xr.open_dataset(filepath)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/backends/api.py", line 572, in open_dataset
    backend_ds = backend.open_dataset(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 607, in open_dataset
    ds = store_entrypoint.open_dataset(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/backends/store.py", line 58, in open_dataset
    ds = Dataset(vars, attrs=attrs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/dataset.py", line 697, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/dataset.py", line 426, in merge_data_and_coords
    return merge_core(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/merge.py", line 724, in merge_core
    dims = calculate_dimensions(variables)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/variable.py", line 3007, in calculate_dimensions
    raise ValueError(
ValueError: dimension 'time' already exists as a scalar variable
  • Not 100% what is happening here when you open up the dataset.

Issue 7

  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/derivations/utils.py", line 175, in cosp_bin_sum
    prs: FileAxis = cld.getAxis(0)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/common.py", line 278, in __getattr__
    raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'getAxis'

Issue 8

  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/e3sm_diags/derivations/formulas.py", line 81, in pminuse_convert_units
    var.attrs["units"] == "kg/m2/s"
KeyError: 'units'
  • The units attribute looks like it is dropped at some point. Maybe when calculating zonal mean?

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chengzhuzhang, I did some code clean up code and fixed the mypy warnings (which were all definitely useful).

e3sm_diags/driver/utils/io.py Outdated Show resolved Hide resolved
e3sm_diags/driver/zonal_mean_xy_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/zonal_mean_xy_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/zonal_mean_xy_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/zonal_mean_xy_driver.py Show resolved Hide resolved
e3sm_diags/driver/zonal_mean_xy_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/zonal_mean_xy_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/zonal_mean_xy_driver.py Outdated Show resolved Hide resolved
e3sm_diags/plot/zonal_mean_xy_plot.py Outdated Show resolved Hide resolved
e3sm_diags/plot/zonal_mean_xy_plot.py Outdated Show resolved Hide resolved
- Needed to sort the `prs_range` and `tau_range` in ascending order before creating `cond` to pass to `.where()`, otherwise if out of order then there is a possibility that the condition will fail entirely
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I resolved issue 1 your comment here with two separate fixes (mentioned in my review comments)

Here's my debugging notebook now showing Number of files missing: 0

I will merge this PR once the build passes. We can investigate issue 2 in a separate PR.

Comment on lines +642 to +668
matching_target_var_map = self._get_matching_climo_src_vars(ds, target_var_map)

# NOTE: Since there's only one set of vars, we get the first and only set
# of vars from the derived variable dictionary.
# 1. Get the derivation function.
derivation_func = list(matching_target_var_map.values())[0]
if matching_target_var_map is not None:
# NOTE: Since there's only one set of vars, we get the first and only set
# of vars from the derived variable dictionary.
# 1. Get the derivation function.
derivation_func = list(matching_target_var_map.values())[0]

# 2. Get the derivation function arguments using source variable keys.
# Example: [xr.DataArray(name="PRECC",...), xr.DataArray(name="PRECL",...)]
src_var_keys = list(matching_target_var_map.keys())[0]
# 2. Get the derivation function arguments using source variable keys.
# Example: [xr.DataArray(name="PRECC",...), xr.DataArray(name="PRECL",...)]
src_var_keys = list(matching_target_var_map.keys())[0]

# 3. Use the derivation function to derive the variable.
ds_final = self._get_dataset_with_derivation_func(
ds, derivation_func, src_var_keys, target_var
)
# 3. Use the derivation function to derive the variable.
ds_derived = self._get_dataset_with_derivation_func(
ds, derivation_func, src_var_keys, target_var
)

return ds_final
return ds_derived

# None of the entries in the derived variables dictionary worked,
# so try to get the variable directly from he dataset.
if target_var in ds.data_vars.keys():
return ds

raise IOError(
f"The dataset file has no matching source variables for {target_var}"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE: Issue 1 in #752 (comment)

This fixes the missing outputs for the following variables: FLUT, FLUTC, SOLIN, FLDS, FLNS, FLNSC, FSDS, FLDSC, FSNS, FSNSC, ERA5-TMQ, MERRA2-TMQ

The issue was that I did not include a conditional to try to return the variable directly from the climatology dataset if it cannot be derived (lines 661-664)

Comment on lines 119 to 130
# 5. Get the axes mask conditional and subset the variable on it.
cond = (
(prs >= prs_range[0])
& (prs <= prs_range[-1])
& (tau >= tau_range[0])
& (tau <= tau_range[-1])
)

var_sub = var.where(cond, drop=True)

# 5. Sum on axis=0 and axis=1 (tau and prs)
# 7. Sum on axis=0 and axis=1 (tau and prs)
var_sum = var_sub.sum(dim=[prs.name, tau.name])
Copy link
Collaborator

@tomvothecoder tomvothecoder Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE: Issue 1 in #752 (comment)

The changes in cosp_bin_sum() now fix the missing outputs for the remaining variables: ISCCPCOSP-CLDTOT_TAU1.3_9.4_ISCCP, ISCCPCOSP-CLDTOT_TAU1.3_ISCCP, MODISCOSP-CLDHGH/TOT_TAU1.3_9.4_MODIS, MODISCOSP-CLDHGH/TOT_TAU1.3_MODIS

Root Cause

  1. The prs_range can be in descending order (e.g., (90000.0, 9000.0))
  2. This caused the condition (cond) to not match any of the data values when subsetting with .where()
    • The condition being used is >=90000.0 and <=9000.0 which is incorrect
    • The correct condition should to be the other way around, >= 9000 and <= 90000
  3. Since there are no matching values, the below numpy error is thrown with drop=True (drop all nan values that don't match):
    • ValueError: Cannot apply_along_axis when any iteration dimensions are 0

Solution

Sort prs_range and tau_range before subsetting

Side-note

The CDMS2 version of sub-setting a variable is pretty confusing. Notice below how prs_low and prs_high are used to subset, but prs_low is actually the larger value with my example, (90000.0, 9000.0)

I think CDMS2 swaps both values around and subsets to include all values within the range. It uses the correct sub-setting condition, >= 9000 and <= 90000.

if cld.id == "FISCCP1_COSP": # ISCCP model
cld_bin = cld(cosp_prs=(prs_low, prs_high), cosp_tau=(tau_low, tau_high))
simulator = "ISCCP"

e3sm_diags/metrics/metrics.py Outdated Show resolved Hide resolved
@chengzhuzhang
Copy link
Contributor Author

chengzhuzhang commented Feb 15, 2024

@tomvothecoder Thank you for the perseverance to fix the derived variable logic and those COSP variable calculation!! The PR is in a much better shape after the fix!

@tomvothecoder tomvothecoder merged commit fd38d72 into cdat-migration-fy24 Feb 15, 2024
4 checks passed
@tomvothecoder tomvothecoder deleted the refactor/654-zonal_mean_xy branch February 15, 2024 23:24
tomvothecoder added a commit that referenced this pull request May 1, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Jul 15, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Jul 15, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Aug 21, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Aug 21, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Oct 1, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Oct 25, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Oct 29, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Oct 29, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Dec 4, 2024
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
tomvothecoder added a commit that referenced this pull request Dec 5, 2024
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
tomvothecoder added a commit that referenced this pull request Dec 5, 2024
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
tomvothecoder added a commit that referenced this pull request Jan 15, 2025
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cdat-migration-fy24 CDAT Migration FY24 Task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CDAT Migration Phase 2: Refactor zonal_mean_xy set
3 participants