CDAT Migration Phase 2: Refactor core utilities and lat_lon set (#677)
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit test failures
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Compare metrics between the `cdat-migration-fy24` and `main` branches for `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using the pipe operator (`|`) for `Union` types -- it still fails even with `from __future__ import annotations`
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of length <= 1 -- add a conditional that skips adding new bounds for regridded output datasets, and fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests so they pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3D variables using `test_all_sets.py`
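The Python 3.9 pipe-operator bug noted above can be sketched as follows. This is an illustrative example, not the actual e3sm_diags code; `ClimoFreq` and `select_freq` are hypothetical names used only for demonstration.

```python
from typing import Optional, Union

# On Python 3.9, a union written with the pipe operator (e.g. `str | int | None`)
# is evaluated at runtime when used outside an annotation, such as in a module-level
# type alias, so `from __future__ import annotations` does not help there and a
# TypeError is raised. Falling back to typing.Union/Optional works on 3.9 and later.
ClimoFreq = Optional[Union[str, int]]  # instead of `str | int | None`


def select_freq(freq: ClimoFreq) -> str:
    """Return the frequency label, defaulting to 'ANN' when none is given."""
    return "ANN" if freq is None else str(freq)
```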

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to optionally return an `xr.DataArray` when `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

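The `as_list` return contract described above can be sketched roughly as follows. This is a hedged NumPy stand-in, not the real `metrics.spatial_avg`, which operates on `xr.DataArray` objects via xCDAT; only the shape of the flag's behavior is shown.

```python
import numpy as np


def spatial_avg(data, weights, as_list=True):
    # Area-weighted spatial mean standing in for the xCDAT-based version.
    # as_list=True  -> plain Python list (the original behavior)
    # as_list=False -> the underlying array/scalar object
    avg = np.average(data, weights=weights, axis=-1)
    return np.atleast_1d(avg).tolist() if as_list else avg
```

Callers that need the array object (e.g., for further xarray operations) pass `as_list=False`; existing callers keep the list behavior by default.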
Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - Contains a lot of new code that cleans up `derivations.py` and replaces the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for a dataset quality issue where "time" is a scalar variable that does not match the "time" dimension array length; the method drops this variable and replaces it with the correct coordinate
  - Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF files (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files
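The two-calling-convention dispatch described for `_get_dataset_with_derivation_func()` can be sketched as below. This is an assumed shape, not the exact e3sm_diags implementation; `apply_derivation` is a hypothetical helper name.

```python
import inspect


def apply_derivation(func, dataset, target_var_key, input_vars):
    # Newer derivation functions (e.g. cosp_histogram_standardize() and
    # cosp_bin_sum()) accept the whole dataset plus the target variable key,
    # while older formulas accept only the input variables. Inspect the
    # signature to decide how to call the function.
    params = inspect.signature(func).parameters
    if "target_var_key" in params:
        return func(dataset, target_var_key)
    return func(*input_vars)
```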

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results
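The `_encode_time_coords()` workaround mentioned above can be sketched as follows. This is a hedged illustration under the stated assumption that the annual-cycle plots only need month numbers; `encode_time_as_months` is a hypothetical name, not the real function.

```python
import datetime


def encode_time_as_months(times):
    # cftime raises ValueError for '"months since" units only allowed for
    # "360_day" calendar', so instead of encoding datetime objects, replace
    # the time values with plain 1-12 month integers before writing/plotting.
    return [t.month for t in times]
```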

---------

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)
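The CF-axis fix above can be sketched as follows. This is an assumed illustration operating on a plain attribute dict; the real helper presumably works on an `xr.Dataset` coordinate's `.attrs` in place, and `ensure_z_axis_attr` is a hypothetical name.

```python
def ensure_z_axis_attr(attrs):
    # xCDAT and other CF-aware tools locate the vertical axis through the
    # CF `axis` attribute, so stamp `axis: "Z"` onto a vertical coordinate's
    # attributes when it is missing, leaving existing attributes untouched.
    if "axis" not in attrs:
        attrs = {**attrs, "axis": "Z"}
    return attrs
```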

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)
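The unit conversion mentioned above amounts to a fixed scaling. A minimal sketch (a molar mixing ratio in mol/mol scales to parts per million by volume by a factor of 1e6; `mol_per_mol_to_ppmv` is a hypothetical helper name):

```python
def mol_per_mol_to_ppmv(mixing_ratio):
    # Convert a molar (volume) mixing ratio from mol/mol to ppmv.
    return mixing_ratio * 1.0e6
```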

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

tomvothecoder committed Dec 5, 2024
1 parent 3f5b036 commit 6681e89
Showing 298 changed files with 100,024 additions and 13,006 deletions.
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
+[report]
+exclude_also =
+    if TYPE_CHECKING:
2 changes: 1 addition & 1 deletion .github/workflows/build_workflow.yml
@@ -5,7 +5,7 @@ on:
     branches: [main]

   pull_request:
-    branches: [main]
+    branches: [main, cdat-migration-fy24]

   workflow_dispatch:
1 change: 1 addition & 0 deletions .gitignore
@@ -110,6 +110,7 @@ ENV/

 # NetCDF files needed
 !e3sm_diags/driver/acme_ne30_ocean_land_mask.nc
+!auxiliary_tools/cdat_regression_testing/759-slice-flag/debug/*.nc

 # Folder for storing quality assurance files and notes
 qa/
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -34,4 +34,5 @@ repos:
       hooks:
         - id: mypy
           args: [--config=pyproject.toml]
-          additional_dependencies: [dask, numpy>=1.23.0, types-PyYAML]
+          additional_dependencies:
+            [dask, numpy>=1.23.0, xarray>=2023.3.0, types-PyYAML]
2 changes: 1 addition & 1 deletion .vscode/e3sm_diags.code-workspace
@@ -58,7 +58,7 @@
   "configurations": [
     {
       "name": "Python: Current File",
-      "type": "python",
+      "type": "debugpy",
       "request": "launch",
       "program": "${file}",
       "console": "integratedTerminal",
File renamed without changes.
131 changes: 67 additions & 64 deletions auxiliary_tools/aerosol_budget.py
@@ -1,3 +1,5 @@
+# NOTE: This module uses the deprecated e3sm_diags.driver.utils.dataset.Dataset
+# class, which was replaced by e3sm_diags.driver.utils.dataset_xr.Dataset.
 import e3sm_diags
 from e3sm_diags.driver import utils
 import cdms2
@@ -12,11 +14,12 @@


 def global_integral(var, area_m2):
-    """ Compute global integral of 2 dimentional properties"""
-    return numpy.sum(numpy.sum(abs(var)*area_m2,axis = 0), axis=0)
+    """Compute global integral of 2 dimentional properties"""
+    return numpy.sum(numpy.sum(abs(var) * area_m2, axis=0), axis=0)


 def calc_column_integral(data, aerosol, season):
-    """ Calculate column integrated mass """
+    """Calculate column integrated mass"""

     # take aerosol and change it to the appropriate string
     # ncl -> SEASALT, dst -> DUST, rest1 -> REST1
@@ -32,129 +35,129 @@ def calc_column_integral(data, aerosol, season):
         burden = data.get_climo_variable(f"ABURDEN{aerosol_name}", season)
     except RuntimeError:
         # if not, use the Mass_ terms and integrate over the column
-        mass = data.get_climo_variable(f'Mass_{aerosol}', season)
+        mass = data.get_climo_variable(f"Mass_{aerosol}", season)
         hyai, hybi, ps = data.get_extra_variables_only(
-            f'Mass_{aerosol}', season, extra_vars=["hyai", "hybi", "PS"]
+            f"Mass_{aerosol}", season, extra_vars=["hyai", "hybi", "PS"]
         )

         p0 = 100000.0  # Pa
-        ps = ps  # Pa
-        pressure_levs = cdutil.vertical.reconstructPressureFromHybrid(ps, hyai, hybi, p0)
+        ps = ps  # Pa
+        pressure_levs = cdutil.vertical.reconstructPressureFromHybrid(
+            ps, hyai, hybi, p0
+        )

-        #(72,lat,lon)
-        delta_p = numpy.diff(pressure_levs,axis = 0)
-        mass_3d = mass*delta_p/9.8 #mass density * mass air kg/m2
-        burden = numpy.nansum(mass_3d,axis = 0) #kg/m2
+        # (72,lat,lon)
+        delta_p = numpy.diff(pressure_levs, axis=0)
+        mass_3d = mass * delta_p / 9.8  # mass density * mass air kg/m2
+        burden = numpy.nansum(mass_3d, axis=0)  # kg/m2
     return burden


-
 def generate_metrics_dic(data, aerosol, season):
     metrics_dict = {}
-    wetdep = data.get_climo_variable(f'{aerosol}_SFWET', season)
-    drydep = data.get_climo_variable(f'{aerosol}_DDF', season)
-    srfemis = data.get_climo_variable(f'SF{aerosol}', season)
-    area = data.get_extra_variables_only(
-        f'{aerosol}_DDF', season, extra_vars=["area"]
-    )
+    wetdep = data.get_climo_variable(f"{aerosol}_SFWET", season)
+    drydep = data.get_climo_variable(f"{aerosol}_DDF", season)
+    srfemis = data.get_climo_variable(f"SF{aerosol}", season)
+    area = data.get_extra_variables_only(f"{aerosol}_DDF", season, extra_vars=["area"])
     area_m2 = area * REARTH**2

     burden = calc_column_integral(data, aerosol, season)
-    burden_total= global_integral(burden, area_m2)*1e-9 # kg to Tg
-    print(f'{aerosol} Burden (Tg): ',f'{burden_total:.3f}')
-    sink = global_integral((drydep-wetdep),area_m2)*UNITS_CONV
-    drydep = global_integral(drydep,area_m2)*UNITS_CONV
-    wetdep = global_integral(wetdep,area_m2)*UNITS_CONV
-    srfemis = global_integral(srfemis,area_m2)*UNITS_CONV
-    print(f'{aerosol} Sink (Tg/year): ',f'{sink:.3f}')
-    print(f'{aerosol} Lifetime (days): ',f'{burden_total/sink*365:.3f}')
+    burden_total = global_integral(burden, area_m2) * 1e-9  # kg to Tg
+    print(f"{aerosol} Burden (Tg): ", f"{burden_total:.3f}")
+    sink = global_integral((drydep - wetdep), area_m2) * UNITS_CONV
+    drydep = global_integral(drydep, area_m2) * UNITS_CONV
+    wetdep = global_integral(wetdep, area_m2) * UNITS_CONV
+    srfemis = global_integral(srfemis, area_m2) * UNITS_CONV
+    print(f"{aerosol} Sink (Tg/year): ", f"{sink:.3f}")
+    print(f"{aerosol} Lifetime (days): ", f"{burden_total/sink*365:.3f}")
     metrics_dict = {
-        "Surface Emission (Tg/yr)": f'{srfemis:.3f}',
-        "Sink (Tg/yr)": f'{sink:.3f}',
-        "Dry Deposition (Tg/yr)": f'{drydep:.3f}',
-        "Wet Deposition (Tg/yr)": f'{wetdep:.3f}',
-        "Burden (Tg)": f'{burden_total:.3f}',
-        "Lifetime (Days)": f'{burden_total/sink*365:.3f}',
+        "Surface Emission (Tg/yr)": f"{srfemis:.3f}",
+        "Sink (Tg/yr)": f"{sink:.3f}",
+        "Dry Deposition (Tg/yr)": f"{drydep:.3f}",
+        "Wet Deposition (Tg/yr)": f"{wetdep:.3f}",
+        "Burden (Tg)": f"{burden_total:.3f}",
+        "Lifetime (Days)": f"{burden_total/sink*365:.3f}",
    }
     return metrics_dict


 param = CoreParameter()
-param.test_name = 'v2.LR.historical_0101'
-param.test_name = 'F2010.PD.NGD_v3atm.0096484.compy'
-param.test_data_path = '/Users/zhang40/Documents/ACME_simulations/'
-param.test_data_path = '/compyfs/mahf708/E3SMv3_dev/F2010.PD.NGD_v3atm.0096484.compy/post/atm/180x360_aave/clim/10yr'
+param.test_name = "v2.LR.historical_0101"
+param.test_name = "F2010.PD.NGD_v3atm.0096484.compy"
+param.test_data_path = "/Users/zhang40/Documents/ACME_simulations/"
+param.test_data_path = "/compyfs/mahf708/E3SMv3_dev/F2010.PD.NGD_v3atm.0096484.compy/post/atm/180x360_aave/clim/10yr"
 test_data = utils.dataset.Dataset(param, test=True)

-#rearth = 6.37122e6 #km
-#UNITS_CONV = 86400.0*365.0*1e-9 # kg/s to Tg/yr
-REARTH = 6.37122e6 #km
-UNITS_CONV = 86400.0*365.0*1e-9 # kg/s to Tg/yr
+# rearth = 6.37122e6 #km
+# UNITS_CONV = 86400.0*365.0*1e-9 # kg/s to Tg/yr
+REARTH = 6.37122e6  # km
+UNITS_CONV = 86400.0 * 365.0 * 1e-9  # kg/s to Tg/yr
 # TODO:
 # Convert so4 unit to TgS
-#mwso4 = 115.0
-#mws = 32.066
-#UNITS_CONV_S = UNITS_CONV/mwso4*mws # kg/s to TgS/yr
+# mwso4 = 115.0
+# mws = 32.066
+# UNITS_CONV_S = UNITS_CONV/mwso4*mws # kg/s to TgS/yr


-species = ["bc", "dst", "mom", "ncl","pom","so4","soa"]
-SPECIES_NAMES = {"bc": "Black Carbon",
+species = ["bc", "dst", "mom", "ncl", "pom", "so4", "soa"]
+SPECIES_NAMES = {
+    "bc": "Black Carbon",
     "dst": "Dust",
     "mom": "Marine Organic Matter",
     "ncl": "Sea Salt",
     "pom": "Primary Organic Matter",
     "so4": "Sulfate",
-    "soa": "Secondary Organic Aerosol"}
+    "soa": "Secondary Organic Aerosol",
+}
 MISSING_VALUE = 999.999
 metrics_dict = {}
 metrics_dict_ref = {}

 seasons = ["ANN"]

 ref_data_path = os.path.join(
-        e3sm_diags.INSTALL_PATH,
-        "control_runs",
-        "aerosol_global_metrics_benchmarks.json",
-    )
+    e3sm_diags.INSTALL_PATH,
+    "control_runs",
+    "aerosol_global_metrics_benchmarks.json",
+)

-with open(ref_data_path, 'r') as myfile:
-    ref_file=myfile.read()
+with open(ref_data_path, "r") as myfile:
+    ref_file = myfile.read()

 metrics_ref = json.loads(ref_file)

 for season in seasons:
     for aerosol in species:
-        print(f'Aerosol species: {aerosol}')
+        print(f"Aerosol species: {aerosol}")
         metrics_dict[aerosol] = generate_metrics_dic(test_data, aerosol, season)
         metrics_dict_ref[aerosol] = metrics_ref[aerosol]
-        #metrics_dict_ref[aerosol] = {
+        # metrics_dict_ref[aerosol] = {
         #    "Surface Emission (Tg/yr)": f'{MISSING_VALUE:.3f}',
         #    "Sink (Tg/yr)": f'{MISSING_VALUE:.3f}',
         #    "Dry Deposition (Tg/yr)": f'{MISSING_VALUE:.3f}',
         #    "Wet Deposition (Tg/yr)": f'{MISSING_VALUE:.3f}',
         #    "Burden (Tg)": f'{MISSING_VALUE:.3f}',
         #    "Lifetime (Days)": f'{MISSING_VALUE:.3f}',
         #    }
-    with open(f'aerosol_table_{season}.csv', "w") as table_csv:
+
+    with open(f"aerosol_table_{season}.csv", "w") as table_csv:
         writer = csv.writer(
             table_csv,
             delimiter=",",
             quotechar="'",
             quoting=csv.QUOTE_MINIMAL,
-            lineterminator='\n',
+            lineterminator="\n",
         )
-        #writer.writerow([" ", "test","ref",])
+        # writer.writerow([" ", "test","ref",])
         for key, values in metrics_dict.items():
             writer.writerow([SPECIES_NAMES[key]])
-            print('key',key, values)
+            print("key", key, values)
             for value in values:
                 print(value)
                 line = []
                 line.append(value)
                 line.append(values[value])
                 line.append(metrics_dict_ref[key][value])
-                print(line, 'line')
+                print(line, "line")
                 writer.writerows([line])
             writer.writerows([""])