-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cloud regime analysis #251
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies @Isaaciwd and @brianpm for taking almost six months longer then I originally promised, but I finally got around to reviewing this PR. It's definitely some great work!
I have a fairly large amount of change requests, but hopefully most (all?) of them should be easy to resolve. Of course please let me know if you have any questions or concerns with any of my requests or suggestions, and thanks again for the effort!
|
||
MODIS_emd_centers: | ||
category: "Clouds" | ||
obs_file: 'MODIS_emd-means_n_init5_centers_1.np' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is missing an extra y
at the end:
obs_file: 'MODIS_emd-means_n_init5_centers_1.np' | |
obs_file: 'MODIS_emd-means_n_init5_centers_1.npy' |
import xarray as xr | ||
import matplotlib as mpl | ||
from mpl_toolkits.axes_grid1 import make_axes_locatable | ||
from numba import njit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is numba
required for this script to run? If so then we should probably add it to the ADF-provided env/conda_environment.yaml
file so that this script can be used with that environment as well.
import os | ||
|
||
|
||
#global num_iter, n_samples, data, ds, ht_var_name, tau_var_name, k, height_or_pressure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line still needed? If not then I would go ahead and remove it.
def cloud_regime_analysis(adf, wasserstein_or_euclidean = "euclidean", data_product='all', premade_cloud_regimes=None, lat_range=None, lon_range=None, only_ocean_or_land=False): | ||
""" | ||
This script/function is designed to generate 2-D lat/lon maps of Cloud Regimes (CRs), as well as plots of the CR | ||
centers themselves. It can fit data into CRs using either Wassertstein (AKA Earth Movers Distance) or the more conventional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo here:
centers themselves. It can fit data into CRs using either Wassertstein (AKA Earth Movers Distance) or the more conventional | |
centers themselves. It can fit data into CRs using either Wasserstein (AKA Earth Movers Distance) or the more conventional |
import glob | ||
from math import ceil | ||
import time | ||
import dask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dask
is not currently included in the ADF-provided env/conda_environment.yaml
file. I would recommend adding it in this PR unless the use of dask is optional for this script.
# Opening an initial dataset | ||
init_ds_b = xr.open_dataset(files[0]) | ||
|
||
print(f' -Starting {data} CAM baseline data') #testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove #testing
comment?
# Variable that gets set to true if var is missing in the data file, and is used to skip processing that dataset | ||
missing_var = False | ||
|
||
# Trying to open time series files from cam)ts_loc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo here:
# Trying to open time series files from cam)ts_loc | |
# Trying to open time series files from cam_ts_loc |
# Masking out the land or water | ||
if only_ocean_or_land == 'L': ds_b = ds_b.where(land == 1) | ||
elif only_ocean_or_land == 'O': ds_b = ds_b.where(land == 0) | ||
else: raise Exception('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't want to kill the ADF here:
else: raise Exception('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water') | |
else: | |
print('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water') | |
return |
weights_b=weights_b[valid_indicies_b] | ||
|
||
if np.min(mat_b < 0): | ||
raise Exception (f'Found negative value in ds_b.{var_name}, if this is a fill value for missing data, convert to nans and try again') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't want to kill the ADF here:
raise Exception (f'Found negative value in ds_b.{var_name}, if this is a fill value for missing data, convert to nans and try again') | |
print(f'Found negative value in ds_b.{var_name}, if this is a fill value for missing data, convert to nans and try again') | |
return |
@@ -0,0 +1,1369 @@ | |||
#%% | |||
print() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would delete this line, as I suspect this print
statement will be called at a weird time in the ADF run, if it is called at all.
Add a cloud regime analysis script to the ADF.