[CMIP6] multiple statistics from one download #24
Hi @tdcwilliams, None of the diagnostics/functions that you are looking for have been implemented yet. It is not possible to pass multiple diagnostic functions at the moment, but the downloading step is cached separately under the hood. If you re-run
out = []
for transform_func in transform_funcs:
    out.append(download.download_and_transform(..., transform_func=transform_func))
If it's of use, we just added a much more advanced notebook: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/clima_and_bias_pr_cmip6_regionalised.ipynb |
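A minimal sketch of why looping over transform functions does not repeat the download, assuming the download result is cached on its first call. `fake_download` and `download_and_transform_sketch` are hypothetical stand-ins, not the toolbox API, and the in-memory cache here only imitates a cache that persists between runs.

```python
import functools

download_calls = 0  # counts how many times the (fake) download really runs


@functools.lru_cache(maxsize=None)
def fake_download(request):
    """Hypothetical stand-in for the cached download step."""
    global download_calls
    download_calls += 1
    return (0.1, 0.5, 0.9)  # pretend sea ice concentrations


def download_and_transform_sketch(request, transform_func=None):
    """Fetch the data (cached after the first call), then apply an optional transform."""
    data = fake_download(request)
    return data if transform_func is None else transform_func(data)


# Three different "diagnostics" computed from a single download
transform_funcs = [min, max, sum]
out = [
    download_and_transform_sketch("siconc-request", transform_func=func)
    for func in transform_funcs
]
```

Only the first iteration triggers the download; the later ones reuse the cached data and just apply a different transform.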
Hi @malmans2, |
Hi Mattia @malmans2, I've made a bit of progress with this However I am not so used to Regards, |
Hi Tim,
    ds = sie.to_dataset(name="extent")
    return ds.expand_dims(model=[model])
If you compute them in the same function (and therefore you cache a single dataset with both variables), you can do:
    ds = xr.merge([sie.rename("extent"), sia.rename("area")])
    return ds.expand_dims(model=[model])
That way you don't have to drop any variables. Hope it helps! |
Thanks @malmans2 - it did help. I am still unclear about
|
Yes, I.e., send me a notebook or a python script that does this:
|
Hi @malmans2 - I have just got something downloaded and am playing around to get to know xarray a bit. |
Hi @malmans2, I have the downloading and transforming working, to give some time series of sea ice extent and area. Also, the calculation of the grid areas is maybe a bit complicated and doesn't work for either ERA5 or one CMIP6 model (they don't provide the corners of the grid cells), so maybe it would be simpler to just regrid onto a 100 km equal-area grid or something. Is regridding very slow? Ciao and happy Easter, |
Ciao @tdcwilliams ! I should be able to take a look this week.
We actually already have a function that is optimised for interpolations. It's called Sorry, it was originally meant to be a private method (and I might move it under utils in the future), so I forgot to add docstrings. But it's quite simple and we use https://xesmf.readthedocs.io/en/latest/ under the hood. Here is the docstring:
For example, it's used in a couple of notebooks in
Interpolating shouldn't be too slow at all, and if you use our regrid function it's optimised (we compute the weights only once and cache them on disk in a NetCDF file, so any time you need to interpolate data from/to the same grids the weights are re-used). So it's very much up to you: if you think interpolating is scientifically OK, you can probably do that. |
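The weight caching described above can be sketched roughly as follows. This is not the toolbox's regrid function or xESMF: it assumes a simple 1D linear interpolation, stores the weights in a `.npy` file instead of NetCDF, and `linear_weights` and `cached_weights` are hypothetical names used only for illustration.

```python
import os
import tempfile

import numpy as np


def linear_weights(src_x, dst_x):
    """Build a (len(dst_x), len(src_x)) matrix of 1D linear interpolation weights."""
    weights = np.zeros((len(dst_x), len(src_x)))
    # Index of the source point to the left of each target point
    left = np.clip(np.searchsorted(src_x, dst_x, side="right") - 1, 0, len(src_x) - 2)
    frac = (dst_x - src_x[left]) / (src_x[left + 1] - src_x[left])
    rows = np.arange(len(dst_x))
    weights[rows, left] = 1 - frac
    weights[rows, left + 1] = frac
    return weights


def cached_weights(src_x, dst_x, path):
    """Load weights from disk if present; otherwise compute and save them."""
    if os.path.exists(path):
        return np.load(path), True
    weights = linear_weights(src_x, dst_x)
    np.save(path, weights)
    return weights, False


src_x = np.array([0.0, 1.0, 2.0, 3.0])
dst_x = np.array([0.5, 1.5, 2.5])
path = os.path.join(tempfile.mkdtemp(), "weights.npy")

weights, from_cache_first = cached_weights(src_x, dst_x, path)   # computed and saved
weights, from_cache_second = cached_weights(src_x, dst_x, path)  # loaded from disk
regridded = weights @ np.array([0.0, 10.0, 20.0, 30.0])
```

The expensive step (building the weights) runs once per pair of grids; every later interpolation between the same grids is just a matrix product.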
Thanks @malmans2, I might use that regrid method instead of trying to code up areas for yet another type of grid. Great to hear it is optimised. The model SICs are pretty smooth so it should probably be fine to interpolate them. |
Hi @tdcwilliams, I'm looking at your notebook, sorry about the delay. |
Hi @malmans2 - I hadn't noticed we could output the grid-cell areas - that is certainly a much easier way! |
OK! I'm making a notebook template for your use case. Looks like we still need to estimate the grid cell areas for ERA5, but I've asked ECMWF if that variable is available somewhere. |
Hi @tdcwilliams, I've added a template for your use case. You can find it here. A couple of comments:
|
Hi @malmans2 - thanks for the help with this. I had a closer look at the Relating to the chunking, I have been trying to process more data but am getting many crashes
Ciao, Tim |
Hi Tim,
I'll implement this in the template. I just need to know which grid to use for the interpolation. Maybe ERA5? I think that's what CMCC does in a template for WP4.
Not sure, unfortunately these issues are very hard to debug as we don't maintain the VM or
I'll look into this. But I don't understand why something changed with
It's not really processed on the fly.
If you want to download all data first, then transform, you can just run
for transform_func in (None, my_transform_func):
    ds = download.download_and_transform(collection_id, request, transform_func=transform_func, **kwargs)
If it's not clear, with
I would only use |
Hi @tdcwilliams, I've explored a bit the issue you mentioned in
I've added this code in the template:
# Remove extra-points
isel_dict = {}
for dim, size in areacello.sizes.items():
    match size - siconc.sizes[dim]:
        case 1:
            # One extra point: drop the last one
            isel_dict[dim] = slice(None, -1)
        case 2:
            # Two extra points: drop the first and the last
            isel_dict[dim] = slice(1, -1)
if isel_dict:
    areacello = areacello.isel(**isel_dict).drop(list(isel_dict)) |
Update, I tried all models with sea-ice variables (listed in your original notebook).
|
Hi @tdcwilliams, I made some progress.
Let's catch up next week. You now have a few options:
Have a good weekend! |
Ciao @malmans2 - thanks a lot - that looks great! I think I am leaning towards (1) the regridding approach (simplicity of being able to treat all models the same and not needing many different cases). Convenient target grids would be the ones (Arctic or Antarctic) used for https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-sea-ice-concentration?tab=overview (25-km equal-area grids, so each grid cell is just 625 km^2). This would also let us reuse the approach to add comparisons to the satellite observations later on. Ciao, Tim |
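On a 25-km equal-area grid the diagnostics reduce to simple arithmetic, as in this small sketch. The concentration values are made up, and the 0.3 threshold is the one from the original request in this issue.

```python
CELL_AREA_KM2 = 25 * 25  # each 25-km equal-area cell covers 625 km^2
THRESHOLD = 0.3  # concentration threshold from the original request

# Hypothetical concentrations (fractions between 0 and 1) on the target grid
siconc = [0.0, 0.2, 0.35, 0.8, 1.0]

# Extent: count the cells above the threshold and multiply by the cell area
n_ice_cells = sum(1 for c in siconc if c > THRESHOLD)
extent_km2 = n_ice_cells * CELL_AREA_KM2

# Area: weight every cell by its concentration
area_km2 = sum(siconc) * CELL_AREA_KM2
```

Because every cell has the same area, no per-model grid-cell area variable is needed once the data are regridded.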
Hi Tim, |
Thanks Mattia. Probably |
Hi @tdcwilliams, The template notebook is ready: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/cmip6_sea_ice_diagnostics.ipynb I think I had the units wrong in my previous figures, should be OK now. |
Hi there, Note that the |
Thanks @malmans2 - I'll try it out soon. Will try changing the method to |
Hi @tdcwilliams, What's the status of this issue? Are you still working on this and/or are there any standing issues? |
Hi Tim, I'm scraping the CDS forms because most of the experiments have some inconsistency. |
Hi @tdcwilliams, OK. The extension is good, although we might be able to get it done before the deadline.
Are these the experiments that you need to analyse? |
Hi @malmans2 - that's right - those are the experiments with a significant number of models having sea ice concentration |
Hi @malmans2 - the new deadline for the task is 30 November |
OK. BTW, it's now caching the last experiment. |
great! |
Hi @tdcwilliams, The notebook is ready. I resampled the timeseries (yearly means), otherwise the seasonal variability makes the plot too messy. |
Ooops, I forgot about this. Implementing it now. |
Done, same urls:
|
Thanks a lot @malmans2, I'll play around a bit with the plotting. |
PS ERA5 does look weird in Antarctica. It looks a bit like there are missing years in the time series that just got joined up with straight lines. |
I'll run some check! (probably tomorrow) |
I invalidated the cache and recomputed the diagnostics for ERA5, but Antarctica still looks weird. Do you think that could be the cause of the issues? |
Hi @malmans2, I have looked at it more closely myself and there is data in those flat periods, but maybe they have done some artificial constraining of the sea ice (eg with a climatology) when there was no data to assimilate (pre-1979) which could be why there is so little variability at times. I can't find anything about it in the documentation though. Perhaps we just stick to after 1980 in our plots. Here's my latest notebook. I kind of prefer a couple of extra functions I've also added extent and area for satellites. |
Hi @tdcwilliams,
OK
OK, I'll implement the same in the template
Fine by me. But to do so, we need to invalidate the cache and re-run the transform functions. If you are OK with it, I'll go ahead (but please don't run this notebook until everything is cached again) BTW, are you sure that all models/satellites have the proper CF attribute
OK, I'll add satellite data in the template |
Hi @malmans2, |
btw when I said staying later than 1980, I only meant for ERA5 not CMIP6 |
Hi @malmans2, is it OK for me to run this notebook again? |
Hi @tdcwilliams, Not yet, I'm caching everything right now. Hopefully it will be ready in the afternoon. |
Thanks @malmans2. I think the code is very nice now. The satellite code is certainly much simpler with yearly chunking - is that working OK? Also, how does filling missing months with 0 avoid the max/min problem for the observations? |
It works OK, except for EUMETSAT April/May 1986 (no data, which results in 0s). This is why I'm getting rid of those months:
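That's a separate issue, which is addressed in this function:
def full_year_only_resample(ds, reduction):
    # True only for years that contain all 12 monthly time steps
    mask = ds["time"].resample(time="Y").count() == 12
    # Apply the yearly reduction (e.g. "max" or "min"), then drop incomplete years
    return getattr(ds.resample(time="Y"), reduction)().where(mask, drop=True) |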
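To illustrate what the mask in that function does, here is the same logic in plain Python, without xarray. The monthly values are made up, and the gap in April and May 1986 is only an example of an incomplete year.

```python
from collections import defaultdict

# Hypothetical monthly values keyed by (year, month); 1986 lacks April and May
monthly = {(1985, m): float(m) for m in range(1, 13)}
monthly.update({(1986, m): float(m) for m in range(1, 13) if m not in (4, 5)})

# Group the monthly values by year
by_year = defaultdict(list)
for (year, _month), value in sorted(monthly.items()):
    by_year[year].append(value)

# Keep only years with all 12 months, then reduce (here: the yearly maximum)
yearly_max = {
    year: max(values) for year, values in by_year.items() if len(values) == 12
}
```

A yearly maximum or minimum computed from an incomplete year could miss the month where the extreme occurs, so such years are dropped instead of being reduced.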
great, sounds good |
Hi @tdcwilliams, You can now test the latest template.
Let me know! |
Hi @tdcwilliams, I'm planning to work on #102 now. |
Hi @malmans2, it seems good. |
Hi Mattia (@malmans2),
I would like to get daily time series of different quantities calculated from the sea ice concentration using one download if possible.
The statistics I want to get for each day are for each model, and for Arctic and Antarctic:
sea_ice_extent = area_of_grid_cells[region_mask] * sea_ice_mask[region_mask]
with sea_ice_mask = 1 if sea_ice_concentration > 0.3 else 0
and e.g. region_mask_arctic = lat > 40
or region_mask_antarctic = lat < -40
sea_ice_area = area_of_grid_cells[region_mask] * sea_ice_concentration[region_mask]
Are there already or could there be some functions implemented that:
Also could the transform_func argument of download_and_transform also take a list or even a dict of different transform functions (e.g. transform_func=[get_arctic_extent, get_arctic_area, get_antarctic_extent, get_antarctic_area]) and output a list/dict/pandas.DataFrame so we don't have to download every time we want to calculate a different statistic?
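The extent and area definitions in this request can be sketched with NumPy as below, assuming the per-cell products are summed over the region. The grid, the cell areas, and the helper name `extent_and_area` are hypothetical and not part of the toolbox.

```python
import numpy as np

# Hypothetical 2x3 grid: latitudes (degrees), cell areas (km^2), concentrations (0-1)
lat = np.array([[80.0, 75.0, 30.0], [-30.0, -60.0, -70.0]])
cell_area = np.full(lat.shape, 100.0)
siconc = np.array([[0.9, 0.2, 0.5], [0.0, 0.4, 1.0]])


def extent_and_area(siconc, cell_area, region_mask, threshold=0.3):
    """Return (extent, area) for one region, in the units of cell_area."""
    ice_mask = siconc > threshold  # 1 where concentration exceeds the threshold
    extent = float(np.sum(cell_area[region_mask] * ice_mask[region_mask]))
    area = float(np.sum(cell_area[region_mask] * siconc[region_mask]))
    return extent, area


arctic_extent, arctic_area = extent_and_area(siconc, cell_area, lat > 40)
antarctic_extent, antarctic_area = extent_and_area(siconc, cell_area, lat < -40)
```

Returning both statistics from one function means a single cached result can hold extent and area together.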