Let's assume that working from a zarr collection written once from the 366 netcdf files and then re-loaded is much more performant than working from an object loaded via xr.open_mfdataset from the same 366 netcdf files. (We'll test this later once we have the zarr collections to compare.)
Given that the native netcdf storage format is "monthly" files with 28, 29, 30, or 31 Time steps, and that both 29 and 31 are prime numbers, {'Time': 1} is the only uniform time chunking that divides every file evenly, so it is the only time chunking available to us for the first write to zarr.
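The divisibility argument can be checked with a few lines of Python. This is only a sketch, and it assumes that "available" means a single uniform chunk size that lines up with every file boundary; the helper name `uniform_time_chunks` is made up for illustration.

```python
from functools import reduce
from math import gcd

# Possible numbers of Time steps in one monthly netcdf file.
month_lengths = [28, 29, 30, 31]


def uniform_time_chunks(lengths):
    """Return every chunk size that divides all of the given lengths evenly."""
    common = reduce(gcd, lengths)
    return [c for c in range(1, common + 1) if common % c == 0]


print(uniform_time_chunks(month_lengths))
```

Because 29 and 31 are prime and share no factor with the other lengths, the greatest common divisor is 1, which leaves a Time chunk of 1 as the only candidate.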
xarray_open_kwargs = {"chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}} yields 109.86 MiB chunks, 200,484 chunks in total, and 401,334 tasks.
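As a sanity check on the chunk size, the arithmetic can be reproduced by hand. The sketch below assumes the variable is stored as 4-byte floats (float32); the dtype is not stated in this thread, so treat `BYTES_PER_VALUE` as an assumption.

```python
import math

# Chunk shape taken from xarray_open_kwargs above.
chunk_shape = {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}

# Assumption: values are float32, i.e. 4 bytes each.
BYTES_PER_VALUE = 4


def chunk_mib(shape, itemsize):
    """Size of one chunk in MiB for a given chunk shape and bytes per value."""
    n_values = math.prod(shape.values())
    return n_values * itemsize / 2**20


print(f"{chunk_mib(chunk_shape, BYTES_PER_VALUE):.2f} MiB")
```

Under the float32 assumption this matches the 109.86 MiB figure; with float64 the chunks would be twice that size.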
The number of chunks, the number of tasks, and the chunk size for this open with these kwargs are similar to those of the chunks='auto' job.
Next step: write an "analysis ready data" (ARD) zarr collection from the 366 netcdf files using xarray_open_kwargs = {"chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}}
Update: loading via intake from the 366 netcdf files with chunks: {'Time': -1} directly into .groupby('Time.month').mean() is a better approach for all statistics except median & quantile. See: #17 (comment)
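For reference, here is a minimal sketch of the .groupby('Time.month').mean() reduction on a small synthetic in-memory array. It does not use intake, dask chunking, or the real netcdf files; the variable name `temp`, the grid size, and the date range are placeholders chosen only for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily data on a tiny placeholder grid.
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
temp = xr.DataArray(
    np.random.default_rng(0).random((time.size, 2, 3)),
    dims=("Time", "yt_ocean", "xt_ocean"),
    coords={"Time": time},
    name="temp",
)

# Group every Time step by calendar month and average within each group.
monthly_clim = temp.groupby("Time.month").mean()

print(monthly_clim.sizes)
```

The result replaces the Time dimension with a 12-element month dimension. On the real data the same call would run lazily on dask-backed arrays.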
Originally posted by @Thomas-Moore-Creative in #17 (comment)