write "analysis ready data" (ARD) `zarr` collection from the 366 `netcdf` files using `xarray_open_kwargs = {"chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}}` #18

Thomas-Moore-Creative · 2024-04-30T01:31:28Z

Let's assume that working from a zarr collection written once from the 366 netcdf files and then re-loaded is much more performant than working from an object loaded via xr.open_mfdataset from the same 366 netcdf files. (We'll test this later, once we have the zarr collections to compare.)

The native netcdf storage format is "monthly" files with 28, 29, 30, or 31 Time steps. Because 29 and 31 are prime, no chunk size larger than 1 divides every file's Time length, so {'Time': 1} is the only uniform time chunking that aligns with the file boundaries and is available to us for the first write to zarr.
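A quick check of that reasoning, as a minimal sketch: a uniform Time chunk that never straddles a file boundary has to divide every month length, so the largest candidate is their greatest common divisor. The variable names here are illustrative only.

```python
from functools import reduce
from math import gcd

# Time steps per monthly file, as described above.
month_lengths = [28, 29, 30, 31]

# The largest chunk size that divides every month length evenly.
largest_common_chunk = reduce(gcd, month_lengths)

print(largest_common_chunk)  # 1
```

Consecutive lengths such as 28 and 29 already share no divisor other than 1, so the result does not depend on the primality of 31.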

xarray_open_kwargs = {"chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}} yields 109.86 MiB chunks, 200,484 chunks in total, and 401,334 tasks.
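The chunk size can be reproduced from the chunk shape alone. This is a small sketch that assumes the variable is stored as 4-byte values (e.g. float32); that dtype is an assumption, not something stated above, although it does match the 109.86 MiB figure.

```python
import math

# Chunk shape taken from xarray_open_kwargs above.
chunk_shape = {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}

# Assumption: 4 bytes per value (float32).
BYTES_PER_VALUE = 4

elements_per_chunk = math.prod(chunk_shape.values())
chunk_mib = elements_per_chunk * BYTES_PER_VALUE / 2**20

print(f"{chunk_mib:.2f} MiB")  # 109.86 MiB
```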

The number of chunks, number of tasks, and chunk size for this open with these kwargs are similar to those of the chunks='auto' job.

Next step: write "analysis ready data" (ARD) zarr collection from the 366 netcdf files using xarray_open_kwargs = {"chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}}
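A rough sketch of what that write step might look like, not the code actually used in this repo. The function name `write_ard_zarr` and both of its arguments are hypothetical placeholders; only `xr.open_mfdataset` and `Dataset.to_zarr` are real xarray calls, and the chunking is the one quoted above.

```python
# Chunking quoted in this issue.
XARRAY_OPEN_KWARGS = {
    "chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}
}


def write_ard_zarr(netcdf_paths, zarr_store):
    """Open the monthly netcdf files lazily and write them to one zarr store.

    `netcdf_paths` (list of file paths or a glob string) and `zarr_store`
    (output path) are hypothetical placeholders for this sketch.
    """
    # Imported here so the sketch can be read and loaded without xarray installed.
    import xarray as xr

    # Lazily combine the files into one dask-backed dataset with the chosen chunks.
    ds = xr.open_mfdataset(netcdf_paths, **XARRAY_OPEN_KWARGS)

    # Write once; mode="w" overwrites any existing store at that path.
    ds.to_zarr(zarr_store, mode="w", consolidated=True)
    return ds
```

Because every dask chunk has a Time length of 1, the chunks are uniform along Time, which is what the zarr write needs.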

Originally posted by @Thomas-Moore-Creative in #17 (comment)

Thomas-Moore-Creative · 2024-04-30T09:21:31Z

Loading via intake from the 366 netcdf files with chunks {'Time': -1} directly into .groupby('Time.month').mean() is the better approach for all but median & quantile. See: #17 (comment)

Thomas-Moore-Creative added a commit that referenced this issue Apr 30, 2024

check the chunking wrt netcdf > xr > zarr #18

435f324

Thomas-Moore-Creative closed this as completed Apr 30, 2024
