write "analysis ready data" (ARD) zarr collection from the 366 netcdf files using xarray_open_kwargs = {"chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}} #18

Closed
Thomas-Moore-Creative opened this issue Apr 30, 2024 · 1 comment

Comments

@Thomas-Moore-Creative
Owner

Thomas-Moore-Creative commented Apr 30, 2024

Let's assume that working from a zarr collection, written once from the 366 netcdf files and then re-loaded, is much more performant than working from an object loaded via xr.open_mfdataset from the same 366 netcdf files. (We'll test this later, once we have the zarr collections to compare.)

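The native netcdf storage format is "monthly" files with 28, 29, 30, or 31 Time steps. No integer greater than 1 divides all of these lengths (29 and 31 are both prime), so {'Time': 1} is the only uniform time chunking that lines up with every file boundary, and therefore the only one available to us for the first write to zarr.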
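A quick check of the arithmetic behind that constraint, as a minimal sketch: the largest uniform chunk length that never straddles a file boundary is the greatest common divisor of the per-file lengths. The variable names are illustrative only.

```python
from functools import reduce
from math import gcd

# Number of daily Time steps a monthly file can hold.
month_lengths = [28, 29, 30, 31]

# The largest chunk length that divides every file length evenly.
largest_aligned_chunk = reduce(gcd, month_lengths)

print(largest_aligned_chunk)
```

Because the result is 1, any larger uniform Time chunk would span two files somewhere in the record.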

xarray_open_kwargs = {"chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}} yields 109.86 MiB chunks, 200,484 chunks in total, and 401,334 tasks.
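The quoted chunk size can be reproduced by hand. This sketch assumes the variable is stored as 4-byte floats (float32); that dtype is not stated here, but it is the value consistent with 109.86 MiB.

```python
from math import prod

# Chunk shape taken from xarray_open_kwargs above.
chunk_shape = {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}

# Assumption: float32 data, i.e. 4 bytes per value.
bytes_per_value = 4

values_per_chunk = prod(chunk_shape.values())
chunk_mib = values_per_chunk * bytes_per_value / 2**20

print(f"{values_per_chunk:,} values per chunk, {chunk_mib:.2f} MiB")
```

With float64 data the same chunk shape would be twice as large, so the dtype is worth confirming before settling on a chunking.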

The number of chunks, the number of tasks, and the chunk size for an open with these kwargs are similar to those of the chunks='auto' job.

Next step: write "analysis ready data" (ARD) zarr collection from the 366 netcdf files using xarray_open_kwargs = {"chunks": {"Time": 1, "st_ocean": 20, "xt_ocean": 1200, "yt_ocean": 1200}}

Originally posted by @Thomas-Moore-Creative in #17 (comment)

@Thomas-Moore-Creative
Owner Author

Loading via intake from the 366 netcdf files with chunks: {'Time': -1} directly into .groupby('Time.month').mean() is a better approach for all but median and quantile. See: #17 (comment)
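For reference, the groupby pattern mentioned above looks like this on a small synthetic series. This is only a sketch: the data are made up, the variable name `temp` is hypothetical, and everything runs eagerly in memory rather than through intake and dask chunks.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily values on a "Time" coordinate.
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
temp = xr.DataArray(
    np.arange(time.size, dtype="float32"),
    coords={"Time": time},
    dims="Time",
    name="temp",  # hypothetical variable name
)

# Group every time step by calendar month and average within each group.
monthly_mean = temp.groupby("Time.month").mean()

print(monthly_mean.sizes)
```

The result has a 12-element `month` dimension in place of `Time`.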
