Update cmor.c #1

Closed

@cofinoa (Owner) commented Apr 9, 2024

Use netCDF4 DEFAULT_CHUNK_SIZES for chunked vars and coordinates/axes.

This relates to issue PCMDI#601, which explains that a chunk size of 1 for coordinates/axes such as time severely degrades read performance for those netCDF variables.

The netcdf-c library applies default chunk sizes to netCDF4/HDF5 files when the chunksizes parameter is NULL.

For the current netcdf-c release (i.e. version 4.9.2):

  • nc_def_var_chunking:

    [...] Chunk sizes may be specified with the chunksizes parameter or default sizes will be used if that parameter is NULL. [...]

  • See Default Chunking Scheme from netCDF User Guide (NUG):
    • [...] variables that only have a single unlimited dimension [...] the [default] chunk sizes for such variables are limited to 4KiB

    • [...] Currently the netCDF default chunk size is 4MiB, which is reasonable for filesystems on high-performance computing platforms [...]

    • [...] The current default chunking strategy of the netCDF library is to balance access time along any of a variable's dimensions, by using chunk shapes similar to the shape of the entire variable but small enough that the resulting chunk size is less than or equal to the default chunk size. This differs from an earlier default chunking strategy that always used one for the length of a chunk along any unlimited dimension, and otherwise divided up the number of chunks along fixed dimensions to keep chunk sizes less than or equal to the default chunk size. [...]

  • To change the default chunk cache size for all variables, call nc_set_chunk_cache() before opening the file; to change it for a single variable, call nc_set_var_chunk_cache().
  • Related HDF5 function: H5Pset_cache
  • This PR proposes default chunking not only for the time coordinate/axis but also for data variables with unlimited dimensions.
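
To make the cost of a chunk length of 1 concrete, here is a minimal sketch of the arithmetic, not code from cmor.c or netcdf-c. It assumes a time axis stored as `double` and takes the 4 KiB limit from the NUG excerpt above; the helper names `chunk_len_for_limit` and `chunks_needed` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many values of elem_size bytes fit in a chunk
 * of at most limit_bytes. Always returns at least 1 so that a chunk can
 * hold one value even when the limit is smaller than the element. */
static size_t chunk_len_for_limit(size_t limit_bytes, size_t elem_size)
{
    assert(elem_size > 0);
    size_t len = limit_bytes / elem_size;
    return len > 0 ? len : 1;
}

/* Hypothetical helper: number of chunks needed to store n_values when
 * each chunk holds chunk_len values (ceiling division). */
static size_t chunks_needed(size_t n_values, size_t chunk_len)
{
    assert(chunk_len > 0);
    return (n_values + chunk_len - 1) / chunk_len;
}
```

With 8-byte doubles, `chunk_len_for_limit(4096, sizeof(double))` gives 512 values per chunk. For an illustrative axis of 100000 time steps, `chunks_needed(100000, 1)` is 100000 chunks, whereas `chunks_needed(100000, 512)` is 196, so reading the whole axis touches far fewer chunks under the default scheme.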

@cofinoa cofinoa closed this Apr 9, 2024