enable compact storage for netcdf-4 vars #1570

Merged: 14 commits, Dec 19, 2019
30 changes: 25 additions & 5 deletions libdispatch/dvar.c
@@ -452,9 +452,25 @@ nc_def_var_fletcher32(int ncid, int varid, int fletcher32)
/**
Define chunking parameters for a variable

The function nc_def_var_chunking sets the chunking parameters for a
variable in a netCDF-4 file. It can set the chunk sizes to get chunked
storage, or it can set the contiguous flag to get contiguous storage.
The function nc_def_var_chunking sets the storage and, optionally,
the chunking parameters for a variable in a netCDF-4 file.

The storage may be set to NC_CONTIGUOUS, NC_COMPACT, or NC_CHUNKED.

Contiguous storage means the variable is stored as one block of
data in the file.

Compact storage means the variable is stored in the header record
of the file. This can have large performance benefits on HPC systems
running many processors. Compact storage is only available for
variables whose data are 64 KB or less. Attempting to turn on
compact storage for a variable that is too large will result in the
::NC_EVARSIZE error.

Chunked storage means the data are stored as chunks, of
user-configurable size. Chunked storage is required for variables
with one or more unlimited dimensions, or variables which use
compression.
Contributor:

May want to also document that each MPI rank must output the same data to the variable if compact storage is used.

Contributor (author):

Can you elaborate further?

Contributor:

Never mind. I was confused. The datasets that I can declare as compact in Exodus are all "metadata" types that are the same on all ranks, but that isn't a requirement on the HDF5 side; confused myself (and others).


The total size of a chunk must be less than 4 GiB. That is, the
product of all chunksizes and the size of the data (or the size of
@@ -467,8 +483,8 @@ nc_def_var_fletcher32(int ncid, int varid, int fletcher32)
Note that this does not work for scalar variables. Only non-scalar
variables can have chunking.
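
The 4 GiB chunk limit described above can be checked before calling nc_def_var_chunking. The following is a minimal sketch, not code from this PR: chunk_under_4gib and FOUR_GIB are hypothetical names, and the helper assumes the limit is a strict "less than" on the product of all chunk sizes and the element size, as the documentation states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4 GiB in bytes; per the documentation a chunk must be strictly smaller. */
#define FOUR_GIB (UINT64_C(4294967296))

/* Hypothetical helper (not part of netCDF): returns 1 if a chunk with the
 * given per-dimension sizes and element size is under 4 GiB, else 0. */
int
chunk_under_4gib(const size_t *chunksizes, int ndims, size_t type_size)
{
    uint64_t bytes = type_size;
    int d;

    assert(chunksizes != NULL && ndims > 0 && type_size > 0);

    if (bytes >= FOUR_GIB)
        return 0;

    for (d = 0; d < ndims; d++)
    {
        if (chunksizes[d] == 0)
            return 0; /* a zero-length chunk is not usable */

        /* Reject before multiplying if the product would reach 4 GiB;
         * this also keeps the running product from overflowing. */
        if (bytes > (FOUR_GIB - 1) / chunksizes[d])
            return 0;
        bytes *= chunksizes[d];
    }
    return 1;
}
```

For 4-byte elements, a 32768 x 32768 chunk is exactly 4 GiB and is rejected by this sketch, while 32768 x 32767 is accepted.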

@param ncid NetCDF ID, from a previous call to nc_open or
nc_create.
@param ncid NetCDF ID, from a previous call to nc_open() or
nc_create().

@param varid Variable ID.

@@ -501,6 +517,10 @@ nc_def_var_fletcher32(int ncid, int varid, int fletcher32)
@return ::NC_EBADCHUNK Returns if the chunk size specified for a
variable is larger than the length of the dimensions associated with
the variable.
@return ::NC_EVARSIZE Compact storage attempted for variable bigger
than 64 KB.
@return ::NC_EINVAL Attempt to set contiguous or compact storage
for var with one or more unlimited dimensions.
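
The two new error returns can be summarized as a small decision function. This is a sketch under stated assumptions, not the library's implementation: the sk_ enums are hypothetical stand-ins for the storage constants and error codes in netcdf.h, and the order of the checks (unlimited dimensions first, then size) is assumed rather than taken from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the netCDF storage constants and error
 * codes named in the documentation; the real ones live in netcdf.h. */
enum sk_storage { SK_CHUNKED, SK_CONTIGUOUS, SK_COMPACT };
enum sk_status { SK_NOERR, SK_EINVAL, SK_EVARSIZE };

/* Compact storage limit from the documentation: 64 KB. */
#define SK_COMPACT_LIMIT ((size_t)65536)

/* Hypothetical helper: mirrors the documented rules for a requested
 * storage mode, given the variable's total size in bytes and whether
 * it has at least one unlimited dimension. */
enum sk_status
sk_check_storage(enum sk_storage storage, size_t var_bytes,
                 int has_unlimited_dim)
{
    assert(storage == SK_CHUNKED || storage == SK_CONTIGUOUS ||
           storage == SK_COMPACT);

    /* Chunked storage is always allowed by these two rules. */
    if (storage == SK_CHUNKED)
        return SK_NOERR;

    /* Contiguous and compact storage cannot grow, so an unlimited
     * dimension is rejected (the NC_EINVAL case). */
    if (has_unlimited_dim)
        return SK_EINVAL;

    /* Compact storage is capped at 64 KB (the NC_EVARSIZE case). */
    if (storage == SK_COMPACT && var_bytes > SK_COMPACT_LIMIT)
        return SK_EVARSIZE;

    return SK_NOERR;
}
```

A variable of exactly 65536 bytes passes the compact check in this sketch, since the comparison in the PR is "greater than", not "greater than or equal".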

@section nc_def_var_chunking_example Example

8 changes: 4 additions & 4 deletions libhdf5/hdf5var.c
@@ -22,8 +22,8 @@
* order. */
#define NC_TEMP_NAME "_netcdf4_temporary_variable_name_for_rename"

/** Number of bytes in 64 MB. */
#define SIXTY_FOUR_MB (67108864)
/** Number of bytes in 64 KB. */
#define SIXTY_FOUR_KB (65536)

Contributor:

The limit for a compact data set is 64 KiB, not 64 MiB.

Contributor (author):

Thanks, I will fix.

#ifdef LOGGING
/**
@@ -763,8 +763,8 @@ nc_def_var_extra(int ncid, int varid, int *shuffle, int *deflate,
ndata *= var->dim[d]->len;

/* Ensure var is small enough to fit in compact storage. */
if (ndata * var->type_info->size > SIXTY_FOUR_MB)
return NC_EINVAL;
if (ndata * var->type_info->size > SIXTY_FOUR_KB)
return NC_EVARSIZE;

var->contiguous = NC_FALSE;
var->compact = NC_TRUE;
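
The size check in this hunk multiplies the dimension lengths by the element size and compares the result to 64 KB. A standalone sketch of the same arithmetic follows; fits_in_compact and SKETCH_64_KB are hypothetical names that do not appear in the PR, and the early-exit comparison is an addition that avoids overflowing the running product.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as SIXTY_FOUR_KB in the PR. */
#define SKETCH_64_KB ((size_t)65536)

/* Hypothetical helper: returns 1 if a variable with the given dimension
 * lengths and element size occupies at most 64 KB, else 0. */
int
fits_in_compact(const size_t *dim_lens, int ndims, size_t type_size)
{
    size_t bytes = type_size;
    int d;

    assert(ndims >= 0 && type_size > 0);
    assert(ndims == 0 || dim_lens != NULL);

    if (bytes > SKETCH_64_KB)
        return 0;

    for (d = 0; d < ndims; d++)
    {
        if (dim_lens[d] == 0)
            return 1; /* an empty variable holds no data */

        /* Equivalent to bytes * dim_lens[d] > 64 KB, without overflow. */
        if (bytes > SKETCH_64_KB / dim_lens[d])
            return 0;
        bytes *= dim_lens[d];
    }
    return 1;
}
```

With the dimensions used in tst_vars4.c, a 2 x 8193 variable of 4-byte NC_INT values needs 65544 bytes, 8 more than the limit, so the helper returns 0; the one-dimensional variable of length 2 needs 8 bytes and fits.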
44 changes: 30 additions & 14 deletions nc_test4/tst_vars4.c
@@ -10,13 +10,15 @@
#include "err_macros.h"

#define FILE_NAME "tst_vars4.nc"
#define NDIMS2 2
#define NDIM2 2
#define NUM_VARS 1
#define Y_NAME "y"
#define X_NAME "x"
#define Z_NAME "z"
#define VAR_NAME Y_NAME
#define XDIM_LEN 2
#define YDIM_LEN 5
#define ZDIM_LEN 8193
#define CLAIR "Clair"
#define JAMIE "Jamie"

@@ -26,7 +28,7 @@ main(int argc, char **argv)
printf("\n*** Testing netcdf-4 variable functions, even more.\n");
printf("**** testing Jeff's dimension problem...");
{
int varid, ncid, dims[NDIMS2], dims_in[NDIMS2];
int varid, ncid, dims[NDIM2], dims_in[NDIM2];
int ndims, nvars, ngatts, unlimdimid, natts;
char name_in[NC_MAX_NAME + 1];
nc_type type_in;
@@ -37,9 +39,9 @@ main(int argc, char **argv)
if (nc_def_dim(ncid, Y_NAME, YDIM_LEN, &dims[1])) ERR;
if (nc_def_var(ncid, VAR_NAME, NC_FLOAT, 2, dims, &varid)) ERR;
if (nc_inq(ncid, &ndims, &nvars, &ngatts, &unlimdimid)) ERR;
if (nvars != NUM_VARS || ndims != NDIMS2 || ngatts != 0 || unlimdimid != -1) ERR;
if (nvars != NUM_VARS || ndims != NDIM2 || ngatts != 0 || unlimdimid != -1) ERR;
if (nc_inq_var(ncid, 0, name_in, &type_in, &ndims, dims_in, &natts)) ERR;
if (strcmp(name_in, VAR_NAME) || type_in != NC_FLOAT || ndims != NDIMS2 ||
if (strcmp(name_in, VAR_NAME) || type_in != NC_FLOAT || ndims != NDIM2 ||
dims_in[0] != dims[0] || dims_in[1] != dims[1] || natts != 0) ERR;
if (nc_inq_dim(ncid, 0, name_in, &len_in)) ERR;
if (strcmp(name_in, X_NAME) || len_in != XDIM_LEN) ERR;
@@ -51,9 +53,9 @@ main(int argc, char **argv)
/* Open the file and check. */
if (nc_open(FILE_NAME, NC_WRITE, &ncid)) ERR;
if (nc_inq(ncid, &ndims, &nvars, &ngatts, &unlimdimid)) ERR;
if (nvars != NUM_VARS || ndims != NDIMS2 || ngatts != 0 || unlimdimid != -1) ERR;
if (nvars != NUM_VARS || ndims != NDIM2 || ngatts != 0 || unlimdimid != -1) ERR;
if (nc_inq_var(ncid, 0, name_in, &type_in, &ndims, dims_in, &natts)) ERR;
if (strcmp(name_in, VAR_NAME) || type_in != NC_FLOAT || ndims != NDIMS2 ||
if (strcmp(name_in, VAR_NAME) || type_in != NC_FLOAT || ndims != NDIM2 ||
dims_in[0] != dims[0] || dims_in[1] != dims[1] || natts != 0) ERR;
if (nc_inq_dim(ncid, 0, name_in, &len_in)) ERR;
if (strcmp(name_in, X_NAME) || len_in != XDIM_LEN) ERR;
@@ -65,9 +67,9 @@ main(int argc, char **argv)
SUMMARIZE_ERR;
printf("**** testing chunking turned on by fletcher...");
{
int varid, ncid, dims[NDIMS2];
int varid, ncid, dims[NDIM2];
int storage_in;
size_t chunksizes_in[NDIMS2];
size_t chunksizes_in[NDIM2];

if (nc_create(FILE_NAME, NC_NETCDF4 | NC_CLOBBER, &ncid)) ERR;
if (nc_def_dim(ncid, X_NAME, XDIM_LEN, &dims[0])) ERR;
@@ -87,9 +89,9 @@ main(int argc, char **argv)
SUMMARIZE_ERR;
printf("**** testing chunking turned on by shuffle...");
{
int varid, ncid, dims[NDIMS2];
int varid, ncid, dims[NDIM2];
int storage_in;
size_t chunksizes_in[NDIMS2];
size_t chunksizes_in[NDIM2];

if (nc_create(FILE_NAME, NC_NETCDF4 | NC_CLOBBER, &ncid)) ERR;
if (nc_def_dim(ncid, X_NAME, XDIM_LEN, &dims[0])) ERR;
@@ -227,7 +229,7 @@ main(int argc, char **argv)
SUMMARIZE_ERR;
printf("**** testing compact storage...");
{
int ncid, dimid, varid;
int ncid, dimid[NDIM2], varid, varid2;
int data[XDIM_LEN];
int x;

@@ -237,10 +239,22 @@ main(int argc, char **argv)

/* Create a file with one var with compact storage. */
if (nc_create(FILE_NAME, NC_NETCDF4|NC_CLOBBER, &ncid)) ERR;
if (nc_def_dim(ncid, X_NAME, XDIM_LEN, &dimid)) ERR;
if (nc_def_var(ncid, Y_NAME, NC_INT, 1, &dimid, &varid)) ERR;

/* Define dims. */
if (nc_def_dim(ncid, X_NAME, XDIM_LEN, &dimid[0])) ERR;
if (nc_def_dim(ncid, Z_NAME, ZDIM_LEN, &dimid[1])) ERR;

/* Define vars. */
if (nc_def_var(ncid, Y_NAME, NC_INT, 1, dimid, &varid)) ERR;
if (nc_def_var_chunking(ncid, varid, NC_COMPACT, NULL)) ERR;
if (nc_def_var(ncid, CLAIR, NC_INT, NDIM2, dimid, &varid2)) ERR;
/* This won't work, the var is too big for compact! */
if (nc_def_var_chunking(ncid, varid2, NC_COMPACT, NULL) != NC_EVARSIZE) ERR;

/* Write data. */
if (nc_put_var_int(ncid, varid, data)) ERR;

/* Close file. */
if (nc_close(ncid)) ERR;

/* Open the file and check it. */
@@ -250,9 +264,11 @@ main(int argc, char **argv)

if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_inq(ncid, &ndims, &nvars, NULL, NULL)) ERR;
if (ndims != 1 || nvars != 1) ERR;
if (ndims != 2 || nvars != 2) ERR;
if (nc_inq_var_chunking(ncid, varid, &storage_in, NULL)) ERR;
if (storage_in != NC_COMPACT) ERR;
if (nc_inq_var_chunking(ncid, varid2, &storage_in, NULL)) ERR;
if (storage_in != NC_CONTIGUOUS) ERR;
if (nc_close(ncid)) ERR;
}
}