Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for reading HDF5 virtual datasets #1828

Merged
merged 10 commits into from
Jul 29, 2021
11 changes: 7 additions & 4 deletions include/netcdf.h
Original file line number Diff line number Diff line change
Expand Up @@ -293,11 +293,14 @@ NOTE: The NC_MAX_DIMS, NC_MAX_ATTRS, and NC_MAX_VARS limits

/** In HDF5 files you can set storage for each variable to be either
* contiguous or chunked, with nc_def_var_chunking(). This define is
* used there. */
* used there. Unknown storage is used for further extensions of HDF5
* storage models, which should be handled transparently by netcdf */
/**@{*/
#define NC_CHUNKED 0
#define NC_CONTIGUOUS 1
#define NC_COMPACT 2
#define NC_CHUNKED 0
#define NC_CONTIGUOUS 1
#define NC_COMPACT 2
#define NC_UNKNOWN_STORAGE 3
d70-t marked this conversation as resolved.
Show resolved Hide resolved
#define NC_VIRTUAL 4
/**@}*/

/** In HDF5 files you can set check-summing for each variable.
Expand Down
7 changes: 4 additions & 3 deletions libhdf5/hdf5create.c
Original file line number Diff line number Diff line change
Expand Up @@ -121,11 +121,12 @@ nc4_create_file(const char *path, int cmode, size_t initialsz,
}

/* Need this access plist to control how HDF5 handles open objects
* on file close. Setting H5F_CLOSE_SEMI will cause H5Fclose to
* fail if there are any open objects in the file. */
* on file close. (Setting H5F_CLOSE_WEAK will cause H5Fclose not to
* fail if there are any open objects in the file. This may happen when virtual
* datasets are opened). */
if ((fapl_id = H5Pcreate(H5P_FILE_ACCESS)) < 0)
BAIL(NC_EHDFERR);
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_SEMI))
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_WEAK))
BAIL(NC_EHDFERR);

#ifdef USE_PARALLEL4
Expand Down
17 changes: 14 additions & 3 deletions libhdf5/hdf5open.c
Original file line number Diff line number Diff line change
Expand Up @@ -756,12 +756,13 @@ nc4_open_file(const char *path, int mode, void* parameters, int ncid)
#endif /* !USE_PARALLEL4 */

/* Need this access plist to control how HDF5 handles open objects
* on file close. (Setting H5F_CLOSE_SEMI will cause H5Fclose to
* fail if there are any open objects in the file). */
* on file close. (Setting H5F_CLOSE_WEAK will cause H5Fclose not to
* fail if there are any open objects in the file. This may happen when virtual
* datasets are opened). */
if ((fapl_id = H5Pcreate(H5P_FILE_ACCESS)) < 0)
BAIL(NC_EHDFERR);

if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_SEMI) < 0)
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_WEAK) < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a somewhat drastic step. Can we discuss?

Can you tell me more about why a virtual file might have objects left open?

Copy link
Contributor Author

@d70-t d70-t Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the step also a bit too drastic. I don't really know about how the thought process in HDF5 is regarding the implementation of virtual datasets. But if you have a look at the links above, this line opens a new reference to the linked file and almost immediately after that the file is closed again, but the previous reference is not released. This leads to an error if fclose_degree is SEMI. Also as the corresponding function is called H5D__virtual_open_source_dset, I assumed that at the end of the function something should be left in an open state... The above change was a way to get rid of the error without having to dig further into HDF5 and as h5netcdf could handle this, I thought there should be a way to handle it without changing HDF5.

But yes, I'd be happy to find a different solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find a different solution for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I have currently no clue how to do that differently without changing the virtual dataset implementation of HDF5. Maybe we would have to reach out to the HDF5 developers and ask them about their intention on this option. But I can't start that right now, as I'll be off for a week in a couple of hours.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've sent a request to HDFgroup to clarify if the need for H5F_CLOSE_WEAK is intended behaviour or if I am missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to briefly get back on this. It turns out that this behaviour is indeed a bug in HDF5. I just got the confirmation that there's a fix almost ready which should be released soon. This fix however effectively will force source datasets to be opened as WEAK internally.

This raises the question of how to proceed in this PR?

  • Should there be a backwards-compatible solution which runs with current HDF5 by setting file opening WEAK outside of HDF5?
  • Should there be a requirement to use HDF5 >= the upcoming release?

BAIL(NC_EHDFERR);

#ifdef USE_PARALLEL4
Expand Down Expand Up @@ -1140,6 +1141,16 @@ get_chunking_info(hid_t propid, NC_VAR_INFO_T *var)
{
var->storage = NC_COMPACT;
}
#ifdef H5D_VIRTUAL
else if (layout == H5D_VIRTUAL)
{
var->storage = NC_VIRTUAL;
}
#endif
else
{
var->storage = NC_UNKNOWN_STORAGE;
}

return NC_NOERR;
}
Expand Down
6 changes: 5 additions & 1 deletion libsrc4/nc4internal.c
Original file line number Diff line number Diff line change
Expand Up @@ -1709,8 +1709,12 @@ rec_print_metadata(NC_GRP_INFO_T *grp, int tab_count)
strcat(storage_str, "contiguous");
else if (var->storage == NC_COMPACT)
strcat(storage_str, "compact");
else
else if (var->storage == NC_CHUNKED)
strcat(storage_str, "chunked");
else if (var->storage == NC_VIRTUAL)
strcat(storage_str, "virtual");
else
strcat(storage_str, "unknown");
LOG((2, "%s VARIABLE - varid: %d name: %s ndims: %d "
"dimids:%s storage: %s", tabs, var->hdr.id, var->hdr.name,
var->ndims,
Expand Down
2 changes: 1 addition & 1 deletion nc_perf/tst_attsperf.c
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,7 @@ readfile_hdf5(char *file_name, long long *delta, int do_inq, int num_vars)

/* Open and close the root group. */
if ((fapl_id = H5Pcreate(H5P_FILE_ACCESS)) < 0) ERR;
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_SEMI)) ERR;
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_WEAK)) ERR;
if ((hdfid = H5Fopen(file_name, H5F_ACC_RDONLY, fapl_id)) < 0) ERR;
if ((hdf_grpid = H5Gopen2(hdfid, "/", H5P_DEFAULT)) < 0) ERR;

Expand Down
4 changes: 4 additions & 0 deletions nc_test4/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,10 @@ ENDIF(ENABLE_FILTER_TESTING)

ENDIF(BUILD_UTILITIES)

IF(${HDF5_VERSION} VERSION_GREATER "1.10.0")
SET(NC4_TESTS ${NC4_TESTS} tst_virtual_datasets)
ENDIF(${HDF5_VERSION} VERSION_GREATER "1.10.0")

##
# The shell script, run_empty_vlen_test.sh,
# depends on the 'tst_empty_vlen_unlim' binary.
Expand Down
2 changes: 1 addition & 1 deletion nc_test4/tst_h_vl2.c
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ main()

/* Open the file and read the vlen data. */
if ((fapl_id = H5Pcreate(H5P_FILE_ACCESS)) < 0) ERR;
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_SEMI)) ERR;
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_WEAK)) ERR;
if (H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST) < 0) ERR;

if ((fileid = H5Fopen(FILE_NAME, H5F_ACC_RDONLY, fapl_id)) < 0) ERR;
Expand Down
2 changes: 1 addition & 1 deletion nc_test4/tst_interops.c
Original file line number Diff line number Diff line change
Expand Up @@ -390,7 +390,7 @@ main(int argc, char **argv)
/* Open the file with HDF5 while netcdf still has it open. */
if ((fapl_id = H5Pcreate(H5P_FILE_ACCESS)) < 0) ERR;
/* Turn this off for*/
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_SEMI)) ERR;
if (H5Pset_fclose_degree(fapl_id, H5F_CLOSE_WEAK)) ERR;
if ((fileid = H5Fopen(FILE_NAME, H5F_ACC_RDONLY, fapl_id)) < 0) ERR;
if (H5Pclose(fapl_id) < 0) ERR;
if (H5Fclose(fileid) < 0) ERR;
Expand Down
4 changes: 2 additions & 2 deletions nc_test4/tst_interops5.c
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,10 @@ main(int argc, char **argv)
yscaleDims[0] = ncolCur;
if ((xdimSpaceId = H5Screate_simple(1, xscaleDims, NULL)) < 0) ERR;

/* With the SEMI close degree, the HDF5 file close will fail if
/* With the WEAK close degree, the HDF5 file close will not fail if
* anything is left open. */
if ((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0) ERR;
if (H5Pset_fclose_degree(fapl, H5F_CLOSE_SEMI)) ERR;
if (H5Pset_fclose_degree(fapl, H5F_CLOSE_WEAK)) ERR;

/* Create file */
if((fileId = H5Fcreate(FILE_NAME, H5F_ACC_TRUNC,
Expand Down
87 changes: 87 additions & 0 deletions nc_test4/tst_virtual_datasets.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
#include <hdf5.h>
#include <hdf5_hl.h>
#include <nc_tests.h>
#include "err_macros.h"

#define FILE_NAME_A "tst_virtual_a.nc"
#define FILE_NAME_B "tst_virtual_b.nc"
#define VARIABLE_SIZE 5
#define VARIABLE_NAME "v"

static void
create_dataset_a() {
hsize_t dims[1] = {VARIABLE_SIZE};
float data[VARIABLE_SIZE];
for(size_t i = 0; i < VARIABLE_SIZE; ++i) {
data[i] = (float)i;
}

hid_t file = H5Fcreate(FILE_NAME_A, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
hid_t space = H5Screate_simple(1, dims, NULL);
hid_t dset = H5Dcreate2(file, VARIABLE_NAME, H5T_IEEE_F32LE, space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

H5Dwrite(dset, H5T_IEEE_F32LE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
H5DSset_scale(dset, VARIABLE_NAME);

H5Dclose(dset);
H5Sclose(space);
H5Fclose(file);
}

static void
create_dataset_b() {
hsize_t dims[1] = {VARIABLE_SIZE};

hid_t file = H5Fcreate(FILE_NAME_B, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
hid_t src_space = H5Screate_simple(1, dims, NULL);
hid_t space = H5Screate_simple(1, dims, NULL);

hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_virtual(dcpl, space, FILE_NAME_A, VARIABLE_NAME, src_space);
hid_t dset = H5Dcreate2(file, VARIABLE_NAME, H5T_IEEE_F32LE, space, H5P_DEFAULT, dcpl, H5P_DEFAULT);

H5DSset_scale(dset, VARIABLE_NAME);

H5Dclose(dset);
H5Pclose(dcpl);
H5Sclose(space);
H5Sclose(src_space);
H5Fclose(file);
}

int read_back_contents(const char* filename) {
int ncid, varid, ndims;
float data[VARIABLE_SIZE];

char var_name[NC_MAX_NAME+1];
int dimids_var[1], var_type, natts;

printf("\treading back %s.\n", filename);

if (nc_open(filename, 0, &ncid)) ERR;
if (nc_inq_varid(ncid, VARIABLE_NAME, &varid)) ERR;
if (nc_inq_var(ncid, varid, var_name, &var_type, &ndims, dimids_var, &natts)) ERR;
if (nc_get_var_float(ncid, varid, data)) ERR;
printf("\t %s: ", var_name);
for (size_t i = 0; i < VARIABLE_SIZE; ++i) {
printf("%f, ", data[i]);
}
printf("\n");
if (nc_close(ncid)) ERR;
SUMMARIZE_ERR;
return 0;
}

int
main(int argc, char **argv)
{

printf("\n*** Testing reading of virtual datasets.\n");

create_dataset_a();
create_dataset_b();

int status;
if((status = read_back_contents(FILE_NAME_A))) return status;
if((status = read_back_contents(FILE_NAME_B))) return status;
}
8 changes: 7 additions & 1 deletion ncdump/ncdump.c
Original file line number Diff line number Diff line change
Expand Up @@ -988,7 +988,7 @@ pr_att_specials(
} else if(contig == NC_COMPACT) {
pr_att_name(ncid, varp->name, NC_ATT_STORAGE);
printf(" = \"compact\" ;\n");
} else {
} else if(contig == NC_CHUNKED) {
size_t *chunkp;
int i;
pr_att_name(ncid, varp->name, NC_ATT_STORAGE);
Expand All @@ -1002,6 +1002,12 @@ pr_att_specials(
printf("%lu%s", (unsigned long)chunkp[i], i+1 < varp->ndims ? ", " : " ;\n");
}
free(chunkp);
} else if(contig == NC_VIRTUAL) {
pr_att_name(ncid, varp->name, NC_ATT_STORAGE);
printf(" = \"virtual\" ;\n");
} else {
pr_att_name(ncid, varp->name, NC_ATT_STORAGE);
printf(" = \"unknown\" ;\n");
}

/* _Filter (including deflate and shuffle) */
Expand Down