segfault while reading virtual datasets #1799
Comments
I did another test with a freshly compiled master version of netcdf. I get the same segmentation fault and can confirm that there is indeed a null pointer access on chunksizes. I did a very crude fix in d70-t/netcdf-c@2e6a342. This lets ncdump -h run without complaints, but actually accessing the variable with ncdump -v time still creates an error:

    NetCDF: HDF error
    Location: file .../netcdf-c/ncdump/vardata.c; line 478
    time = Segmentation fault (core dumped)

The data must be available in the dataset, though, as h5netcdf is again happy with it. Do you have a suggestion on how to fix this issue in a proper way?
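The referenced commit is not reproduced in this thread; as a rough idea of what such a crude fix amounts to, the hypothetical guard below (a sketch only, using the var->storage and var->chunksizes fields named elsewhere in this issue) simply skips the chunk-cache adjustment when no chunk sizes were recorded:

    /* Hypothetical guard, not the actual commit: if the variable claims to be
     * chunked but no chunk sizes were ever filled in (as happens when the
     * HDF5 layout is virtual), skip the cache adjustment instead of
     * dereferencing the NULL var->chunksizes pointer. */
    if (var->storage == NC_CHUNKED && var->chunksizes == NULL)
        return NC_NOERR;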
The best way forward would be a C test, as simple as possible, which
demonstrates the problem.
I did not come up with a C example yet, but I've got some simple Python code which generates two files. Running ncdump on the first file works fine, but running it on the second file, which contains the virtual dataset, again produces the segmentation fault.
Here's the script:

    import numpy as np
    import h5py

    def _main():
        # create a simple netcdf compatible dataset
        a = h5py.File("a.nc", "w")
        var = a.create_dataset("v", data=np.arange(5, dtype="f4"))
        var.make_scale("v")
        a.close()

        # create a dataset which refers to the former one
        b = h5py.File("b.nc", "w")
        layout = h5py.VirtualLayout(shape=(5,), dtype="f4", maxshape=(5,))
        layout[:] = h5py.VirtualSource("a.nc", "v", shape=(5,))
        var = b.create_virtual_dataset("v", layout)
        var.make_scale("v")
        b.close()

    if __name__ == "__main__":
        _main()

And the two generated files for reference: virtual_datasets.zip
Well, since netcdf-c is C and not Python, Python tests cannot be included in the library, so they do me little good. ;-) The first step in solving this remains to translate your simple test into C so that it can be included in the test directory nc_test4.
I'll have a look at the test directory and see what I can do. But that'll take me a while.
In d70-t/netcdf-c@1a7dd23, I've added a test which triggers the same segmentation fault.
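The linked test is not reproduced here, but a minimal check of roughly the following shape (a sketch, assuming the b.nc file produced by the Python script above with its variable "v") is enough to hit the crash on an unpatched library:

    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    /* Abort loudly on any netCDF error so the test fails visibly. */
    #define CHECK(e) do { int ret = (e); if (ret != NC_NOERR) { \
        fprintf(stderr, "error: %s\n", nc_strerror(ret)); exit(1); } } while (0)

    int
    main(void)
    {
        int ncid, varid;
        float data[5];

        CHECK(nc_open("b.nc", NC_NOWRITE, &ncid));   /* open the file containing the virtual dataset */
        CHECK(nc_inq_varid(ncid, "v", &varid));
        CHECK(nc_get_var_float(ncid, varid, data));  /* read the virtual data */
        CHECK(nc_close(ncid));
        printf("first value: %g\n", data[0]);
        return 0;
    }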
It seems to be part of the design of HDF5 virtual datasets that objects within a file remain open while the file is already "closed". Setting the fclose degree to SEMI would then cause the library to bail out. This commit makes nc_test4/tst_virtual_dataset succeed. See also Unidata#1799
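For context, the setting being discussed is the file close degree on the HDF5 file access property list. The fragment below is only an illustration of the call in question, not the linked commit:

    #include <hdf5.h>

    /* Open a file with a relaxed close degree.  H5F_CLOSE_SEMI makes
     * H5Fclose() fail while any objects in the file are still open; per the
     * commit message above, virtual datasets apparently leave such objects
     * open even after the file has been "closed", so SEMI trips an error.
     * H5F_CLOSE_WEAK (or the default) lets the close proceed. */
    hid_t open_with_weak_close(const char *path)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fclose_degree(fapl, H5F_CLOSE_WEAK);   /* instead of H5F_CLOSE_SEMI */
        hid_t file = H5Fopen(path, H5F_ACC_RDONLY, fapl);
        H5Pclose(fapl);
        return file;
    }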
I am trying to create a netCDF4-compatible dataset which is composed of several different sources. My issue is that the source files may not be changed and are too large to afford making copies of the data. As a consequence, I decided to create new datasets using the HDF5 external and virtual dataset APIs, such that most of my original data can stay in place while providing a new, more user-friendly, higher-level view of the original data. The first step was to convert a bunch of non-netcdf binary files to a single "virtual" netcdf file using HDF5 external storage, which works great. In the second step, which in my case is changing some coordinate variables and replacing broken data with new data, I tried to use the virtual dataset feature.
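As an aside for readers unfamiliar with that first step: HDF5 external storage maps a dataset onto pre-existing raw files without copying them. The fragment below is only an illustration under assumed names (a raw little-endian float32 file data.bin holding five values), not code from this report:

    #include <hdf5.h>

    /* Create an HDF5 file whose dataset "v" points at an existing raw binary
     * file instead of storing a copy of the data. */
    void make_external_file(void)
    {
        hsize_t dims[1] = {5};
        hid_t file  = H5Fcreate("external.nc", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_external(dcpl, "data.bin", 0, 5 * sizeof(float));  /* no copy of the data is made */
        hid_t dset  = H5Dcreate2(file, "v", H5T_IEEE_F32LE, space,
                                 H5P_DEFAULT, dcpl, H5P_DEFAULT);
        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
    }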
Using the virtual dataset feature leads to a segmentation fault when opening the datasets with netcdf4 (i.e. ncdump -h). The segfault is in nc4_adjust_var_cache, and I assume it is related to get_chunking_info, which misses a case for H5D_VIRTUAL, a value that may be returned by the H5Pget_layout function. I have not tested this further yet, but my guess is that var->storage in this case defaults to 0, which is NC_CHUNKED, while var->chunksizes is never allocated and is later read by nc4_adjust_var_cache.

Notably, opening the dataset using h5netcdf works as expected.

I tested this on Ubuntu 18.04.4 with netcdf version 4.6.0, but as the missing case is still present in master, I assume the error shows up there as well.
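To make the suspected gap concrete, the behaviour described above corresponds to a layout switch roughly like the sketch below. This is an illustration of the hypothesis, not the actual netcdf-c source; propid stands for the variable's dataset creation property list:

    /* Illustrative sketch of the suspected gap in get_chunking_info(); the
     * real function is not reproduced here. */
    H5D_layout_t layout = H5Pget_layout(propid);
    switch (layout)
    {
    case H5D_CHUNKED:
        var->storage = NC_CHUNKED;
        /* ...query the chunk dimensions and allocate var->chunksizes... */
        break;
    case H5D_CONTIGUOUS:
        var->storage = NC_CONTIGUOUS;
        break;
    /* No case for H5D_VIRTUAL: var->storage keeps its zero value, which is
     * NC_CHUNKED, yet var->chunksizes is never allocated, so the later call
     * to nc4_adjust_var_cache() dereferences a NULL pointer. */
    default:
        break;
    }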