NetCDF should reject hdf5 files with cycles of symbolic links #320
Comments
Currently, netcdf-c explicitly will not accept an HDF5 file |
Thanks for your reply. That makes total sense. In our case, we use heuristics such as file name or opening the file via a backend to check it looks like a NetCDF file. We just wanted to make sure that you were aware that when faced with such a file the API does not fail gracefully. |
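As an illustration of the kind of heuristic mentioned above, here is a minimal sketch that looks only at the leading bytes of a file. It is not what netcdf-c does internally; the function name `sniff_container` is hypothetical. It relies on the documented classic netCDF magic numbers (`CDF` followed by version byte 1, 2, or 5) and the 8-byte HDF5 signature. As a simplification it checks the HDF5 signature only at offset 0, although HDF5 also allows it at later offsets when a user block is present. Note that a signature check cannot tell a netCDF-4 file apart from any other HDF5 file; that still requires inspecting the contents.

```python
# Signature constants from the netCDF classic and HDF5 file format specs.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
CLASSIC_MAGICS = (b"CDF\x01", b"CDF\x02", b"CDF\x05")


def sniff_container(path):
    """Return 'netcdf-classic', 'hdf5', or 'unknown' from the leading bytes.

    Hypothetical helper: only offset 0 is examined, so HDF5 files with a
    user block are reported as 'unknown'.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head[:4] in CLASSIC_MAGICS:
        return "netcdf-classic"
    if head == HDF5_SIGNATURE:
        return "hdf5"
    return "unknown"
```

A result of `'hdf5'` only says the container is HDF5, so a caller would still need a further check before handing the file to the NetCDF API.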
It may be worth noting that the crash is in the form of a segmentation fault, so this is a possible security/DoS issue. |
Can you send me your test file that causes a crash? |
Is the test script in the issue not sufficient? It’s been a couple years now and we may not be able to track down the original problematic file. |
We should probably figure out where this seg fault is occurring. |
I will come up with a test file, probably tomorrow, and start to explore this issue. I will see if I can reproduce the seg fault. My first thought for a solution is that I can look at the objinfo and get a unique set of integers that identify a group. If I do that for each group, I should be able to tell that I am in the same group again. In that way, I can detect loops. We can just error at that point or, if we can figure out what to do that is useful, we can do that. First I will see if I can detect that it is happening, and I would like to ensure that checking it does not impact performance, since I've been working hard to increase performance and reduce what the library does when opening a file. Most files do not have loops, and I don't want to kill performance for everyone checking for loops. |
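The loop-detection idea described above can be sketched independently of HDF5. In the sketch below the group hierarchy is modeled as a plain dictionary from a group identifier to the identifiers of its children; the string identifiers are placeholders for whatever unique values the object info would provide, and `has_cycle` is a hypothetical name, not a netcdf-c or HDF5 function. It tracks which groups are on the current traversal path, so a group that is merely reachable by two different routes is not reported as a loop. Each group and each link is visited at most once.

```python
def has_cycle(children, root):
    """Return True if a cycle of links is reachable from root.

    children: dict mapping a group identifier to a list of child identifiers
    (a stand-in for the real group hierarchy; identifiers are assumed unique).
    """
    ON_PATH, DONE = 1, 2
    state = {root: ON_PATH}
    # Each stack entry is (group, iterator over its not-yet-visited children).
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        node, pending = stack[-1]
        for child in pending:
            seen = state.get(child)
            if seen == ON_PATH:
                # The child is an ancestor of the current group: a loop.
                return True
            if seen is None:
                state[child] = ON_PATH
                stack.append((child, iter(children.get(child, ()))))
                break
        else:
            # All children handled; this group is no longer on the path.
            state[node] = DONE
            stack.pop()
    return False


# Usage: a three-group loop is detected, a shared subgroup is not.
looped = {"/": ["a"], "a": ["b"], "b": ["/"]}
shared = {"/": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(has_cycle(looped, "/"), has_cycle(shared, "/"))
```

Using a simple "seen before" set instead would be cheaper to write but would also flag the shared-subgroup case, which is why the sketch distinguishes groups on the current path from groups already finished.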
I came up with a test file, and current master errors out when I try to open it, just as it should. So what is the bug? There does not seem to be a long delay for me - HDF5 errors out right away. Here's the file I'm creating in HDF5:
I created this file with the following code (in tst_files6.c):
When I open this file in netcdf, nc_open returns an error (NC_EHDFERR), the HDF5 error stack (lengthy) looks like this:
So is there an issue here? Was this with an older version of HDF5? Perhaps HDF5 improvements have rendered this issue moot? |
On python3.7, using h5py 2.8.0 (not sure what the underlying hdf5 version is), numpy 1.15.0, and netCDF4 1.4.0, running the script @Yurlungur attached in the initial report consumes ~25 GB of RAM on my macOS 10.13 system. It doesn't crash, but it does cause my system to start hitting the page file pretty hard and uses all available RAM. |
Ah, here we are:
|
I have tested with both 1.10.1 and 1.10.2, and in both cases it rejects my circularly linked file with NC_EHDFERR. There is no detectable difference in time to execute with or without the statement that attempts to open the file. I have a test for this which will be added to the code base when it gets merged, so we can detect if this fails in the future, or in other circumstances (e.g. one of the many CI test variants). |
Since detecting cycles is O(V+E), you are unlikely to see any major performance |
OK, good point, but the sample given to recreate the problem uses only a couple of groups. @Yurlungur did the problems only occur on files with lots of groups? Do you have any sample files on which a problem occurred? |
@edhartnett I can try to track down the original problem file for you, but it's been many years since I encountered this problem. The original problem indeed had a large number of groups. However, as @ngoldbaum points out, the test script I provided, which generates a file, caused problems for me. And that file produces only four groups. |
Thank you for looking into this, by the way! |
Hello,
I am contributing to a project which reads files in a large number of formats, including NetCDF and a format built on hdf5 which contains cycles of symbolic links. While working on integrating all these formats, we noticed that we can feed this hdf5 format into the NetCDF API and cause a crash after a long delay.
I would like to suggest adding some heuristics for the hdf5 backend that check whether or not the hdf5 file is in fact a NetCDF file.
(I note that I have only tried this with the Python API.)
Thanks for your time.
Environment Information
Steps to reproduce the behavior
The following Python script creates an hdf5 file and then reads it into NetCDF, reproducing the problem.