Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault in filepath Dataset method on some platforms #506

Open
jswhit opened this issue Jan 8, 2016 · 9 comments
Open

segfault in filepath Dataset method on some platforms #506

jswhit opened this issue Jan 8, 2016 · 9 comments

Comments

@jswhit
Copy link
Collaborator

jswhit commented Jan 8, 2016

reported by @ocefpaf, see conda-forge/staged-recipes#43

Can be triggered (on some platforms) with

import netCDF4
url = 'http://test.opendap.org:80/opendap/data/ncml/sample_virtual_dataset.ncml'
nc = netCDF4.Dataset(url)
assert nc.filepath() == url  # segfault here with version 1.2.2

Reverting filepath to the version in 1.2.1 with this patch fixes it

--- a/netCDF4/_netCDF4.pyx
+++ b/netCDF4/_netCDF4.pyx
@@ -1854,12 +1854,9 @@ Get the file system path (or the opendap URL) which was used to
 open/create the Dataset. Requires netcdf >= 4.1.2"""
         cdef int ierr
         cdef size_t pathlen
-        cdef char *path
+        cdef char path[NC_MAX_NAME + 1]
         IF HAS_NC_INQ_PATH:
             with nogil:
-                ierr = nc_inq_path(self._grpid, &pathlen, NULL)
-            path = <char *>malloc(sizeof(char) * pathlen)
-            with nogil:
                 ierr = nc_inq_path(self._grpid, &pathlen, path)
             return path.decode('ascii')
         ELSE:

However, I would rather not to do this since it will truncate paths longer than NC_MAX_NAME+1.

@ocefpaf
Copy link
Collaborator

ocefpaf commented Jan 8, 2016

@jswhit bare in mind that the issue surfaces under a very specific condition: packaging with conda using while using an old GLIBC (CentOS6). I cannot reproduce that problem in a modern Linux distro (recent GLIBC).

(The reason why we build conda packages with an old CentOS is to ensure compatibility across all GLIBC versions.)

@jswhit
Copy link
Collaborator Author

jswhit commented Jan 8, 2016

@ocefpaf, would you mind printing out the value of pathlen, right after the first call to nc_inq_path?

@ocefpaf
Copy link
Collaborator

ocefpaf commented Jan 8, 2016

pathlen seems OK 72

('http://test.opendap.org:80/opendap/data/ncml/sample_virtual_dataset.ncml' does have 72 characters.)

@jswhit
Copy link
Collaborator Author

jswhit commented Jan 9, 2016

@ocefpaf: I've updated netCDF4.pxi and _netCDF4.pyx to use malloc and free imported from cython (which I believe allocates memory on the python heap) instead of using the libc version directly (pull request #507). I've also added some exception handling to allow it to fail more gracefully if it can't allocate the memory for the filepath pointer. Hopefully this helps.

@jswhit
Copy link
Collaborator Author

jswhit commented Jan 12, 2016

according to @ocefpaf, the segfault still occurs on CentOS6. I'm out of ideas at this point.

@ocefpaf
Copy link
Collaborator

ocefpaf commented Jan 12, 2016

Sorry I cannot by of more help. I do not know the first thing about C and cython.

Here are the Linux build logs:

https://circleci.com/gh/conda-forge/staged-recipes/151

@pelson any ideas?

@pelson
Copy link

pelson commented Jan 12, 2016

@pelson any ideas?

Might make sense to be able to reproduce quickly with just a docker image and a few commands...

There is nothing worse than not being able to recreate an issue yourself, and it makes sense to be able to use pdb etc to track things down.

@jswhit
Copy link
Collaborator Author

jswhit commented Jan 15, 2016

I'm going to go ahead and merge the pull request, since it should prevent memory leakage, even if it doesn't fix the segfaults. Seems likely that the segfault is coming from either a bug in the netcdf c-lib, or the old glibc.

jswhit added a commit that referenced this issue Jan 15, 2016
@ocefpaf
Copy link
Collaborator

ocefpaf commented Jan 18, 2016

I was putting together a Dockerfile to run the tests using CircleCI and I noticed that:

I have no idea what is going on! Also, at least so far, the problem seems to be relate only to the URL I used for testing.

Thanks @jswhit for looking into this. I believe it is not worth try to fix this unless we find more failing cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants