Segfaults with 3.1.0rc2 #2458
Comments
There have been a few. 095bc42 comes to mind, but there are probably others. |
I'll keep poking around to try and pin down any pattern about the tests where it's crashing. |
I reverted to the version of the build script which gave me a working 3.0.4 build, and only changed the GDAL version to 3.1.0rc2. I've not experienced any error since going back to that earlier script. That suggests to me that it's the version of a specific dependency, or the presence/absence of a configuration flag that causes the issue - and (hopefully) not a general issue with GDAL. I'll keep working through the changes made, though it takes a while to rebuild each time. |
any update ? |
I have still very occasionally seen segfaults, but no luck in tracking them down. They seem less common with the original build dependencies, but still there. I'm also entertaining the possibility that they're in our Python code - as you know, unexpected segfaults are a 'feature' of the Python bindings if explicit references aren't kept to every object in a chain. It could be that GDAL 3.1.0 is exposing some issues/race conditions that already existed, but weren't causing problems. |
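The reference-keeping issue mentioned above can be sketched as follows. This is a minimal illustration, assuming the `osgeo.gdal` bindings are installed; the function names and the `path` argument are hypothetical. The GDAL Python "gotchas" documentation describes how using a band after its parent dataset has been released can crash the interpreter, and the first function is written in that risky style. Newer bindings may behave differently, so treat this as a sketch of the pattern rather than a guaranteed reproduction.

```python
def first_band_checksum_unsafe(path):
    """Chained call: nothing holds the dataset once the band is obtained."""
    from osgeo import gdal  # lazy import; assumes the GDAL bindings are installed
    # The temporary dataset returned by gdal.Open() may be garbage collected
    # before the band is used, leaving the band pointing at freed memory.
    band = gdal.Open(path).GetRasterBand(1)
    return band.Checksum()


def first_band_checksum_safe(path):
    """Keep an explicit reference to every object in the chain."""
    from osgeo import gdal  # lazy import; assumes the GDAL bindings are installed
    dataset = gdal.Open(path)
    if dataset is None:
        raise RuntimeError("could not open %s" % path)
    band = dataset.GetRasterBand(1)
    checksum = band.Checksum()
    # Release the child before the parent so the band never outlives the dataset.
    band = None
    dataset = None
    return checksum
```

If a crash disappears after switching from the first style to the second, that points at object lifetimes in the calling code rather than at GDAL or PROJ themselves.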
I doubt it. From your stack trace, is it possible that your Python code uses multiprocessing with forking? The crash likely occurs in the proj_context_get_database_path() call added in 095bc42 to fix #2221. Could you try reverting it and see if it makes a difference? |
Yes, we do use multiprocessing - I'll try reverting that change and see what happens. If it matters at all, the method we use to create child processes is spawning, rather than forking. We've found that libraries such as rtree/libspatialindex don't play well with forking, due to the shared references that result. |
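For readers unfamiliar with the distinction, this is roughly how a spawn-based pool is selected with the standard library. It is a minimal sketch, not the code used in the project discussed here; the helper names `spawn_context` and `make_spawn_pool` are made up for illustration.

```python
import multiprocessing


def spawn_context():
    """Return a multiprocessing context that starts children with 'spawn'.

    Spawned children start a fresh interpreter instead of inheriting the
    parent's memory through fork(), so native library state is not shared.
    """
    return multiprocessing.get_context("spawn")


def make_spawn_pool(processes=2):
    """Create a worker pool from the spawn context (hypothetical helper)."""
    return spawn_context().Pool(processes=processes)
```

A pool created this way should be used from under an `if __name__ == "__main__":` guard, for example `with make_spawn_pool() as pool: pool.map(abs, [-1, -2, 3])`, because spawned children re-import the main module.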
spawning should be fine. But if the code path I underlined above is triggered, it should be following a fork() not a spawn |
As yet, I've not had much luck trying to revert. Simply doing I have tried recompiling the SWIG Python bindings (using SWIG 4.0.1) as part of the compilation - i.e. I've pushed the code as it is after reversion to: https://github.com/DanielFEvans/gdal/tree/revert_fork_change Any pointers on where I might need to go hunting for the problem? |
Normally just reverting should be fine. This should have no impact on the Python bindings, and you wouldn't need to regenerate them. Are you sure your build works fine (basic use of the Python bindings) on the stock 3.1 release? |
Always good to ask those questions. It turns out I'd not updated a couple of paths to account for building from a git clone, rather than the release source archive. So far, the indication is that reverting that change has stopped the segfaults - running our software test suite repeatedly for an hour worked, while trying the same with the previous build resulted in a segfault after about half an hour. However, since the problems are intermittent, I'm not completely convinced yet, and will keep an eye out for any further issues. |
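Repeating a test suite until an intermittent segfault shows up, as described above, can be automated along these lines. This is a rough sketch assuming a POSIX system, where a child killed by a signal reports a negative return code; the function name `run_until_crash` and its parameters are hypothetical, and the command to run is whatever invokes your own tests.

```python
import signal
import subprocess
import sys
import time


def run_until_crash(command, max_seconds=3600.0, max_runs=None):
    """Run `command` repeatedly until it dies from SIGSEGV or a limit is hit.

    Returns the 1-based number of the first run killed by SIGSEGV, or None
    if the time or run limit was reached without a segfault.
    """
    deadline = time.monotonic() + max_seconds
    run = 0
    while time.monotonic() < deadline and (max_runs is None or run < max_runs):
        run += 1
        result = subprocess.run(command)
        # On POSIX, a return code of -N means the child was killed by signal N.
        if result.returncode == -signal.SIGSEGV:
            return run
    return None
```

For example, `run_until_crash([sys.executable, "-X", "faulthandler", "-m", "pytest"])` would rerun a pytest suite for up to an hour (the pytest invocation is an assumption about the test runner); the `-X faulthandler` interpreter option makes Python print a traceback when the segfault occurs.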
Could you possibly do a CXXFLAGS="-g -fsanitize=address" build of PROJ and GDAL with 095bc42 applied, so there are better diagnostics when it crashes? |
FYI, it seems that this problem no longer exists as of GDAL 3.1.2. Apologies that I never got back to you on -fsanitize=address - I never worked out how to get a fully working build using it. |
I suspect this was fixed per #2746 |
Having compiled GDAL 3.1.0rc2 into a Python wheel, and attempted to run our software tests, I am seeing occasional (and, annoyingly, unpredictable) segmentation faults.
The backtrace output from one such issue is included below, and the structure of the trace always seems to include the one mention of libproj, followed by the list of GDAL calls.
As PROJ is being pointed to by the trace, I've tried building with PROJ v6.2.1 (the version used for my previous GDAL 3.0.4 build) and v6.3.2, but both have shown the issue.
This could indicate a compilation issue on my side, but I'm wondering if there has been a change in the GDAL interface to PROJ that might be causing problems?
Operating system
Scientific Linux 7.6
GDAL version and provenance
3.1.0rc2, compiled locally via scripts at https://github.com/DanielFEvans/gdalmanylinux/tree/gdal_3.1.0