RFC: catch writes to protected memory in mmapped arrays (fix #3434) #6877
It's more or less finished ( The base test coverage for mmap isn't catching a lot of problems that are occurring on Windows in various packages - HDF5, Requests, possibly others. I've been too busy to hunt down the problems and come up with reduced test cases. |
Updated to incorporate Keno's suggestion. The net effect is no change to |
Closed by caa4bfa |
Hmm, this may be causing Travis failures---the failures people are seeing (#6978) are in the same process that runs the So this may not be the commit that is triggering the failures, but I am suspicious. Gotta run to dinner, but will check back on this later this evening. |
More debugging. I took those same commits and re-committed them (without squashing) on top of a recent master. Even though the identical commits pass above, now I'm seeing test failures. This suggests either that something changed in base julia, or that failure is intermittent. EDIT: I should have pointed out that different Travis runs seem to fail for different reasons; in the linked example, one failed on the Intermittent might suggest a memory problem, so I ran the test through valgrind (wow, is that slow). The only oddity compared to commenting out the
line is
That error message is (sort of) explained here. I then wondered if this might also happen for our stack overflow detector. Sure enough, with bde82ca I got the same type of valgrind warning. That commit passed its Travis test, but on my own machine I had variable outcomes: several times it ran properly (for example, the time I ran it through valgrind), but once it crashed and rebooted my entire machine, and it looked like it was headed that way a second time before I killed the process. So we may still have some issues with our SEGV catching. I'm afraid that, due to other commitments, I need to stop working on this at least until the 2nd week of June, but I couldn't let this rest entirely without additional investigation. |
OK, so a temporary account on Travis has let me test this more easily. Managed to trap a segfault:
Due to the stack corruption even that frame 6 looks rather useless (that's a lot of arguments!). Note that the
line, which is what (intermittently) triggers this segfault. One option I see is to commit this patch without including the test, so that at least we trap the error. That seems like an improvement over where we are now. We could expand the error message to say "Your session is probably corrupt, a restart is recommended." As a point of reference, we've never had a test for the ability to catch stack overflows, either. When I add one, I intermittently get a similar error to what I'm experiencing here. So this plan would simply put the two on par. |
Bump. #7708 reminded me that stack overflows are probably not in any better shape. With this, at least we should usually get an error; if that error tells the user why, and that s/he probably needs to restart Julia due to stack corruption, at least that seems to be an improvement on a segfault. If we want this for 0.3, it should be merged soon. Or just wait for 0.4. Let me know, and I'll remove the test, squash, and merge. |
Yes, let's remove the test and merge. |
Done. |
Hmmm......why can't I see any commits/changes with this merge? Even after restoring the branch? |
NM, found it |
Github (and/or Jacob) did something funky here. d4f8c31 does not look like the right commit, can someone find the right commit for this? |
Or 3b36d85 These look like the only relevant commits from
|
Ah thanks, yes, |
Starting from the example in #6703,
If desired I'd be happy to change the error message to something more descriptive (e.g.,
"Memory-access error. Did you try to set the value of a read-only mmapped array?"
). I seem to remember some discussion about problems that can arise from more verbose error messages, which is why I started with something minimal.
Summary of design: store begin/end pointers of blocks of memory that are write-protected; if you get a SIGSEGV, check the address to see if it falls within one of these protected pages. If it doesn't, use the default SIGSEGV handling.
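For readers who want the design summary in concrete terms, here is a minimal sketch of the bookkeeping it describes. This is not the code from the patch: every name (`protected_range_t`, `protect_range_add`, `protect_range_remove`, `is_protected_address`) and the fixed-size table are assumptions made for illustration. On POSIX systems, a handler installed with `sigaction` and `SA_SIGINFO` would get the faulting address from `si_addr` and could pass it to a lookup like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the identifiers used in the patch. */
#define MAX_PROTECTED 64

typedef struct {
    uintptr_t begin; /* first protected byte */
    uintptr_t end;   /* one past the last protected byte */
} protected_range_t;

static protected_range_t protected_ranges[MAX_PROTECTED];
static size_t n_protected = 0;

/* Record a write-protected block.
   Returns 0 on success, -1 if the table is full. */
static int protect_range_add(const void *begin, size_t len)
{
    assert(len > 0); /* an empty block is a caller bug */
    if (n_protected == MAX_PROTECTED)
        return -1;
    protected_ranges[n_protected].begin = (uintptr_t)begin;
    protected_ranges[n_protected].end = (uintptr_t)begin + len;
    n_protected++;
    return 0;
}

/* Forget a block, e.g. when the mapping goes away.
   The last entry is moved into the freed slot. */
static void protect_range_remove(const void *begin)
{
    for (size_t i = 0; i < n_protected; i++) {
        if (protected_ranges[i].begin == (uintptr_t)begin) {
            protected_ranges[i] = protected_ranges[--n_protected];
            return;
        }
    }
}

/* Returns 1 if addr lies inside a registered block, else 0.
   A SIGSEGV handler would fall back to default handling on 0. */
static int is_protected_address(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < n_protected; i++) {
        if (a >= protected_ranges[i].begin && a < protected_ranges[i].end)
            return 1;
    }
    return 0;
}
```

A linear scan over a small static table keeps the lookup free of allocation and locking, which matters for anything called from a signal handler. Whether the actual patch stores its ranges this way is not stated in this thread.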
I can't test this on Windows or OSX (and IIUC the AppVeyor testing is still not finished?), but there's relatively little of this that's OS-specific.