RFC: catch writes to protected memory in mmapped arrays (fix #3434) #6877

Merged into master Jul 23, 2014

Conversation

@timholy (Member) commented May 18, 2014

Starting from the example in #6703,

julia> s = open("/tmp/a.bin", "r")
IOStream(<file /tmp/a.bin>)

julia> m = mmap_array(Float64, (10,1), s)
10x1 Array{Float64,2}:
 0.751866
 0.832087
 0.239773
 0.120287
 0.867469
 0.957427
 0.941344
 0.642588
 0.56932 
 0.464532

julia> m[3] = 2.0
ERROR: MemoryError()
 in setindex! at array.jl:305

julia> m = 5
5

julia> gc()

If desired I'd be happy to change the error message to something more descriptive (e.g., "Memory-access error. Did you try to set the value of a read-only mmapped array?"). I seem to remember some discussion about problems that can arise from more verbose error messages, which is why I started with something minimal.
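For context on why the assignment faults at all: a stream opened read-only ends up mapped without write permission, so any store into the array's pages makes the kernel deliver SIGSEGV. Below is a minimal, self-contained C illustration of that kind of read-only mapping; it is not the code in base/mmap.jl, and /tmp/a.bin is just a placeholder path.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Open an existing file of Float64 values read-only. */
    int fd = open("/tmp/a.bin", O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

    /* PROT_READ only: this mirrors mapping a stream opened with "r". */
    double *m = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (m == MAP_FAILED) { perror("mmap"); return 1; }

    printf("m[2] = %g\n", m[2]);  /* reads work fine */
    /* m[2] = 2.0;  <-- a store like this faults: the pages lack
       PROT_WRITE, so the kernel raises SIGSEGV. */

    munmap(m, st.st_size);
    close(fd);
    return 0;
}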

Summary of design: store the begin/end pointers of each write-protected block of memory; when a SIGSEGV arrives, check whether the faulting address falls within one of these protected ranges. If it does, raise the MemoryError shown above; if it doesn't, fall back to the default SIGSEGV handling.
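A rough sketch of that design follows (standalone POSIX code, not the patch that actually landed in src/init.c; here the handler just reports the bad write and exits, whereas the real implementation raises the MemoryError shown above):

#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Registry of write-protected [begin, end) address ranges. */
#define MAX_PROTECTED 64
static struct { uintptr_t beg, end; } protected_ranges[MAX_PROTECTED];
static int nprotected = 0;

void register_protected(void *p, size_t len)
{
    if (nprotected < MAX_PROTECTED) {
        protected_ranges[nprotected].beg = (uintptr_t)p;
        protected_ranges[nprotected].end = (uintptr_t)p + len;
        nprotected++;
    }
}

static void segv_handler(int sig, siginfo_t *info, void *context)
{
    (void)sig; (void)context;
    uintptr_t addr = (uintptr_t)info->si_addr;
    for (int i = 0; i < nprotected; i++) {
        if (addr >= protected_ranges[i].beg && addr < protected_ranges[i].end) {
            /* Faulting address lies inside a registered read-only mapping:
               this is where the real code would throw a Julia MemoryError. */
            const char msg[] = "write to read-only mmapped memory\n";
            write(STDERR_FILENO, msg, sizeof(msg) - 1);  /* async-signal-safe */
            _exit(1);
        }
    }
    /* Not one of ours: restore the default action and re-raise, so the
       usual SIGSEGV behavior still applies. */
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV);
}

void install_segv_handler(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = segv_handler;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    sigaction(SIGSEGV, &act, NULL);
}

In this sketch the mapping code would call register_protected on the mapped range right after mmap returns; register_protected and install_segv_handler are hypothetical helpers for illustration, not functions that exist in the Julia sources.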

I can't test this on Windows or OSX (and IIUC the AppVeyor testing is still not finished?), but there's relatively little of this that's OS-specific.

@tkelman (Contributor) commented May 18, 2014

IIUC the AppVeyor testing is still not finished?

It's more or less finished (I do need to tweak the binary download location), but it runs so slowly that only a tiny fraction of the tests can finish within the time limit.

The base test coverage for mmap isn't catching a lot of problems that are occurring on Windows in various packages - HDF5, Requests, possibly others. I've been too busy to hunt down the problems and come up with reduced test cases.

@timholy changed the title from "Catch writes to protected memory in mmapped arrays (fix #3434)" to "RFC: catch writes to protected memory in mmapped arrays (fix #3434)" on May 19, 2014
@timholy (Member, Author) commented May 19, 2014

Updated to incorporate Keno's suggestion. The net effect is no change to base/mmap.jl and just a few lines in src/init.c. In case anyone else likes the original better, I haven't squashed or force-pushed yet; barring objections to the new implementation, I'll do that before merging.

@timholy (Member, Author) commented May 26, 2014

Closed by caa4bfa

@timholy closed this May 26, 2014
@timholy deleted the teh/mmapsegfault branch May 26, 2014 22:58
@timholy restored the teh/mmapsegfault branch May 26, 2014 23:15
@timholy (Member, Author) commented May 26, 2014

Hmm, this may be causing Travis failures: the failures people are seeing (#6978) occur in the same process that runs the file test. What's really odd is that (1) naturally, I re-ran all the tests on my machine and they passed, and (2) the commit I pushed was just a rebase and squash of two commits, both of which passed Travis. I even restored this branch and re-ran the Travis builds from back when they were two separate commits, just to check, and they are passing.

So this may not be the commit that is triggering the failures, but I am suspicious. Gotta run to dinner, but will check back on this later this evening.

@timholy (Member, Author) commented May 28, 2014

More debugging. I took those same commits and re-committed them (without squashing) on top of a recent master. Even though the identical commits passed above, I'm now seeing test failures. This suggests either that something changed in base Julia, or that the failure is intermittent. EDIT: I should have pointed out that different Travis runs seem to fail for different reasons; in the linked example, one failed on the file test, while the other completed the file test successfully and failed in the next test run in the same process.

Intermittent failure might suggest a memory problem, so I ran the test through valgrind (wow, is that slow). The only oddity compared to commenting out the

@test_throws MemoryError c[1] = 0   # read-only memory

line is

==6754== Warning: client switching stacks?  SP change: 0x9dacc10 --> 0x7feffe600
==6754==          to suppress, use: --max-stackframe=34177620464 or greater

That error message is (sort of) explained here.

I then wondered if this might also happen with our stack overflow detector. Sure enough, with bde82ca I got the same type of valgrind warning. That commit passed its Travis test, but on my own machine I had variable outcomes: several times it ran properly (for example, the time I ran it through valgrind), but once it managed to crash/reboot my entire machine, and it looked like it was headed that way a second time before I killed the process. So we may still have some issues with our SEGV catching.

I'm afraid due to other commitments I need to stop working on this at least until the 2nd week of June, but I couldn't let this rest entirely without additional investigation.

@timholy (Member, Author) commented Jul 8, 2014

OK, so a temporary account on Travis has let me test this more easily. Managed to trap a segfault:

/tmp/julia/share/julia/test$ gdb /tmp/julia/bin/julia-debug
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /tmp/julia/bin/julia-debug...done.
(gdb) run runtests.jl file
Starting program: /tmp/julia/bin/julia-debug runtests.jl file
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff28e7700 (LWP 19007)]
     * file
[New Thread 0x7fffeb82a700 (LWP 19010)]
[New Thread 0x7fffeae29700 (LWP 19011)]
[New Thread 0x7fffea428700 (LWP 19012)]
[New Thread 0x7fffe9a27700 (LWP 19013)]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff51013f6 in ?? ()
(gdb) bt
#0  0x00007ffff51013f6 in ?? ()
#1  0x00007fffffffb8f0 in ?? ()
#2  0xe03fc30fb6401b00 in ?? ()
#3  0x00007fffffffb8b0 in ?? ()
#4  0x00007ffff51013c3 in ?? ()
#5  0x00007fffffffb8f0 in ?? ()
#6  0x00007ffff6ec2dfc in jl_apply (f=0x7ffff6efe1a4, args=0x7fffffffb8b0, 
    nargs=3762275087) at julia.h:975
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) 

Due to the stack corruption, even frame 6 looks rather useless (nargs=3762275087 is a lot of arguments!). Note that the file test here includes the

@test_throws MemoryError c[1] = 0  # read-only memory

line, which is what (intermittently) triggers this segfault.

One option I see is to commit this patch without including the test, so that at least we trap the error. That seems like an improvement over where we are now. We could expand the error message to say "Your session is probably corrupt, a restart is recommended."

As a point of reference, we've never had a test for the ability to catch stack overflows, either. When I add one, I intermittently get a similar error to what I'm experiencing here. So this plan would simply put the two on par.

@timholy (Member, Author) commented Jul 23, 2014

Bump. #7708 reminded me that stack overflows are probably not in any better shape.

With this, we should at least usually get an error; if that error tells the user why it happened, and that Julia probably needs to be restarted due to stack corruption, that seems like an improvement over a bare segfault.

If we want this for 0.3, it should be merged soon. Or just wait for 0.4. Let me know, and I'll remove the test, squash, and merge.

@JeffBezanson (Member)

Yes, let's remove the test and merge.

@timholy merged commit d4f8c31 into master Jul 23, 2014
@timholy deleted the teh/mmapsegfault branch July 24, 2014 00:05
@timholy (Member, Author) commented Jul 24, 2014

Done.

@timholy mentioned this pull request May 19, 2015
@quinnj restored the teh/mmapsegfault branch May 19, 2015 18:52
@quinnj deleted the teh/mmapsegfault branch May 19, 2015 18:52
@quinnj (Member) commented May 19, 2015

Hmmm... why can't I see any commits/changes with this merge? Even after restoring the branch?

@quinnj (Member) commented May 19, 2015

NM, found it

@tkelman (Contributor) commented Jun 13, 2015

GitHub (and/or Jacob) did something funky here. d4f8c31 does not look like the right commit; can someone find the right commit for this?

@yuyichao (Contributor)

@tkelman caa4bfa ? #6877 (comment) ?

@yuyichao (Contributor)

Or 3b36d85

These look like the only relevant commits from

git log --before='Jul 23 2014' --after='May 18 2014' --author=holy

@tkelman (Contributor) commented Jun 13, 2015

Ah, thanks, yes, git log --grep=3434 helps: that first one was reverted in 87c3e38, then in July it was committed again, but without the test.
