Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MADV_DONTNEED causing lots of minor page faults #31885

Closed
keeperofdakeys opened this issue Feb 25, 2016 · 9 comments
Closed

MADV_DONTNEED causing lots of minor page faults #31885

keeperofdakeys opened this issue Feb 25, 2016 · 9 comments

Comments

@keeperofdakeys
Copy link
Contributor

I've got a program that is experiencing lots minor of page faults (>12000 in >100 ms runtime), doubling its runtime. I suspect the issue is because of excessive madvise(..., ..., MADV_DONTNEED) calls. The following excerpts from perf stat, strace output, and perf trace output show the evidence. The behaviour occurs on a nightly from one week ago, and the current stable. However I can't replicate this on an older beta I have (rustc 1.7.0-beta.3), where no such madvise calls, or minor page faults occur.

I can't find any madvise calls in the rust source code, so I'm a bit stumped in how to further investigate this issue.

perf stat output:

         32.770167      task-clock (msec)         #    0.446 CPUs utilized          
                 4      context-switches          #    0.122 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
            12,904      page-faults               #    0.394 M/sec                  
       104,812,609      cycles                    #    3.198 GHz                    
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
       128,721,915      instructions              #    1.23  insns per cycle        
        25,328,091      branches                  #  772.901 M/sec                  
           182,971      branch-misses             #    0.72% of all branches        

       0.073417400 seconds time elapsed

strace output:

madvise(0x7fb41ec6c000, 65536, MADV_DONTNEED) = 0
open("/proc/19/stat", O_RDONLY|O_CLOEXEC) = 4
ioctl(4, FIOCLEX)                       = 0
read(4, "19 (ksoftirqd/3) S 2 0 0 0 -1 69"..., 65536) = 155
read(4, "", 65536)                      = 0
close(4)                                = 0
madvise(0x7fb41ec6c000, 65536, MADV_DONTNEED) = 0
open("/proc/19/status", O_RDONLY|O_CLOEXEC) = 4
ioctl(4, FIOCLEX)                       = 0
read(4, "Name:\tksoftirqd/3\nState:\tS (slee"..., 65536) = 606
read(4, "", 65536)                      = 0
close(4)                                = 0
madvise(0x7fb41ec6c000, 65536, MADV_DONTNEED) = 0
open("/proc/19/cmdline", O_RDONLY|O_CLOEXEC) = 4
ioctl(4, FIOCLEX)                       = 0
read(4, "", 65536)                      = 0
close(4)                                = 0

perf trace output:

    12.150 ( 0.003 ms): psq/11143 read(arg0: 4, arg1: 140038575144960, arg2: 65536, arg3: 140038578897296, arg4: 0, arg5: 7976) = 155
    12.153 ( 0.001 ms): psq/11143 read(arg0: 4, arg1: 140038575144960, arg2: 65536, arg3: 140038578897296, arg4: 0, arg5: 0) = 0
    12.155 ( 0.001 ms): psq/11143 close(arg0: 4, arg1: 256, arg2: 140731978476463, arg3: 140038578897296, arg4: 0, arg5: 0) = 0
    12.160 ( 0.004 ms): psq/11143 madvise(arg0: 140038575144960, arg1: 65536, arg2: 4, arg3: -4194304, arg4: 140038574702592, arg5: 140731978474016) = 0
    12.166 ( 0.002 ms): psq/11143 open(arg0: 140038575052128, arg1: 524288, arg2: 438, arg3: 140731978478280, arg4: 0, arg5: 0) = 4
    12.168 ( 0.001 ms): psq/11143 ioctl(arg0: 4, arg1: 21585, arg2: 438, arg3: 140731978478280, arg4: 0, arg5: 0) = 0
    12.170 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x12e] => //anon@0x7f5d4586c000 (d.)
    12.171 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586d000 (d.)
    12.173 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586e000 (d.)
    12.174 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586f000 (d.)
    12.176 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45870000 (d.)
    12.177 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45871000 (d.)
    12.179 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45872000 (d.)
    12.180 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45873000 (d.)
    12.182 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45874000 (d.)
    12.183 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45875000 (d.)
    12.185 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45876000 (d.)
    12.186 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45877000 (d.)
    12.188 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45878000 (d.)
    12.189 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45879000 (d.)
    12.191 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4587a000 (d.)
    12.193 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4587b000 (d.)
    12.200 ( 0.007 ms): psq/11143 read(arg0: 4, arg1: 140038575144960, arg2: 65536, arg3: 8, arg4: 2097865012304223517, arg5: 768) = 612
    12.212 ( 0.001 ms): psq/11143 read(arg0: 4, arg1: 140038575144960, arg2: 65536, arg3: 6, arg4: 26, arg5: -7900185713307138509) = 0
    12.214 ( 0.001 ms): psq/11143 close(arg0: 4, arg1: 0, arg2: 0, arg3: 6, arg4: 2097865012304223517, arg5: -7900185713307138509) = 0
    12.219 ( 0.003 ms): psq/11143 madvise(arg0: 140038575144960, arg1: 65536, arg2: 4, arg3: -4194304, arg4: 140038574702592, arg5: 140731978467840) = 0
    12.226 ( 0.003 ms): psq/11143 open(arg0: 140038575052256, arg1: 524288, arg2: 438, arg3: 140731978477464, arg4: 0, arg5: 0) = 4
    12.228 ( 0.001 ms): psq/11143 ioctl(arg0: 4, arg1: 21585, arg2: 438, arg3: 140731978477464, arg4: 0, arg5: 0) = 0
    12.229 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x12e] => //anon@0x7f5d4586c000 (d.)
    12.231 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586d000 (d.)
    12.232 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586e000 (d.)
    12.234 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586f000 (d.)
    12.235 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45870000 (d.)
    12.237 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45871000 (d.)
    12.238 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45872000 (d.)
    12.240 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45873000 (d.)
    12.242 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45874000 (d.)
    12.243 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45875000 (d.)
    12.245 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45876000 (d.)
    12.246 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45877000 (d.)
    12.248 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45878000 (d.)
    12.249 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45879000 (d.)
    12.251 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4587a000 (d.)
    12.253 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4587b000 (d.)
    12.255 ( 0.001 ms): psq/11143 read(arg0: 4, arg1: 140038575144960, arg2: 65536, arg3: 140038578897296, arg4: 0, arg5: 7976) = 0
    12.256 ( 0.001 ms): psq/11143 close(arg0: 4, arg1: 65536, arg2: 0, arg3: 140038578897296, arg4: 0, arg5: 7976) = 0
    12.261 ( 0.003 ms): psq/11143 madvise(arg0: 140038575144960, arg1: 65536, arg2: 4, arg3: -4194304, arg4: 140038574702592, arg5: 140731978474016) = 0
    12.267 ( 0.003 ms): psq/11143 open(arg0: 140038574969424, arg1: 524288, arg2: 438, arg3: 0, arg4: 0, arg5: 0) = 4
    12.269 ( 0.001 ms): psq/11143 ioctl(arg0: 4, arg1: 21585, arg2: 438, arg3: 0, arg4: 0, arg5: 0      ) = 0
    12.270 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x12e] => //anon@0x7f5d4586c000 (d.)
    12.272 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586d000 (d.)
    12.273 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586e000 (d.)
    12.275 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4586f000 (d.)
    12.276 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45870000 (d.)
    12.278 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45871000 (d.)
    12.279 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45872000 (d.)
    12.281 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45873000 (d.)
    12.282 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45874000 (d.)
    12.284 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45875000 (d.)
    12.285 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45876000 (d.)
    12.287 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45877000 (d.)
    12.288 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45878000 (d.)
    12.290 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d45879000 (d.)
    12.292 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4587a000 (d.)
    12.293 ( 0.000 ms): psq/11143 minfault [/lib64/libc-2.22.so@__memset_avx2+0x188] => //anon@0x7f5d4587b000 (d.)
@keeperofdakeys keeperofdakeys changed the title MADV_DONTNEED MADV_DONTNEED causin logs of minor page faults Feb 25, 2016
@keeperofdakeys keeperofdakeys changed the title MADV_DONTNEED causin logs of minor page faults MADV_DONTNEED causing lots of minor page faults Feb 25, 2016
@ghost
Copy link

ghost commented Feb 25, 2016

It is the jemalloc that calls madvise. To avoid it you can switch to system
allocator:

#![feature(alloc_system)]
extern crate alloc_system;

The root cause, are costly allocations (deallocations) coming from BufReader.
By default it allocates a buffer with size of 64 * 1024 bytes which exceeds
allocation size that is cached in jemalloc in default configuration (32 KiB).
Thus another potential solution is to use smaller buffer with
BufReader::with_capacity(4096, file) instead of BufReader::new(file).

Though, in my opinion the default buffer size used by BufReader is quite
excessive and it probably would make sense to revise it downwards.

@keeperofdakeys
Copy link
Contributor Author

Thanks a lot, using the system allocator does resolve the problem. Limiting the size of BufReader also sounds like a good idea, especially for my case, where most of the files I'm reading are quite small.

Other measurements by perf suggest that nearly 50% of the running time was just page faults. If there are many other Rust programs out there doing such frequent big allocations, they may see similar hits. Is this jemalloc behaviour optimal?

@sfackler
Copy link
Member

There used to be a comment next to the default buffer size constant but it seems to have been lost at some point: https://github.com/rust-lang/rust/pull/9091/files#diff-b131eeef531ad098b32f49695a031008R60. The rationale was never that great when I added it, and I'd be fine dropping it to 4k or 8k which I think is somewhat more standard?

@sfackler
Copy link
Member

It's also worth noting that there's no reason to use BufReader at all in a context like this: https://github.com/keeperofdakeys/Process-Query/blob/master/src/procrs/pid/mod.rs#L51, since read_to_end will be making large read calls. BufReader is designed for use cases where you're reading a couple bytes at a time and don't want to pay for a syscall per 4 bytes.

@keeperofdakeys
Copy link
Contributor Author

I'd be happy to make a PR to change the default buffer size then. Should a warning also be added to the API docs regarding jemalloc and large buffer sizes?

As for my code, using read_to_end directly on the File is causing reads of a single byte. I just did some measurements, and this seems to have a detrimental impact.

No BufReader, 50ms, ~280000 syscalls (lots of read(1) calls)
BufReader::new, 30ms, ~5300 syscalls (some madvise syscalls)
BufReader::with_capacity(4096), 14ms ~4500 syscalls

So at least for my case, a properly configured BufReader makes a huge difference.

@keeperofdakeys
Copy link
Contributor Author

I noticed a PR to update jemalloc (#31889), and I think this will explain why the beta version did not show the problem. Nightly and stable are using jemalloc 3, the beta is still using jemalloc 4. If jemalloc 4 handles big buffers better (not calling madvise a lot), would it still be worth changing the default buffer size?

@sfackler
Copy link
Member

I'm not sure where read requests with a 1 byte buffer could be coming from - this is the implementation of read_to_end: https://github.com/rust-lang/rust/blob/master/src/libstd/io/mod.rs#L340-L375. It'll start with 16 byte reads and then ramp up exponentially to 64k reads.

@keeperofdakeys
Copy link
Contributor Author

I am using the .bytes iterator in one place, I might have mistook one file being read with read(1) for all files being read with it. That's also why I started using BufReader originally.

@sfackler
Copy link
Member

Ah yeah, .bytes() will read one byte at a time.

sfackler added a commit to sfackler/rust that referenced this issue Apr 3, 2016
The 64k capacity was picked by me a couple of years ago in the initial
implementation of buffered IO adaptors:
https://github.com/rust-lang/rust/pull/9091/files#diff-b131eeef531ad098b32f49695a031008R62.
64K was picked for symmetry with libuv, which we no longer use.

64K is *way* larger than the default size of any other language that I
can find. C, C++, and Java default to 8K, and Go defaults to 4K. There
have been a variety of issues filed relating to this such as rust-lang#31885.

Closes rust-lang#31885
Manishearth added a commit to Manishearth/rust that referenced this issue Apr 8, 2016
…ichton

 The 64k capacity was picked by me a couple of years ago in the initial
implementation of buffered IO adaptors:
https://github.com/rust-lang/rust/pull/9091/files#diff-b131eeef531ad098b32f49695a031008R62.
64K was picked for symmetry with libuv, which we no longer use.

64K is *way* larger than the default size of any other language that I
can find. C, C++, and Java default to 8K, and Go defaults to 4K. There
have been a variety of issues filed relating to this such as rust-lang#31885.

Closes rust-lang#31885
bors added a commit that referenced this issue Apr 8, 2016
Drop the default buffer size to 8K

The 64k capacity was picked by me a couple of years ago in the initial
implementation of buffered IO adaptors:
https://github.com/rust-lang/rust/pull/9091/files#diff-b131eeef531ad098b32f49695a031008R62.
64K was picked for symmetry with libuv, which we no longer use.

64K is *way* larger than the default size of any other language that I
can find. C, C++, and Java default to 8K, and Go defaults to 4K. There
have been a variety of issues filed relating to this such as #31885.

Closes #31885
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants