
simplify shootout-reverse-complement.rs #18357


Merged: 2 commits into rust-lang:master on Oct 30, 2014

Conversation

@TeXitoi (Contributor) commented Oct 26, 2014

Simpler, safer, and shorter, in the same spirit as the current version, and with the same performance.

@mahkoh please review; I don't think I changed anything performance-related.
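
For context, the program under discussion is the Benchmarks Game reverse-complement task: read FASTA input on stdin and print the reverse complement of each sequence. The following is only a rough sketch of that core transform in present-day Rust, not the PR's code (the real program also handles FASTA headers and output line wrapping):

// Rough sketch of the benchmark's core transform (not the PR's code).
fn complement_table() -> [u8; 256] {
    // Identity by default so bytes like '\n' pass through unchanged.
    let mut tbl = [0u8; 256];
    for i in 0..256 {
        tbl[i] = i as u8;
    }
    // IUPAC nucleotide codes and their complements, upper- and lower-case.
    for (&from, &to) in b"ACGTUMRWSYKVHDBN".iter().zip(b"TGCAAKYWSRMBDHVN".iter()) {
        tbl[from as usize] = to;
        tbl[from.to_ascii_lowercase() as usize] = to;
    }
    tbl
}

fn reverse_complement_in_place(seq: &mut [u8], tbl: &[u8; 256]) {
    seq.reverse();
    for b in seq.iter_mut() {
        *b = tbl[*b as usize];
    }
}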

Review thread on this change:

const CHUNK: uint = 64 * 1024;

-let mut vec = Vec::with_capacity(1024 * 1024);
+let mut vec = Vec::with_capacity(CHUNK);
Contributor:

This has to be 1MB for the code below to work.

Contributor Author:

I've rewritten this function to be as close as possible to Reader::read_to_end(). This value will just add exactly one more allocation, or is there something I didn't understand?

Contributor:

One more allocation can be very significant with jemalloc. But in the case of 64k -> 1M it looks insignificant.

Contributor:

64k -> 1M isn't a huge reallocation, so the mremap issue isn't relevant.
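
As a reference point for the discussion above, here is a minimal sketch of the chunked-read idea in present-day std::io terms (the PR itself targets the pre-1.0 Reader trait, so the names below are illustrative, not the PR's code): the buffer starts at CHUNK capacity and grows as data arrives, so Vec's growth policy amortizes the reallocations instead of relying on a fixed 1MB guess.

use std::io::{self, Read};

const CHUNK: usize = 64 * 1024;

// Grow the buffer chunk by chunk until EOF; Vec doubles its allocation as
// needed, so no fixed up-front size is required.
fn read_to_end_chunked<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut vec = Vec::with_capacity(CHUNK);
    loop {
        let start = vec.len();
        vec.resize(start + CHUNK, 0);          // make room for one more chunk
        let n = r.read(&mut vec[start..])?;    // read into the new tail
        vec.truncate(start + n);               // keep only what was filled
        if n == 0 {
            return Ok(vec);                    // EOF
        }
    }
}

On a file or pipe this behaves like a plain read-to-end, just with an explicit chunk size bounding each read call.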

@mahkoh (Contributor) commented Oct 26, 2014

What do your benchmarks say?

@TeXitoi (Contributor, Author) commented Oct 27, 2014

I can't see a noticeable difference in speed between the two (i.e., over several runs, the timing range is the same).

@mahkoh (Contributor) commented Oct 27, 2014

Performance looks good here too.

@alexcrichton (Member) commented

Nice work @TeXitoi, thanks!

@mahkoh (Contributor) commented Oct 27, 2014

Have you compared the performance with the C and C++ solutions?

@TeXitoi (Contributor, Author) commented Oct 27, 2014

Rust:

real    0m1.069s
user    0m0.760s
sys     0m0.592s

C #2:

real    0m1.185s
user    0m1.412s
sys     0m0.452s

C++ #4:

real    0m1.252s
user    0m1.424s
sys     0m0.520s

@TeXitoi (Contributor, Author) commented Oct 27, 2014

Rust with Reader::read_to_end():

real    0m1.176s
user    0m0.792s
sys     0m0.676s

@mahkoh (Contributor) commented Oct 27, 2014

You're not using Linux, are you?

@TeXitoi (Contributor, Author) commented Oct 27, 2014

I am:

texitoi@vaio:~/dev/rust$ cat /proc/cpuinfo | grep 'model name'
model name  : Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
model name  : Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
model name  : Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
model name  : Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
texitoi@vaio:~/dev/rust$ uname -a
Linux vaio 3.16-2-amd64 #1 SMP Debian 3.16.3-2 (2014-09-20) x86_64 GNU/Linux
texitoi@vaio:~/dev/rust$ 

@TeXitoi (Contributor, Author) commented Oct 27, 2014

Compiled with:

rustc -C lto -C target-cpu=core2 --opt-level 3 shootout-reverse-complement.rs -o rc

@mahkoh (Contributor) commented Oct 27, 2014

How are you benchmarking and what input are you using?

@TeXitoi (Contributor, Author) commented Oct 27, 2014

The input is generated with /tmp/fasta 100000000 > /tmp/fasta-out.txt.

I benchmark by running time /tmp/rc < /tmp/fasta-out.txt > /dev/null about 10 times and taking the best result.

@mahkoh (Contributor) commented Oct 27, 2014

Please use the 25M fasta output used in the official benchmark.

@TeXitoi (Contributor, Author) commented Oct 27, 2014

Input: /tmp/fasta 25000000 > /tmp/fasta-out.txt

Rust in this PR:

real    0m0.207s
user    0m0.164s
sys     0m0.112s

Rust + Reader::read_to_end():

real    0m0.299s
user    0m0.184s
sys     0m0.184s

C #2:

real    0m0.297s
user    0m0.356s
sys     0m0.116s

C++ #4:

real    0m0.323s
user    0m0.384s
sys     0m0.112s

Review thread on this excerpt from the new code:

/// Reads all remaining bytes from the stream.
fn read_to_end<R: Reader>(r: &mut R) -> IoResult<Vec<u8>> {
    // FIXME: this method is a temporary workaround of a slowness
Contributor:

It's not a performance bug in jemalloc or Rust, and it's not a performance bug in the Linux kernel. I've proposed a new feature for mremap in the Linux kernel (MREMAP_RETAIN) to allow jemalloc to take advantage of it, but there is no guarantee of that landing. I don't think you should include a FIXME for something that's not a bug and has no guarantee of ever being fixed.

You're free to use mmap, mremap and munmap directly, which would be faster than working around the fact that you're doing massive copies by using a very large growth multiple. It's not possible for jemalloc to do this because mremap causes virtual memory fragmentation by unmapping the source.

Contributor:

The glibc allocator doesn't attempt to eliminate virtual memory fragmentation, so it's able to use mremap as it exists today. However, it's significantly slower than just using mmap, mremap and munmap directly anyway.

Contributor Author:

Then you suggest that I just remove the comment? Using mremap directly would be Linux-only, no?

Contributor:

The performance of huge reallocations on Linux is already better than on other platforms. If MREMAP_RETAIN does get accepted upstream, then huge reallocations will be blazing fast on new Linux kernels with jemalloc, but that won't impact the performance on other platforms.

If you want code that's special-cased for Linux, you can already do that by calling mmap, mremap and munmap. It will be faster than what you're doing here because it will eliminate the huge copies rather than just making them less frequent.
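
For the curious, here is a rough sketch of that Linux-only route, assuming the external libc crate (the helper names are hypothetical, nothing from this PR or the standard library): an anonymous mapping is created with mmap and grown with mremap(MREMAP_MAYMOVE), so the kernel moves page tables instead of copying the buffer's contents.

// Hypothetical Linux-only helpers (assumes the external `libc` crate); they
// only illustrate the mmap/mremap approach described above.
#[cfg(target_os = "linux")]
unsafe fn new_mapping(len: usize) -> *mut u8 {
    let p = libc::mmap(
        std::ptr::null_mut(),
        len,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
        -1,
        0,
    );
    assert!(p != libc::MAP_FAILED, "mmap failed");
    p as *mut u8
}

#[cfg(target_os = "linux")]
unsafe fn grow_mapping(ptr: *mut u8, old_len: usize, new_len: usize) -> *mut u8 {
    // MREMAP_MAYMOVE lets the kernel relocate the mapping without copying its
    // contents, which is what avoids the huge memcpy on every reallocation.
    let p = libc::mremap(ptr as *mut libc::c_void, old_len, new_len, libc::MREMAP_MAYMOVE);
    assert!(p != libc::MAP_FAILED, "mremap failed");
    p as *mut u8
}

#[cfg(target_os = "linux")]
unsafe fn free_mapping(ptr: *mut u8, len: usize) {
    let _ = libc::munmap(ptr as *mut libc::c_void, len);
}

A real program would still track its own length and capacity on top of these calls; this only shows which syscalls are involved.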

@mahkoh (Contributor) commented Oct 27, 2014

It looks like read_to_end is significantly faster on your machine than on mine. I'm seeing 0.36 vs 0.52 here.

@TeXitoi (Contributor, Author) commented Oct 27, 2014

@mahkoh are you compiling with -C lto -C target-cpu=core2 --opt-level 3? I suspect LTO is useful with Reader.

@mahkoh (Contributor) commented Oct 27, 2014

LTO gives me no significant improvement.

@TeXitoi force-pushed the simplify-reverse-complement branch from bf16d62 to 7017fb0 on October 28, 2014 at 21:14
@TeXitoi (Contributor, Author) commented Oct 28, 2014

@thestinger I've rephrased the comment. OK?

bors added a commit that referenced this pull request Oct 29, 2014
@bors closed this on Oct 30, 2014
@bors merged commit 7017fb0 into rust-lang:master on Oct 30, 2014