-
Notifications
You must be signed in to change notification settings - Fork 13.3k
simplify shootout-reverse-complement.rs #18357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Simpler, safer and shorter, in the same spirit of the current version, and the same performances.
const CHUNK: uint = 64 * 1024; | ||
|
||
let mut vec = Vec::with_capacity(1024 * 1024); | ||
let mut vec = Vec::with_capacity(CHUNK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to be 1MB for the code below to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rewritten this function to be as close as possible to Reader::read_to_end()
. This value will just add exactly one more allocation, or there is something I didn't understand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more allocation can be very significant with jemalloc. But in the case 64k -> 1M it looks like it's insignificant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64k -> 1M isn't a huge reallocation so the mremap issue isn't relevant.
What do your benchmarks say? |
I can't see noticiable difference in speed between the 2 (i.e. on several run, the speed interval is the same) |
Performance looks good here too. |
Nice work @TeXitoi, thanks! |
Have you compared the performance with the C and C++ solutions? |
Rust:
C#2:
C++#4:
|
Rust with
|
You're not using Linux, are you? |
I am:
|
compile with
|
How are you benchmarking and what input are you using? |
input is generated with bench with runing |
Please use the 25M fasta output used in the official benchmark. |
Input: Rust in this PR:
Rust +
C#2:
C++#4:
|
/// Reads all remaining bytes from the stream. | ||
fn read_to_end<R: Reader>(r: &mut R) -> IoResult<Vec<u8>> { | ||
// FIXME: this method is a temporary workaround of a slowness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a performance bug in jemalloc or Rust and it's not a performance bug in the Linux kernel. I've proposed a new feature for mremap
in the Linux kernel (MREMAP_RETAIN
) to allow jemalloc to take advantage of it but there is no guarantee of that landing. I don't think you should include a FIXME for something that's not a bug and has no guarantee of ever being fixed.
You're free to use mmap
, mremap
and munmap
directly which would be faster than working around the fact that you're doing massive copies by using a very large growth multiple. It's not possible for jemalloc to do this because mremap
because it causes virtual memory fragmentation by unmapping the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The glibc allocator doesn't attempt to eliminate virtual memory fragmentation so it's able to use mremap
as it exists today. However, it's significantly slower than just using mmap
, mremap
and munmap
directly anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you suggest that I just remove the comment? using mremap
directly will be linux only, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance of huge reallocations on Linux is already better than other platforms. If MREMAP_RETAIN
does get accepted upstream, then huge reallocations will be blazing on on new Linux kernels with jemalloc but that won't impact the performance on other platforms.
If you want to have code that's special-cased for Linux then you can already do that by calling mmap
, mremap
and munmap
. It will be faster than what you're doing here because it will eliminate the huge copies rather than just making them less frequent.
It looks like read_to_end is significantly faster on your machine than on mine. I'm seeing 0.36 vs 0.52 here. |
@mahkoh do you compile with |
lto gives me no significant improvements. |
bf16d62
to
7017fb0
Compare
@thestinger rephrased the comment. OK? |
…excrichton Simpler, safer and shorter, in the same spirit of the current version, and the same performances. @mahkoh please review, I think I didn't change any performances related thing.
Simpler, safer and shorter, in the same spirit of the current version, and the
same performances.
@mahkoh please review, I think I didn't change any performances related thing.