reverse_complement spends most of its time reading input #43

mbrubeck · 2017-02-24T21:42:03Z

On my system, reverse_complement < input25000000.txt spends 67% of its time on the read_to_end call here. According to the profile, that time is almost all spent in memmove.

    let mut data = Vec::with_capacity(1024 * 1024);
    stdin.lock().read_to_end(&mut data).unwrap();

Increasing the initial buffer size to fit the entire dataset cuts the read_to_end time in half, and reduces the total execution time by about 30%, but obviously it also wastes memory if the data is small. (Since this program reads from stdin, it doesn't know the total size before it starts reading.)


mbrubeck · 2017-02-24T21:47:12Z

The C++ code uses ftell to get the input size before allocating a buffer:

      long start = ftell( stdin );
      fseek( stdin, 0, SEEK_END );
      size = ftell( stdin ) - start;
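
For comparison, a rough Rust sketch of the same idea: seek to the end to measure the remaining input, seek back, then size the Vec before reading. The names remaining_len and read_presized are made up for illustration, and the 1 MiB fallback simply mirrors the capacity used in the snippet above. This is not necessarily what the eventual fix does.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Bytes between the current position and the end of a seekable source.
/// (Hypothetical helper; mirrors the ftell/fseek trick in the C++ code.)
fn remaining_len<S: Seek>(src: &mut S) -> io::Result<u64> {
    let pos = src.seek(SeekFrom::Current(0))?; // like ftell
    let end = src.seek(SeekFrom::End(0))?; // like fseek(.., SEEK_END) + ftell
    src.seek(SeekFrom::Start(pos))?; // restore the original position
    Ok(end.saturating_sub(pos))
}

/// Read everything that is left into a Vec whose capacity is chosen up front.
/// Falls back to 1 MiB when the size cannot be determined (e.g. a pipe,
/// where seeking fails).
fn read_presized<R: Read + Seek>(src: &mut R) -> io::Result<Vec<u8>> {
    let size = remaining_len(src)
        .map(|n| n as usize)
        .unwrap_or(1024 * 1024);
    let mut data = Vec::with_capacity(size);
    src.read_to_end(&mut data)?;
    Ok(data)
}
```

std::io::Stdin does not implement Seek, so this needs a seekable handle for the input. On Unix, one possibility (an assumption, not tested on every platform) is opening /dev/stdin with std::fs::File::open when the input is redirected from a file.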


llogiq · 2017-02-24T22:49:38Z

The C version uses a buffer that is realloc'd to double size on overflow. Perhaps we could have something similar in a crate? Reading arbitrary-sized input is a common use case after all.

TeXitoi · 2017-02-24T22:52:14Z

@llogiq Vec already does that automatically, no?

mbrubeck · 2017-02-24T23:08:31Z

To avoid wasting memory, read_to_end does not double the buffer; instead it grows it by up to DEFAULT_BUF_SIZE (8 KB) at a time. When reading 250 MB of data, that's a lot of reallocations!

https://github.com/rust-lang/rust/blob/08230775a026c955873ba557e624b7f665661f37/src/libstd/io/mod.rs#L351-L354
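
To put rough numbers on that, here is a back-of-the-envelope sketch comparing fixed-size growth with doubling. It is a simplified model: it counts growth steps, not actual allocator calls, and it treats the 250 MB input as exactly 250 MiB. The function names are made up for illustration.

```rust
/// Growth steps needed to go from `start` to at least `target` bytes
/// when every step adds a fixed `chunk`.
fn fixed_steps(start: usize, target: usize, chunk: usize) -> usize {
    assert!(chunk > 0, "chunk must be non-zero");
    if target <= start {
        return 0;
    }
    // Ceiling division of the missing bytes by the chunk size.
    (target - start + chunk - 1) / chunk
}

/// Growth steps needed to go from `start` to at least `target` bytes
/// when every step doubles the capacity.
fn doubling_steps(start: usize, target: usize) -> usize {
    let mut cap = start.max(1);
    let mut steps = 0;
    while cap < target {
        cap = cap.saturating_mul(2);
        steps += 1;
    }
    steps
}
```

Under those assumptions, fixed_steps(1 << 20, 250 << 20, 8192) gives 31872 steps from a 1 MiB starting buffer, while doubling_steps(1 << 20, 250 << 20) gives 8.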

Also, with a 250 MB buffer, even a single reallocation can be a huge cost. I had to tack an extra byte onto the buffer size to avoid a single reallocation at the end of the file.

llogiq · 2017-02-24T23:11:52Z

@TeXitoi Also, depending on the alignment and the allocator used, RawVec may simply create another allocation and move the contents there (an interesting realloc implementation).

mbrubeck added a commit to mbrubeck/benchmarksgame-rs that referenced this issue Feb 24, 2017

reverse_complement: Pre-allocate a buffer based on input size

170cd1c

Fixes TeXitoi#43.

mbrubeck mentioned this issue Feb 24, 2017

reverse_complement: Pre-allocate a buffer based on input size #44

Merged

mbrubeck added a commit to mbrubeck/benchmarksgame-rs that referenced this issue Feb 24, 2017

reverse_complement: Pre-allocate a buffer based on input size

54b3e84

Fixes TeXitoi#43.

mbrubeck added a commit to mbrubeck/benchmarksgame-rs that referenced this issue Feb 24, 2017

reverse_complement: Pre-allocate a buffer based on input size

d7409fd

Fixes TeXitoi#43.

TeXitoi closed this as completed in #44 Feb 24, 2017

TeXitoi pushed a commit that referenced this issue Feb 24, 2017

reverse_complement: Pre-allocate a buffer based on input size (#44)

e1681f9

Fixes #43.

mbrubeck mentioned this issue Nov 7, 2017

Add read, read_string, and write functions to std::fs rust-lang/rust#45837

Merged
