Add Read::size_hint and pre-allocate in read_to_end #45928

Closed · wants to merge 6 commits

Conversation

@SimonSapin (Contributor, author)

Many FromIterator implementations rely on Iterator::size_hint to pre-allocate memory. This is a prototype of what something equivalent for std::io::Read could look like. An alternative would be not to add any public API, but to use a private specialization trait, like ZipImpl in libcore.
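
A sketch of the proposed shape, written here as an extension trait so that it compiles outside std (illustrative only; the PR itself adds the method directly to std::io::Read):

use std::io::Read;

/// Stand-in for the method this PR adds to std::io::Read.
trait ReadSizeHint: Read {
    /// A hint of how many bytes are available to read, so that read_to_end
    /// can pre-allocate. Like the lower bound of Iterator::size_hint, it
    /// defaults to zero, and underestimating is always acceptable.
    fn size_hint(&self) -> usize {
        0
    }
}

/// Blanket impl giving every reader the zero default. As a real trait
/// method in std, File would instead override it with an fstat-based value.
impl<R: Read> ReadSizeHint for R {}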

This came up in #45837 in the case of reading from a file. I’ve measured File::read_to_end with a vector created with Vec::with_capacity(file.metadata().len() as usize), compared to Vec::new(). On my Linux desktop with an SSD (though everything is probably in the filesystem cache here), pre-allocating plus reading takes between 3% more time and 43% less time, depending on file size.

     Running target/release/deps/read_to_end_bench-9122fdf004b0166c

running 24 tests
test a_read_128::vec_new            ... bench:       1,380 ns/iter (+/- 49) = 92 MB/s
test a_read_128::vec_with_capacity  ... bench:       1,416 ns/iter (+/- 107) = 90 MB/s
test b_read_512::vec_new            ... bench:       1,690 ns/iter (+/- 78) = 302 MB/s
test b_read_512::vec_with_capacity  ... bench:       1,736 ns/iter (+/- 52) = 294 MB/s
test c_read_2k::vec_new             ... bench:       2,085 ns/iter (+/- 113) = 982 MB/s
test c_read_2k::vec_with_capacity   ... bench:       2,015 ns/iter (+/- 72) = 1016 MB/s
test d_read_8k::vec_new             ... bench:       2,778 ns/iter (+/- 131) = 2948 MB/s
test d_read_8k::vec_with_capacity   ... bench:       2,516 ns/iter (+/- 89) = 3255 MB/s
test e_read_32k::vec_new            ... bench:       5,107 ns/iter (+/- 103) = 6416 MB/s
test e_read_32k::vec_with_capacity  ... bench:       4,404 ns/iter (+/- 81) = 7440 MB/s
test f_read_128k::vec_new           ... bench:      16,232 ns/iter (+/- 106) = 8074 MB/s
test f_read_128k::vec_with_capacity ... bench:      12,223 ns/iter (+/- 103) = 10723 MB/s
test g_read_512k::vec_new           ... bench:      39,179 ns/iter (+/- 210) = 13381 MB/s
test g_read_512k::vec_with_capacity ... bench:      31,704 ns/iter (+/- 98) = 16536 MB/s
test h_read_1m::vec_new             ... bench:     463,119 ns/iter (+/- 2,354) = 2264 MB/s
test h_read_1m::vec_with_capacity   ... bench:     251,983 ns/iter (+/- 3,072) = 4161 MB/s
test i_read_2m::vec_new             ... bench:     668,742 ns/iter (+/- 8,317) = 3135 MB/s
test i_read_2m::vec_with_capacity   ... bench:     383,879 ns/iter (+/- 1,269) = 5463 MB/s
test j_read_4m::vec_new             ... bench:   1,188,553 ns/iter (+/- 13,409) = 3528 MB/s
test j_read_4m::vec_with_capacity   ... bench:     857,714 ns/iter (+/- 8,254) = 4890 MB/s
test k_read_8m::vec_new             ... bench:   2,302,201 ns/iter (+/- 15,746) = 3643 MB/s
test k_read_8m::vec_with_capacity   ... bench:   1,947,089 ns/iter (+/- 10,942) = 4308 MB/s
test l_read_32m::vec_new            ... bench:   8,990,230 ns/iter (+/- 18,718) = 3732 MB/s
test l_read_32m::vec_with_capacity  ... bench:   8,648,669 ns/iter (+/- 23,157) = 3879 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 24 measured; 0 filtered out

The benchmark source:
#![feature(test)]
extern crate test;
extern crate tempdir;

use std::fs::File;
use std::io::{Write, Read};
use test::Bencher;

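// Write a `size`-byte file of 42s, then benchmark reading it back into a
// Vec<u8> created by `new_buffer` (which may or may not pre-allocate).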
fn run<F>(bencher: &mut Bencher, size: usize, new_buffer: F)
    where F: Fn(&File) -> Vec<u8>
{
    let dir = tempdir::TempDir::new("bench").unwrap();
    let path = dir.path().join("something");
    File::create(&path).unwrap().write_all(&vec![42; size]).unwrap();
    bencher.bytes = size as u64;
    bencher.iter(|| {
        let mut file = File::open(&path).unwrap();
        let mut buffer = new_buffer(&file);
        file.read_to_end(&mut buffer).unwrap();
    })
}

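// For each (name, size) pair below, generate a module containing two
// benchmarks: one starting from Vec::new() and one from a Vec pre-allocated
// to the file's size.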
macro_rules! sizes {
    ($( $name: ident  $size: expr )+) => {
        $(
            mod $name {
                use super::*;

                #[bench]
                fn vec_new(bencher: &mut Bencher) {
                    run(bencher, $size, |_| Vec::new())
                }

                #[bench]
                fn vec_with_capacity(bencher: &mut Bencher) {
                    run(bencher, $size, |file| {
                        Vec::with_capacity(file.metadata().unwrap().len() as usize)
                    })
                }
            }
        )+
    }
}

sizes! {
    a_read_128 128
    b_read_512 512
    c_read_2k 2 * 1024
    d_read_8k 8 * 1024
    e_read_32k 32 * 1024
    f_read_128k 128 * 1024
    g_read_512k 512 * 1024
    h_read_1m 1024 * 1024
    i_read_2m 2 * 1024 * 1024
    j_read_4m 4 * 1024 * 1024
    k_read_8m 8 * 1024 * 1024
    l_read_32m 32 * 1024 * 1024
}

@kennytm added the S-waiting-on-review and T-libs-api labels, Nov 11, 2017
@bluss (Member) commented Nov 11, 2017

If this adds a new system call when calling read_to_end on a File: that has been discussed before, and putting a system call in size_hint (via metadata) is not ideal. (I can’t find the earlier discussion, though.)

@SimonSapin (Contributor, author)

Yes, File::size_hint as implemented in the current PR does make a system call, and I agree this is not ideal. I don’t expect this PR to be merged as-is; I submitted it to have a starting point for discussing what, if anything, should be done here.

Inline review on src/libstd/fs.rs (outdated):
@@ -449,6 +449,13 @@ impl Read for File {
         self.inner.read(buf)
     }

+    fn size_hint(&self) -> usize {
+        match self.metadata() {
+            Ok(meta) => meta.len() as usize,
Member: It's a minor point, but this should probably use a saturating cast rather than as.

SimonSapin: Is there an API for doing saturating integer casts? Writing #[cfg(target_pointer_width = …)] code here seems tedious.

Member: Sadly not yet (rust-lang/rfcs#1218). Something like cmp::min(meta.len(), usize::max_value() as u64) as usize should work, though.
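
As a standalone helper, the suggested workaround looks like this (hypothetical function name, not an API from this PR):

use std::cmp;

// Clamp a u64 length to usize without wrapping, pending a dedicated
// saturating-cast API (rust-lang/rfcs#1218).
fn saturating_u64_to_usize(n: u64) -> usize {
    cmp::min(n, usize::max_value() as u64) as usize
}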

SimonSapin: On the other hand, this is a hint: the default impl returns zero without necessarily being at EOF, so underestimating should be OK. And if you’re reading a file larger than your address space, you’re gonna have a hard time in read_to_end anyway.

Member: This also needs to check its current position and subtract that off, right?

SimonSapin: Right. However, the obvious fix doesn’t work because <File as Seek>::seek takes &mut self, even with SeekFrom::Current(0). I wonder if there should be some API like File::position(&self) -> io::Result<u64>.

SimonSapin: Never mind: <std::fs::File as Seek>::seek calls std::sys::fs::File::seek, which takes &self, so size_hint can call the latter directly.
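
Outside libstd, the same position-adjusted hint can be sketched using the public impl Seek for &File (a sketch of the idea, not this PR's exact code):

use std::fs::File;
use std::io::{Seek, SeekFrom};

// Bytes remaining between the current position and the end of the file,
// falling back to 0 on any error; a hint may always underestimate.
fn file_size_hint(mut file: &File) -> usize {
    let len = match file.metadata() {
        Ok(meta) => meta.len(),
        Err(_) => return 0,
    };
    // `impl Seek for &File` allows querying the position with shared
    // access, mirroring the &self-taking seek inside std::sys::fs.
    match file.seek(SeekFrom::Current(0)) {
        Ok(pos) => len.saturating_sub(pos) as usize,
        Err(_) => 0,
    }
}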

@mbrubeck (Contributor) commented Nov 13, 2017

read_to_end still reads at most 8 KB at a time, even when the entire buffer is allocated up-front. The pre-allocated case could be optimized further by reading up to the available capacity in a single read syscall.

Update: This was fixed by #46050.
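
A simplified sketch of that idea (not the code that landed in #46050; the zero-fill here keeps the sketch safe, at some cost):

use std::io::{self, Read};

// Read directly into the vector's spare capacity, so a pre-allocated buffer
// is filled with large reads rather than fixed 8 KB chunks.
fn read_to_end_using_capacity<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start = buf.len();
    loop {
        if buf.len() == buf.capacity() {
            buf.reserve(32); // out of room: grow (std uses a smarter policy)
        }
        let old_len = buf.len();
        let capacity = buf.capacity();
        buf.resize(capacity, 0); // expose the spare capacity, zero-filled
        match r.read(&mut buf[old_len..]) {
            Ok(0) => {
                buf.truncate(old_len); // EOF: drop the unused tail
                return Ok(old_len - start);
            }
            Ok(n) => buf.truncate(old_len + n),
            Err(e) => {
                buf.truncate(old_len);
                return Err(e);
            }
        }
    }
}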

@shepmaster (Member)

random-assigning to...

r? @sfackler

@sfackler (Member)

Does #46050 give the same improvement?

@sunfishcode (Member)

A non-obvious detail of read_to_end’s loop is that it always has to allocate at least one byte more than the file size: it reads until read returns 0, and that last read needs to be passed some space that it won’t use.

If you're calling metadata() and it provides a size, can you trust that size (rather than taking it as a hint)? If so, you could add alternate logic in read_to_end that just reads until it gets that many bytes. That would avoid the need to allocate extra space, and the extra read syscall at the end. In the common case you could do one fstat and one read, rather than two reads.
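
A sketch of that strategy as a standalone helper (hypothetical, not what std implements; see the race caveat discussed next):

use std::fs::File;
use std::io::{self, Read};

// Trust the fstat size: reserve it, read up to that many bytes, then keep
// reading in case the file grew between the stat and the reads.
fn read_file_trusting_metadata(file: &mut File, buf: &mut Vec<u8>) -> io::Result<usize> {
    let size = file.metadata()?.len();
    buf.reserve(size as usize);
    let mut total = file.by_ref().take(size).read_to_end(buf)?;
    // Drain anything appended since the stat; returns immediately at EOF.
    total += file.read_to_end(buf)?;
    Ok(total)
}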

@mbrubeck (Contributor)

You can't trust the size from metadata because the file may change between the metadata call and the read call.

@sunfishcode (Member)

You always have to be prepared to get fewer bytes than expected, or errors. But if there are more bytes than expected, it’s no different from someone appending bytes after your last read returned 0: you’ll miss the newly appended bytes, but it was a race anyway.

That assumes that metadata() isn't memoized, but it doesn't currently appear to be.

@bors (Contributor) commented Nov 21, 2017

☔ The latest upstream changes (presumably #46166) made this pull request unmergeable. Please resolve the merge conflicts.

@carols10cents (Member)

What's the status of this, @SimonSapin? I'm having a hard time telling :)

@SimonSapin (Contributor, author)

> Does #46050 give the same improvement?

I’ve run the benchmark again with rustc 1.24.0-nightly (560a5da 2017-11-27), which includes #46050. Previous results ranged from “3% more time to 43% less time”; now the time with Vec::with_capacity is 4% to 48% less than with Vec::new.

So this PR’s improvement seems independent of #46050’s improvement.

More important IMO is whether Read::size_hint is the kind of public API we want to stabilize eventually.

test a_read_128::vec_new            ... bench:       1,362 ns/iter (+/- 200) = 93 MB/s
test a_read_128::vec_with_capacity  ... bench:       1,158 ns/iter (+/- 199) = 110 MB/s
test b_read_512::vec_new            ... bench:       1,705 ns/iter (+/- 203) = 300 MB/s
test b_read_512::vec_with_capacity  ... bench:       1,172 ns/iter (+/- 301) = 436 MB/s
test c_read_2k::vec_new             ... bench:       2,102 ns/iter (+/- 188) = 974 MB/s
test c_read_2k::vec_with_capacity   ... bench:       1,193 ns/iter (+/- 327) = 1716 MB/s
test d_read_8k::vec_new             ... bench:       2,759 ns/iter (+/- 271) = 2969 MB/s
test d_read_8k::vec_with_capacity   ... bench:       1,412 ns/iter (+/- 187) = 5801 MB/s
test e_read_32k::vec_new            ... bench:       4,957 ns/iter (+/- 93) = 6610 MB/s
test e_read_32k::vec_with_capacity  ... bench:       2,878 ns/iter (+/- 205) = 11385 MB/s
test f_read_128k::vec_new           ... bench:      14,690 ns/iter (+/- 135) = 8922 MB/s
test f_read_128k::vec_with_capacity ... bench:       9,101 ns/iter (+/- 160) = 14401 MB/s
test g_read_512k::vec_new           ... bench:      30,788 ns/iter (+/- 284) = 17028 MB/s
test g_read_512k::vec_with_capacity ... bench:      21,640 ns/iter (+/- 42) = 24227 MB/s
test h_read_1m::vec_new             ... bench:     444,013 ns/iter (+/- 3,361) = 2361 MB/s
test h_read_1m::vec_with_capacity   ... bench:     232,138 ns/iter (+/- 2,474) = 4517 MB/s
test i_read_2m::vec_new             ... bench:     633,958 ns/iter (+/- 3,849) = 3308 MB/s
test i_read_2m::vec_with_capacity   ... bench:     343,065 ns/iter (+/- 813) = 6112 MB/s
test j_read_4m::vec_new             ... bench:   1,114,472 ns/iter (+/- 12,811) = 3763 MB/s
test j_read_4m::vec_with_capacity   ... bench:     773,090 ns/iter (+/- 2,065) = 5425 MB/s
test k_read_8m::vec_new             ... bench:   2,156,756 ns/iter (+/- 7,510) = 3889 MB/s
test k_read_8m::vec_with_capacity   ... bench:   1,792,980 ns/iter (+/- 5,539) = 4678 MB/s
test l_read_32m::vec_new            ... bench:   8,499,907 ns/iter (+/- 24,512) = 3947 MB/s
test l_read_32m::vec_with_capacity  ... bench:   8,129,977 ns/iter (+/- 22,403) = 4127 MB/s

@sunfishcode mentioned this pull request Nov 28, 2017
@sunfishcode (Member)

Reserving size_hint bytes sets read_to_end up to reallocate in the common case, because it needs extra space for the final read that detects EOF. This can create 2x or more slowdowns in some cases, and the resulting buffers are twice as large as they need to be due to Vec’s doubling growth.

This can be fixed either by reserving size_hint + 1 bytes rather than size_hint bytes, or by #46340.

A commit was then pushed with the message: “This one byte of extra capacity is necessary for the final zero-size read that indicates EOF.”
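
In code terms, the fix amounts to something like this (schematic; `hint` stands for the value the proposed size_hint would return):

use std::io::{self, Read};

// Reserve one byte beyond the hint so the final zero-length read that
// signals EOF has spare capacity to be issued into, avoiding a
// capacity-doubling reallocation at the very end.
fn read_to_end_hinted<R: Read>(r: &mut R, hint: usize, buf: &mut Vec<u8>) -> io::Result<usize> {
    buf.reserve(hint + 1);
    r.read_to_end(buf)
}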
@SimonSapin (Contributor, author)

Good point; I’ve added the + 1. As mbrubeck mentioned above, the approach in #46340 is racy, since the file could grow in the meantime.

@sunfishcode (Member)

To be sure, read_to_end is already racy: if someone appends to a file while read_to_end is reading it, read_to_end may or may not see the appended bytes, because if it reaches EOF before the write happens, it stops reading.

@sfackler (Member)

Cool, something like this seems worthwhile then. Should it return an io::Result<usize>? You can always return 0 if something went wrong, but it seems nicer not to throw away those errors. I kind of doubt that reads would succeed if you can’t stat the file anymore.

@SimonSapin (Contributor, author)

@sfackler Good point. Changed to io::Result<usize>.

@SimonSapin (Contributor, author)

I don’t have a strong opinion on the strategy of this PR versus that of #46340. A key difference is the semantics of size_hint / size_snapshot in the general Read case, not just for File: size_hint is intended to be an estimate and defaults to zero (like the first component of Iterator::size_hint), while size_snapshot is optional and, when Some(_), is assumed to be exact.

@bors (Contributor) commented Dec 4, 2017

☔ The latest upstream changes (presumably #46485) made this pull request unmergeable. Please resolve the merge conflicts.

@sfackler (Member) commented Dec 4, 2017

Sorry for the delay here. Should we maybe open an issue to figure out the approach to take? I’d like to avoid splitting the discussion across two PRs.

@shepmaster (Member)

I'm going to make an executive decision and mark this and #46340 as waiting on a team decision as it seems that they cannot both exist together. Let me know if you disagree.

@shepmaster added the S-waiting-on-team label and removed the S-waiting-on-review label, Dec 9, 2017
@sfackler (Member)

The @rust-lang/libs team talked about this and #46340 during our triage meeting and decided that we’re not looking to expand the surface area of Read in this way just yet. Instead, we can modify the implementation of fs::read to do a length lookup, which should cover 99% of the use cases for this kind of thing.
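
For reference, the chosen direction looks roughly like this (a sketch of the approach, not the exact implementation that later landed):

use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// fs::read can stat the file once and pre-allocate, falling back to an
// empty Vec if metadata is unavailable; read_to_end still copes with a
// file whose size changed in the meantime.
fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    let initial = file.metadata().map(|m| m.len() as usize).unwrap_or(0);
    let mut bytes = Vec::with_capacity(initial);
    file.read_to_end(&mut bytes)?;
    Ok(bytes)
}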

@sfackler closed this Jan 10, 2018
kennytm added commits to kennytm/rust referencing this pull request, Jan 11–12, 2018:

Pre-allocate in fs::read and fs::read_string

This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.