Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-allocate in fs::read and fs::read_string #47324

Merged
merged 1 commit into from
Jan 12, 2018
Merged

Conversation

mbrubeck
Copy link
Contributor

This is a simpler alternative to #46340 and #45928, as requested by the libs team.

src/libstd/fs.rs Outdated
// Allocate one extra byte so the buffer doesn't need to grow before the final `read` call at
// the end of the file. Don't worry about `usize` overflow because this will fail anyway if
// the file doesn't fit into memory.
let mut bytes = Vec::with_capacity(metadata(&path)?.len() as usize + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there contexts/weird files where the metadata lookup won't work but the file read will?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally certain. I based this on #45928 (comment):

You can always return 0 if something went wrong, but it seems like it's nice to not throw away those errors. I kind of doubt that reads would succeed if you can't stat the file anymore.

but I do think it would be reasonable to just start with an empty buffer if the stat call fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that starting with an empty buffer if the stat call fails is probably the right thing to do. It's potentially worth pointing out that it seems like this may not be faster (#35844 (comment)) based on past benchmarks so maybe we shouldn't do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those numbers may no longer be accurate - we no longer zero the buffer before reading into it for Files, and read_to_end has been tweaked a bit to behave differently. #45928 has some benchmarks.

@mbrubeck
Copy link
Contributor Author

Here are before/after benchmark results from my Linux x86_64 laptop with SSD with a hot cache. The alloc versions are using the new code from this PR.

test read_1kb        ... bench:       2,487 ns/iter (+/- 473)
test read_1kb_alloc  ... bench:       1,799 ns/iter (+/- 641)
test read_32kb       ... bench:       6,454 ns/iter (+/- 978)
test read_32kb_alloc ... bench:       3,171 ns/iter (+/- 438)
test read_1mb        ... bench:     431,048 ns/iter (+/- 20,617)
test read_1mb_alloc  ... bench:      55,331 ns/iter (+/- 7,460)
test read_32mb       ... bench:   7,643,615 ns/iter (+/- 161,601)
test read_32mb_alloc ... bench:   7,104,146 ns/iter (+/- 237,000)

@sfackler
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jan 10, 2018

📌 Commit 1745f43 has been approved by sfackler

@Mark-Simulacrum
Copy link
Member

@bors r=sfackler

@bors
Copy link
Contributor

bors commented Jan 10, 2018

📌 Commit 44912bf has been approved by sfackler

@mbrubeck
Copy link
Contributor Author

Updated to fall back to an empty buffer if metadata returns an error.

@sfackler
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jan 10, 2018

💡 This pull request was already approved, no need to approve it again.

@bors
Copy link
Contributor

bors commented Jan 10, 2018

📌 Commit 44912bf has been approved by sfackler

fn initial_buffer_size<P: AsRef<Path>>(path: P) -> usize {
// Allocate one extra byte so the buffer doesn't need to grow before the
// final `read` call at the end of the file. Don't worry about `usize`
// overflow because reading will fail regardless in that case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "fail regardless" here mean Result::Err or panic?
m.len() as usize + 1 can panic.

Copy link
Contributor Author

@mbrubeck mbrubeck Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will panic in RawVec code when requested capacity exceeds isize::MAX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or more likely, abort due to out-of-memory sometime before that)

kennytm added a commit to kennytm/rust that referenced this pull request Jan 11, 2018
Pre-allocate in fs::read and fs::read_string

This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
kennytm added a commit to kennytm/rust that referenced this pull request Jan 11, 2018
Use the new fs_read_write functions in rustc internals

Uses `fs::read` and `fs::write` (added by rust-lang#45837) where appropriate, to simplify code and dog-food these new APIs.  This also improves performance, when combined with rust-lang#47324.
kennytm added a commit to kennytm/rust that referenced this pull request Jan 11, 2018
Pre-allocate in fs::read and fs::read_string

This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
kennytm added a commit to kennytm/rust that referenced this pull request Jan 11, 2018
Use the new fs_read_write functions in rustc internals

Uses `fs::read` and `fs::write` (added by rust-lang#45837) where appropriate, to simplify code and dog-food these new APIs.  This also improves performance, when combined with rust-lang#47324.
bors added a commit that referenced this pull request Jan 11, 2018
Rollup of 12 pull requests

- Successful merges: #47283, #47288, #47289, #47298, #47305, #47307, #47310, #47322, #47324, #47328, #47340, #47344
- Failed merges:
bors added a commit that referenced this pull request Jan 12, 2018
Rollup of 12 pull requests

- Successful merges: #47283, #47288, #47289, #47298, #47305, #47307, #47310, #47322, #47324, #47328, #47340, #47344
- Failed merges:
kennytm added a commit to kennytm/rust that referenced this pull request Jan 12, 2018
Pre-allocate in fs::read and fs::read_string

This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
kennytm added a commit to kennytm/rust that referenced this pull request Jan 12, 2018
Use the new fs_read_write functions in rustc internals

Uses `fs::read` and `fs::write` (added by rust-lang#45837) where appropriate, to simplify code and dog-food these new APIs.  This also improves performance, when combined with rust-lang#47324.
bors added a commit that referenced this pull request Jan 12, 2018
@bors bors merged commit 44912bf into rust-lang:master Jan 12, 2018
@alex
Copy link
Member

alex commented Jan 17, 2018

Should this use a sequence of calls to do open -> fstat instead of stat -> open? In contrived local measurements on macOS, the fstat version is faster (presumably because it avoids kernel string processing and needing to trace more FS metadata).

Should I file a bug to track this?

@SimonSapin
Copy link
Contributor

@alex That sounds like a good idea. (This can use the metadata method of fs::File, which on Unix is implemented with fstat64.) Please file it separately, since this PR is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants