Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle out of memory errors in io:Read::read_to_end() #117925

Merged
merged 2 commits into from
Jan 30, 2024

Conversation

kornelski
Copy link
Contributor

#116570 got stuck due to a procedural confusion. Retrying so that it can get FCP with the proper team now. cc @joshtriplett @BurntSushi


I'd like to propose handling of out-of-memory errors in the default implementation of io::Read::read_to_end() and fs::read(). These methods create/grow a Vec with a size that is external to the program, and could be arbitrarily large.

Due to being I/O methods, they can already fail in a variety of ways, in theory even including ENOMEM from the OS too, so another failure case should not surprise anyone.

While this may not help much Linux with overcommit, it's useful for other platforms like WASM. Internals thread.

I've added documentation that makes it explicit that the OOM handling is a nice-to-have, and not a guarantee of the trait.

I haven't changed the implementation of impl Read for &[u8] and VecDeque out of caution, because in these cases users could assume read can't fail.

This code uses try_reserve() + extend_from_slice() which is optimized since #117503.

@rustbot
Copy link
Collaborator

rustbot commented Nov 15, 2023

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 15, 2023
@bors
Copy link
Contributor

bors commented Nov 29, 2023

☔ The latest upstream changes (presumably #118412) made this pull request unmergeable. Please resolve the merge conflicts.

@kornelski kornelski force-pushed the read-to-oom branch 2 times, most recently from ad4f209 to fdc51b6 Compare November 29, 2023 19:17
@rust-log-analyzer

This comment has been minimized.

@kornelski
Copy link
Contributor Author

@rustbot r?

@rustbot
Copy link
Collaborator

rustbot commented Dec 23, 2023

Error: Parsing assign command in comment failed: ...'' | error: specify user to assign to at >| ''...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@fintelia
Copy link
Contributor

Perhaps io::Repeat should also return a memory error? Right now it unconditionally OOMs...

@kornelski
Copy link
Contributor Author

@fintelia I've made a separate PR, because I think it may be bikesheddable which error kind it should return.

@Mark-Simulacrum Mark-Simulacrum added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 20, 2024
@m-ou-se m-ou-se assigned Amanieu and unassigned thomcc Jan 23, 2024
@m-ou-se m-ou-se added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jan 23, 2024
@Amanieu
Copy link
Member

Amanieu commented Jan 23, 2024

We discussed this in the libs-api meeting and we are happy with this change. It doesn't need an FCP since it isn't an API breaking change.

Some concerns on the implementation:

  • [u8]::read_to_end is missing a try_reserve.
  • Same with VecDeque:read_to_end.
    These are both located in library/std/src/io/impls.rs.

kornelski added a commit to kornelski/rust that referenced this pull request Jan 27, 2024
@kornelski
Copy link
Contributor Author

@Amanieu I've added the impls for [u8] and VecDeque, and rebased.

@the8472
Copy link
Member

the8472 commented Jan 29, 2024

We could use _exact if a) the capacity is zero b) the Read size is known. If it's used in a loop it'd result in at most one extra resize.

Or we could document that the caller should presize the Vec if they want more control over the allocation.
Actually, if one needs that much control then read_to_end is not the right tool.

@fintelia
Copy link
Contributor

Vec::try_reserve already has smart logic for selecting the allocation size based on the current capacity and requested size. I don't think there's anything special about the read_to_end case that is going to allow for better optimization.

I also disagree about read_to_end not being a good tool if you need precise control. A call to r.take(n).read_to_end(&mut v) currently writes into the initialized portion of a Vec and then if the extra capacity was less than n, successively doubles the size until the requested size is reached or the the end of the input is hit. (At the moment the take call prevents a size hint from being provided so this case doesn't hit the code path we're talking about, but that could change.) I'm not sure how many people realize it, but this approach also lets you take advantage of Read::read_buf with safe code on stable Rust via an API that is quite frankly more ergonomic that dealing with a BorrowedCursor.

@the8472
Copy link
Member

the8472 commented Jan 29, 2024

I don't think there's anything special about the read_to_end case that is going to allow for better optimization.

I was thinking of the specific case where the size of the reader is known. But you're right, for an empty vec it already does the right thing. The only case where it could cause an issue is when the Vec already has been allocated but the allocated space is close but insufficient, in that case it could double the capacity when we could have done an exact allocation instead. That's a fairly narrow scenario.

@kornelski
Copy link
Contributor Author

I've added _exact to std::fs::read and File's impl, where the full size is known. io::Take reverts back to the default unhinted implementation, so there's no risk overallocating.

There's possibility of optimizing growth based on the hint — when the hinted size is between cap and cap*2, then that growth step could be using the hinted size. That's however separate from OOM I'm trying to add here, so can be done in a separate PR.

@rust-log-analyzer

This comment has been minimized.

@Amanieu
Copy link
Member

Amanieu commented Jan 30, 2024

@bors r+

@bors
Copy link
Contributor

bors commented Jan 30, 2024

📌 Commit 60f4628 has been approved by Amanieu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 30, 2024
@bors
Copy link
Contributor

bors commented Jan 30, 2024

⌛ Testing commit 60f4628 with merge 5c9c3c7...

@bors
Copy link
Contributor

bors commented Jan 30, 2024

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing 5c9c3c7 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 30, 2024
@bors bors merged commit 5c9c3c7 into rust-lang:master Jan 30, 2024
12 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Jan 30, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5c9c3c7): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
6.2% [6.2%, 6.2%] 1
Improvements ✅
(primary)
-4.3% [-6.4%, -2.2%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -4.3% [-6.4%, -2.2%] 2

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.0%, 0.3%] 13
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 16
Improvements ✅
(primary)
-0.4% [-0.4%, -0.4%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.4%, 0.3%] 14

Bootstrap: 661.382s -> 662.576s (0.18%)
Artifact size: 308.09 MiB -> 308.05 MiB (-0.01%)

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 30, 2024
Reject infinitely-sized reads from io::Repeat

These calls would always run out of memory.

Related to rust-lang#117925
@kornelski kornelski deleted the read-to-oom branch January 30, 2024 11:32
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 30, 2024
Rollup merge of rust-lang#119991 - kornelski:endless-read, r=the8472

Reject infinitely-sized reads from io::Repeat

These calls would always run out of memory.

Related to rust-lang#117925
@fintelia
Copy link
Contributor

Wait, was it intentional to merge this with some read_to_end implementations that know their exact size using try_reserve and others using try_reserve_exact?

Prior to this PR they all used reserve.

@kornelski
Copy link
Contributor Author

kornelski commented Jan 31, 2024

I've ended up making the change to _exact in response to the discussion — #117925 (comment)

The default behavior remains to use try_reserve that may overshoot, because of patterns like take(1).read_to_end(append) in a loop that could end up resizing too much.

The _exact allocation is used in std::fs::read(_to_string) functions that return a new Vec. I think in these cases it's reasonable to assume that users are interested in reading and consuming the whole file, and users would rather expect these functions to be memory-efficient, than optimized for immediately appending data. File edits will likely go through parsing and serialization, and simple appends can be done more efficiently with OpenOptions::append.

exact_ is also in File::read_to_end specifically, when it's not wrapped in Take or something else. In that basic case I assume it's used similarly to std::fs::read, and if not, then the caller has an option of reserving more space themselves.

@fintelia
Copy link
Contributor

The _exact allocation is used in std::fs::read(_to_string) functions that return a new Vec. I think in these cases it's reasonable to assume that users are interested in reading and consuming the whole file, and users would rather expect these functions to be memory-efficient, than optimized for immediately appending data. File edits will likely go through parsing and serialization, and simple appends can be done more efficiently with OpenOptions::append.

This case is irrelevant because _reserve and _reserve_exact behave identically if the Vec starts out with capacity=0. (Except for an edge case of requesting fewer than 8-bytes and using a weird allocator that supports sub 8-byte allocations.)

exact_ is also in File::read_to_end specifically, when it's not wrapped in Take or something else. In that basic case I assume it's used similarly to std::fs::read, and if not, then the caller has an option of reserving more space themselves.

This adds a quirk to one specific implementation that I'm not sure is justified. As mentioned, _reserve doesn't overshoot if the Vec starts out empty. But _reserve also doesn't overshoot if the total capacity required is more than double the current capacity.

In order words, the only case that changed from this PR is when the read size requires an allocation between cap and cap*2. The code for File used to round up to cap*2 but now allocates the exact hinted size. Hitting this case at all is somewhat of an edge case, but if you do, there's a reasonable chance you may hit it many times and actually prefer the previous behavior.

kornelski added a commit to kornelski/rust that referenced this pull request Jan 31, 2024
@kornelski
Copy link
Contributor Author

@fintelia To me it makes sense that File's implementation would be specialized for its own type, with EOF being inherently a special case. But that's a detail orthogonal to OOM handling, so I've made a PR to restore previous behavior: #120538

oli-obk added a commit to oli-obk/rust that referenced this pull request Feb 15, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.