Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc: Fail fast when compiling a source file larger than 4 GiB #132791

Merged
merged 1 commit into from
Nov 24, 2024

Conversation

tyilo
Copy link
Contributor

@tyilo tyilo commented Nov 8, 2024

Currently if you try to compile a file that is larger than 4 GiB, rustc will first read the whole into memory before failing.

If we can read the metadata of the file, we can fail before reading the file.

@rustbot
Copy link
Collaborator

rustbot commented Nov 8, 2024

r? @compiler-errors

rustbot has assigned @compiler-errors.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 8, 2024
@@ -115,6 +115,10 @@ impl FileLoader for RealFileLoader {
}

fn read_file(&self, path: &Path) -> io::Result<String> {
if path.metadata().is_ok_and(|metadata| metadata.len() > u32::MAX.into()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we report this as an io::Error::other rather than fataling here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this works and produces:

error: couldn't read /home/tyilo/tmp/big_source.rs: rustc does not support files larger than 4 GiB

error: aborting due to 1 previous error

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 8, 2024
@@ -115,6 +115,10 @@ impl FileLoader for RealFileLoader {
}

fn read_file(&self, path: &Path) -> io::Result<String> {
if path.metadata().is_ok_and(|metadata| metadata.len() > u32::MAX.into()) {
eprintln!("fatal error: rustc does not support files larger than 4GB");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
eprintln!("fatal error: rustc does not support files larger than 4GB");
eprintln!("fatal error: rustc does not support text files larger than 4GiB");

rustc supports binary files (via include_bytes!) with no hard-coded upper limit beyond available memory, so the error message should specify only text files are affected. Also I think the right unit here is GiB (10243): I believe GB is usually 10003 (although it's sometimes used for 10243 making GB somewhat ambiguous).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustc supports binary files (via include_bytes!) with no hard-coded upper limit beyond available memory, so the error message should specify only text files are affected.

Just testing this, it doesn't seem to be true.

Also I think the right unit here is GiB (1024^3): I believe GB is usually 1000^3 (although it's sometimes used for 1024^3 making GB somewhat ambiguous).

Changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just testing this, it doesn't seem to be true.

It is true only if the file is not valid UTF-8 (which is the case for most binary files). Due to this line of code, if the file is valid UTF-8 the full file is loaded into the source map and is subject to the 4 GiB limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok that's weird. Would it make sense to change that to use try_new_source_file instead and also fallback to a span for the empty string, if the included files is greater than 4 GiB, but valid UTF-8?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that wouldn't fully fix the problem, as it would make the behaviour of rustc inconsistent depending on the order source files are loaded, since the 4 GiB limit is cumulative. If two files, a.txt and b.txt, both exist and are both 3 GiB of valid UTF-8, and one is loaded with include_bytes!("a.txt") and the other with include_str!("b.txt"), then at the moment both compilation will fail regardless of which order they are loaded in as the total loaded source file size exceeds 4 GiB. If load_binary_file is changed to use try_new_source_file and fallback if it fails, then compilation will succeed only if include_bytes!("a.txt") is processed after include_str!("b.txt"): if the include_bytes! is processed first, then try_new_source_file will succeed and fill up 3 GiB of the source map, meaning that the later include_str! will fail as there isn't enough space in the source map remaining.

The ultimate solution for consistent behaviour here (the 4 GiB limit only applying to files loaded as text files) would be for binary files to be tracked for dep-info separately from regular source files, meaning that binary files are not loaded into the source map at all. This would require some extra handling for expand_include_str, however, as that is currently the only user of the span returned by load_binary_file (and would therefore want the file to be loaded into the source map, unlike expand_include_bytes). As this is a pre-existing issue, it's fine to leave this to a future PR.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 9, 2024
@clubby789
Copy link
Contributor

cc #132862, which could be fixed by making the check >= u32::MAX and adjusting the error accordingly

Copy link
Member

@compiler-errors compiler-errors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz also squash commit history

@@ -115,6 +115,12 @@ impl FileLoader for RealFileLoader {
}

fn read_file(&self, path: &Path) -> io::Result<String> {
if path.metadata().is_ok_and(|metadata| metadata.len() > SourceFile::MAX_FILE_SIZE.into()) {
return Err(io::Error::other(format!(
"rustc doest not support text files larger than {} bytes",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the wording is inconsistent between here and the other errors.

@compiler-errors
Copy link
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 21, 2024
@tyilo tyilo force-pushed the big-file-fail-fast branch from c996de3 to 5caf516 Compare November 22, 2024 23:25
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 22, 2024
@compiler-errors
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Nov 24, 2024

📌 Commit 5caf516 has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 24, 2024
@bors
Copy link
Contributor

bors commented Nov 24, 2024

⌛ Testing commit 5caf516 with merge ab3cf26...

@bors
Copy link
Contributor

bors commented Nov 24, 2024

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing ab3cf26 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 24, 2024
@bors bors merged commit ab3cf26 into rust-lang:master Nov 24, 2024
7 checks passed
@rustbot rustbot added this to the 1.85.0 milestone Nov 24, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (ab3cf26): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.3% [1.5%, 3.7%] 4
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.3% [1.5%, 3.7%] 4

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 796.131s -> 798.172s (0.26%)
Artifact size: 336.32 MiB -> 336.30 MiB (-0.01%)

@@ -1843,6 +1843,8 @@ impl StableSourceFileId {
}

impl SourceFile {
const MAX_FILE_SIZE: u32 = u32::MAX - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the -1 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #132862

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is fail fast for files LARGER than 4 GiB. 4GiB is 2^32B is (u32::MAX + 1) bytes, so U32::MAX is 4Gi - 1, and U32::MAX - 1 is 4Gi - 2. What is the purpose of a constant for 4GiB - 2B when preventing use of files larger than 4GiB? Is that not off by 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally this PR was for enforcing that files of size 4 GiB and larger would not work, meaning the maximum file size would be 4 GiB - 1 B = u32::MAX bytes.

However, I then discovered #132862, so files of size 4 GiB - 1 B and larger would not work, so the maximum file size is actually 4 GiB - 2 B = u32::MAX - 1. So no the current value is correct.

I've just tested this again to make sure. Using the current stable and nightly version of rustc, I get the following results:

File size (bytes) stable result nightly result
4,294,967,294 success success
4,294,967,295 ICE (#132862) instant failure
4,294,967,296 failure after reading the file to memory instant failure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants