-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hard-linking incremental files can be expensive. #64291
Comments
Producing fewer files for incremental compilation would reduce reuse/hit ratio. Not sure whether number of links can be reduced, but OS taking 0.2ms on average to increment a link count and add inode to a directory list seems so excessive that I suspect there might be something else going as well (e.g. perhaps we’re saturating the I/O interface with other ops?) |
(@wesleywiser notes that this might also be a big issue on windows, since we probably aren't using junction points there, and thus we may be copying files back and forth there, which would be ... bad with respect to time and space usage...) |
One idea also posed by @wesleywiser : we could consider switching from leaning on FS features like linking, and instead use e.g. a text file storing the linking info. The latter might be much faster. |
Discussed in today's meeting. Next steps here may be to evaluate which files are involved in the sym-linking (is it just serialized dep-graph info, or is it also .o files that will be passed to linker invocations). The reason for the question is to determine how cross-cutting a change here would be (to introduce an auxiliary file as discussed in my previous comment). |
@pnkfelix I'm not too familiar with how incremental works, but the files are of the form: I just checked, and it looks like |
…rrors Avoid no-op unlink+link dances in incr comp Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that. The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory. This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old `link_or_copy` implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links. `cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b` before and then after on `ripgrep-13.0.0`: ``` % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 52.56 0.024950 25 978 485 unlink 34.38 0.016318 22 727 linkat 13.06 0.006200 24 249 unlinkat ------ ----------- ----------- --------- --------- ---------------- 100.00 0.047467 24 1954 485 total ``` ``` % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 42.83 0.014521 57 252 unlink 38.41 0.013021 26 486 linkat 18.77 0.006362 25 249 unlinkat ------ ----------- ----------- --------- --------- ---------------- 100.00 0.033904 34 987 total ``` This reduces the number of hard-links that are causing perf troubles, noted in rust-lang#64291 and rust-lang#137560
Avoid no-op unlink+link dances in incr comp Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that. The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory. This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old `link_or_copy` implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links. `cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b` before and then after on `ripgrep-13.0.0`: ``` % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 52.56 0.024950 25 978 485 unlink 34.38 0.016318 22 727 linkat 13.06 0.006200 24 249 unlinkat ------ ----------- ----------- --------- --------- ---------------- 100.00 0.047467 24 1954 485 total ``` ``` % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 42.83 0.014521 57 252 unlink 38.41 0.013021 26 486 linkat 18.77 0.006362 25 249 unlinkat ------ ----------- ----------- --------- --------- ---------------- 100.00 0.033904 34 987 total ``` This reduces the number of hard-links that are causing perf troubles, noted in rust-lang/rust#64291 and rust-lang/rust#137560
Avoid no-op unlink+link dances in incr comp Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that. The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory. This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old `link_or_copy` implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links. `cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b` before and then after on `ripgrep-13.0.0`: ``` % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 52.56 0.024950 25 978 485 unlink 34.38 0.016318 22 727 linkat 13.06 0.006200 24 249 unlinkat ------ ----------- ----------- --------- --------- ---------------- 100.00 0.047467 24 1954 485 total ``` ``` % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 42.83 0.014521 57 252 unlink 38.41 0.013021 26 486 linkat 18.77 0.006362 25 249 unlinkat ------ ----------- ----------- --------- --------- ---------------- 100.00 0.033904 34 987 total ``` This reduces the number of hard-links that are causing perf troubles, noted in rust-lang/rust#64291 and rust-lang/rust#137560
Avoid no-op unlink+link dances in incr comp Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that. The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory. This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old `link_or_copy` implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links. `cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b` before and then after on `ripgrep-13.0.0`: ``` % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 52.56 0.024950 25 978 485 unlink 34.38 0.016318 22 727 linkat 13.06 0.006200 24 249 unlinkat ------ ----------- ----------- --------- --------- ---------------- 100.00 0.047467 24 1954 485 total ``` ``` % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 42.83 0.014521 57 252 unlink 38.41 0.013021 26 486 linkat 18.77 0.006362 25 249 unlinkat ------ ----------- ----------- --------- --------- ---------------- 100.00 0.033904 34 987 total ``` This reduces the number of hard-links that are causing perf troubles, noted in rust-lang/rust#64291 and rust-lang/rust#137560
Doing some profiling to determine why incremental Cargo builds are slow, I noticed that hard-linking the incremental files takes a nontrivial amount of time. This issue is to ask those in-the-know if it may be possible to improve it.
On my system (macos, apfs, an extremely fast ssd), it takes between 500 to 750ms of a 7s incremental build (7% to 10%) just to hard-link the incremental files. The incremental build of cargo's lib generates about 1000 files, and it appears the incremental system links them 3 times (once from
incremental
toworking
, thenworking
todeps
, and thendeps
back toincremental
), for a total of about 3,000 hard links.I see two things here:
The text was updated successfully, but these errors were encountered: