-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align rustc_metadata Table to word boundary #76620
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
I'm not familiar with metadata encoding, so r? @Mark-Simulacrum (or someone who can verify the following assumption)
I believe recent Intel processors have a hardware counter for unaligned loads, although you'll need to pass an additional flag to |
I guess we should at least gather perf stats, though that's on a Ryzen 3600. (Instructions should basically not change, but wall-times will still be interesting). I think probably @eddyb or maybe @petrochenkov know the metadata system best right now. @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit e647c6a with merge bf8d56e992aa4e07bacb0236d07c67f1a921bbdb... |
☀️ Try build successful - checks-actions, checks-azure |
Queued bf8d56e992aa4e07bacb0236d07c67f1a921bbdb with parent 7adeb2c, future comparison URL. |
Finished benchmarking try commit (bf8d56e992aa4e07bacb0236d07c67f1a921bbdb): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Looking at wall times, perf looks mixed, and cpu clock just shows regressions, so I feel like this is probably not really an improvement (or at least not always one), and given that it's pretty finicky code wise I'm inclined to close. |
I agree that the change is hard to justify without meaningful performance improvement, so closing this. |
This can save a few cycles when converting the bytes to u32. That said, probably very marginal benefits. The motivation is more like, borrowing ideas from FlatBuffers/Cap'n Proto.
I wonder if inserting arbitrary paddings (without communicating them to the decoder) between lazy items are OK? From my understanding it's fine, since lazy items are only referenced by relative/absolute position and does not have to be in a packed layout.
Also note for perf: this will not reduce the instruction count, which is the default metric, since the decoder part uses the same (machine) code.