Use undef for (some) partially-uninit constants #94130
There needs to be some limit to avoid perf regressions on large arrays with undef in each element (see comment in the code).
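As a rough illustration of the kind of constant this limit is meant to guard against, here is a minimal sketch. The names `Elem`, `N`, and `ARR` are hypothetical and do not come from the PR or its tests. Each element has one initialized byte and one uninitialized byte, so a byte-precise description of the whole array alternates between init and uninit ranges `2 * N` times.

```rust
use std::mem::MaybeUninit;

/// Hypothetical element type: one init byte followed by one uninit byte.
/// `repr(C)` pins the layout so the element is exactly two bytes.
#[derive(Clone, Copy)]
#[repr(C)]
pub struct Elem {
    pub tag: u8,
    pub payload: MaybeUninit<u8>,
}

/// Hypothetical array length, chosen only to make the array "large".
pub const N: usize = 1024;

/// A large constant with an uninitialized byte in every element.
pub const ARR: [Elem; N] = [Elem { tag: 0, payload: MaybeUninit::uninit() }; N];
```

Emitting every uninit byte of `ARR` as a separate undef field would produce a constant expression with on the order of `2 * N` fields, which is the cost the limit is intended to bound.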
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit b7e5597 with merge 796a3796f114911f2d4b98ea97c8ff28a6f37634...
I know next to nothing about how LLVM represents consts and when or how that might cause slowdowns, and this doesn't even touch the CTFE engine, so I don't think I am a suitable reviewer for this.
r? @nagisa
cc @nikic
I wonder if the right heuristic here isn't related to size, but rather whether the constant would otherwise be zeroinitializer? If it isn't, then the value needs to be made explicit anyway, and it shouldn't matter whether the explicit value is undef or some other value.
☀️ Try build successful - checks-actions
Queued 796a3796f114911f2d4b98ea97c8ff28a6f37634 with parent b8c56fa, future comparison URL.
I think that makes some sense* -- but I do want to emit undef for some values that would otherwise be zeroinitializer. e.g. a type like

struct Foo {
    init: bool,
    data: MaybeUninit<SomeLargeStruct>,
}
const FOO: Foo = Foo { init: false, data: MaybeUninit::uninit() };

could be zeroinitializer, but would benefit from being partially undef since only one byte is initialized.

Combining the two heuristics is an option of course (using undef for consts that are either nonzero or < size limit), although at that point I'm not sure it would be worth it.

Another option: limit the number of fields in the anonymous struct that we generate, i.e., the number of contiguous chunks of all-init or all-uninit bytes. I think this would be a decently close match to the actual cost of generating the const expression. It would also let us handle (potentially) profitable cases like my

* I do think it's still not fully accurate though--it ought to be more expensive to generate
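The chunk-counting idea can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `count_chunks`, `should_use_undef`, and the value of `MAX_CHUNKS` are hypothetical, and the init mask is modeled as a plain `&[bool]` with one entry per byte rather than whatever representation rustc actually uses.

```rust
/// Hypothetical limit on the number of fields in the generated anonymous struct.
pub const MAX_CHUNKS: usize = 16;

/// Counts maximal runs of bytes that are all-init or all-uninit.
/// `mask[i]` is `true` when byte `i` of the constant is initialized.
pub fn count_chunks(mask: &[bool]) -> usize {
    if mask.is_empty() {
        return 0;
    }
    // One chunk to start with, plus one more for every init/uninit transition.
    1 + mask.windows(2).filter(|w| w[0] != w[1]).count()
}

/// Decides whether uninit ranges should be emitted as undef: only when there
/// is something uninit at all, and the chunk count stays under the limit.
pub fn should_use_undef(mask: &[bool]) -> bool {
    let has_uninit = mask.iter().any(|&init| !init);
    has_uninit && count_chunks(mask) <= MAX_CHUNKS
}
```

Under these assumptions, a mask shaped like `FOO` (one init byte followed by a large uninit payload) has 2 chunks and passes the check, while a mask that alternates init and uninit bytes across 2048 bytes has 2048 chunks and is rejected.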
Switched to limiting the number of chunks. Let me know if you'd prefer something else.
Finished benchmarking commit (796a3796f114911f2d4b98ea97c8ff28a6f37634): comparison url.
Summary: This benchmark run did not return any relevant results. 5 results were found to be statistically significant but too small to be relevant. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.
@bors rollup=never
Yeah, using the number of chunks makes a lot of sense to me.
@bors r+
@bors r=nikic
📌 Commit 067f628 has been approved by
@bors r-
@bors r+
📌 Commit 5bf8303 has been approved by
⌛ Testing commit 5bf8303 with merge 3450b91e88c8e5343cffeead37603a63a8f904a9...
💔 Test failed - checks-actions
@bors retry spurious network error
☀️ Test successful - checks-actions
Finished benchmarking commit (ece55d4): comparison url. Summary: This benchmark run shows 3 relevant improvements 🎉 but 6 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Perf changes:
LLVM does less work. Judging from the involved functions, this is likely because the generated bitcode is much smaller, since we emit just
LLVM does more work, likely because we generate constants with more fields (which is expected). Functions like
Edit: opened #94372
all the
Judging from callgrind (omitted from this comment), this is a
Edit: opened #94373
Fixes: #84565
Original PR: #83698
Depends on LLVM 14: #93577