Redirect `__rust_dealloc` to `sdallocx` #122329
rustbot has assigned @petrochenkov. Use r? to explicitly pick a reviewer.
@bors try @rust-timer queue
Redirect `__rust_dealloc` to `sdallocx`
This could use a perf run.
☀️ Try build successful - checks-actions
Finished benchmarking commit (ecb3315): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 645.581s -> 646.145s (0.09%)
This was previously tried here, although seemingly in a different way. CC @nnethercote
The PR needs a proper description - what exactly it changes and why it's different from jemalloc overrides below, why it helps, what
If #122362 works out we could probably just do this change with a
#[cfg(feature = "jemalloc-sys")]
#[no_mangle]
pub unsafe extern "C" fn __rust_dealloc(ptr: *mut u8, size: usize, _align: usize) {
    unsafe { jemalloc_sys::sdallocx(ptr.cast(), size, 0) }
}
@Zoxc I think you closed this PR to try using a `#[global_allocator]` here with the statically linked std, but just as a heads up:
- I had quickly tried using jemalloc as a global allocator in the rustc driver and it segfaulted while building std
- otherwise, don't we also need to take alignment into account in the `__rust_dealloc` code above in the general case, so that it matches what `System.alloc` does for big alignments and allocations with a smaller size than the alignment?
That being said, in the context of rustc it shouldn't matter: these values should be coming from types that wouldn't need non-zero jemalloc flags. I did a perf run with both 0 and flags computed by checking size/alignment, and it seemed neither made any improvement over the master branch. Weird, after seeing wins in this PR.
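As an illustration of what "flags computed by checking size/alignment" could look like, here is a minimal sketch. It is not taken from this PR or from the branch mentioned in the thread. `MIN_ALIGN`, `mallocx_align` and `layout_to_flags` are hypothetical names, the value 16 is an assumption that holds for x86_64 only, and `mallocx_align` re-implements the effect of jemalloc's `MALLOCX_ALIGN` macro (log2 of the alignment) so the example needs no external crate.

```rust
use std::alloc::Layout;
use std::os::raw::c_int;

// Assumption: the alignment jemalloc guarantees when flags == 0.
// 16 is typical for x86_64; other targets may use a different value.
const MIN_ALIGN: usize = 16;

// Same result as jemalloc's MALLOCX_ALIGN(a) for a power-of-two `a`:
// the base-2 logarithm of the alignment.
fn mallocx_align(align: usize) -> c_int {
    align.trailing_zeros() as c_int
}

// Hypothetical helper: returns 0 when the default alignment is enough,
// otherwise encodes the requested alignment in the flags.
fn layout_to_flags(layout: Layout) -> c_int {
    if layout.align() <= MIN_ALIGN && layout.align() <= layout.size() {
        0
    } else {
        mallocx_align(layout.align())
    }
}
```

With a helper like this, the `0` passed to `sdallocx` in the snippet above would be replaced by the computed flags. Flags are non-zero only for over-aligned allocations or allocations smaller than their alignment, the two cases raised in the comment.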
> - I had quickly tried using jemalloc as a global allocator in the rustc driver and it segfaulted while building std
Do you have a branch for that?
> - otherwise, don't we also need to take alignment into account in the `__rust_dealloc` code above
My understanding is that the alignment argument is just a performance hint for `sdallocx`.
I don't think that it's just a performance hint, because it modifies the actual size (jemalloc/jemalloc-cmake@4cfe551#diff-a4cb09e38cfec8141b07c291f731a8e01a17412568a852884fd921e8e521766bR1850 - this is some old code, the new one is more complicated, but does the same thing). Sadly, it also seems like `sdallocx` takes a slow path when `flags != 0`, although that might not happen that often.
> Do you have a branch for that?
Yes, some WIP work here: lqd@6819e3c
The rustc-main symbol overrides are still there, as their presence or absence didn't seem to impact the segfault, but maybe it does and LLVM would need to be set up differently.
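For readers unfamiliar with the `#[global_allocator]` route discussed in this thread, the following is a rough sketch of the shape such an allocator takes. It is not the code from the linked branch. `SizedFree` and its `freed_bytes` counter are hypothetical, and `System` stands in for jemalloc so the example stays self-contained. It shows that `dealloc` receives the `Layout`, which is what makes a sized free such as `sdallocx` possible.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper allocator. A jemalloc-backed version would call
// mallocx/sdallocx where this sketch calls into `System`.
struct SizedFree {
    // Running total of the sizes passed to `dealloc`, kept only to make
    // the size information visible in this example.
    freed_bytes: AtomicUsize,
}

unsafe impl GlobalAlloc for SizedFree {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // `layout.size()` is the value a sized deallocation would receive.
        self.freed_bytes.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}
```

Registering an instance with `#[global_allocator]` on a `static` would route every allocation in the program through it. The sketch leaves that step out so it can be exercised directly.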