-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialize Vec::from_elem to use calloc #40409
Conversation
r? @BurntSushi (rust_highfive has picked a reviewer for you, use r? to override) |
What's the performance for short (n <= 8)? |
For the following benchmarks, using the default allocator (alloc_jemalloc) on Linux: #[bench]
fn bench_big(b: &mut Bencher) {
b.iter(|| vec![0u8; 1024 * 1024]);
}
#[bench]
fn bench_medium(b: &mut Bencher) {
b.iter(|| vec![0u8; 1024]);
}
#[bench]
fn bench_small(b: &mut Bencher) {
b.iter(|| vec![0u8; 8]);
} Before this PR:
After this PR:
|
Is there a reason why this can't be specialised for all integer types if the value is zero? |
It can, and I'm working on a patch to do that. It could also be specialized for floats, and for some less-common types like |
I think that specialising for integer zeroes is definitely reasonable (because |
@whitequark made a good point on IRC: If we had a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(damn github review didn’t get posted)
src/liballoc_jemalloc/lib.rs
Outdated
} else { | ||
let flags = align_to_flags(align); | ||
unsafe { | ||
let ptr = mallocx(size as size_t, flags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use MALLOCX_ZERO
flag instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@solson how would that work here though? This only handles the case of an all-zero initial bit pattern. |
} else { | ||
let ptr = aligned_malloc(size, align); | ||
if !ptr.is_null() { | ||
ptr::write_bytes(ptr, 0, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fallback seems a bit unfortunate, but I guess there's no other option?
@sfackler My understanding is that for EDIT: See this sentence from this issue description:
|
Also one thing to consider is that |
@solson the missing part of the explanation is how to identify that |
Added a patch to specialize |
@bluss One basic way would be to iterate over the bytes of |
@@ -92,6 +98,16 @@ mod imp { | |||
} | |||
|
|||
#[no_mangle] | |||
pub extern "C" fn __rust_allocate_zeroed(size: usize, align: usize) -> *mut u8 { | |||
if align <= MIN_ALIGN { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not super familiar with jemalloc - is calloc
faster than unconditionally calling mallocx
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, mallocx is parsing the options out from flags’ bitmask, whereas calloc sets them directly. Difference likely negligible though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks. Seems fine either way.
r=me other than one dumb question about jemalloc. |
@bors r+ |
📌 Commit 4961f6c has been approved by |
Specialize Vec::from_elem to use calloc Fixes rust-lang#38723. This specializes the implementation for `u8` only, but it could be extended to other zeroable types if desired. I haven't tested this extensively, but I did verify that it gives the expected performance boost for large `vec![0; n]` allocations with both alloc_system and jemalloc, on Linux. (I have not tested or even built the Windows code.)
Specialize Vec::from_elem to use calloc Fixes rust-lang#38723. This specializes the implementation for `u8` only, but it could be extended to other zeroable types if desired. I haven't tested this extensively, but I did verify that it gives the expected performance boost for large `vec![0; n]` allocations with both alloc_system and jemalloc, on Linux. (I have not tested or even built the Windows code.)
⌛ Testing commit aad2062 with merge 966a32a... |
💔 Test failed - status-travis |
Timed out, presumably spurious. Retrying. @bors retry |
Specialize Vec::from_elem to use calloc Fixes rust-lang#38723. This specializes the implementation for `u8` only, but it could be extended to other zeroable types if desired. I haven't tested this extensively, but I did verify that it gives the expected performance boost for large `vec![0; n]` allocations with both alloc_system and jemalloc, on Linux. (I have not tested or even built the Windows code.)
⌛ Testing commit aad2062 with merge 31bbdc1... |
💔 Test failed - status-appveyor |
@bors retry
|
Specialize Vec::from_elem to use calloc Fixes #38723. This specializes the implementation for `u8` only, but it could be extended to other zeroable types if desired. I haven't tested this extensively, but I did verify that it gives the expected performance boost for large `vec![0; n]` allocations with both alloc_system and jemalloc, on Linux. (I have not tested or even built the Windows code.)
☀️ Test successful - status-appveyor, status-travis |
This is fallout from rust-lang/rust#40409 which requires that all allocators provide a `__rust_alloc_zeroed` function. Fixes japaric#136.
This is fallout from rust-lang/rust#40409 which requires that all allocators provide a `__rust_alloc_zeroed` function. Fixes japaric#136.
rust-lang/rust#40409 requires a __rust_alloc_zeroed function.
rust-lang/rust#40409 requires a __rust_alloc_zeroed function.
rust-lang/rust#40409 requires a __rust_alloc_zeroed function.
Fixes #38723.
This specializes the implementation foru8
only, but it could be extended to other zeroable types if desired.I haven't tested this extensively, but I did verify that it gives the expected performance boost for large
vec![0; n]
allocations with both alloc_system and jemalloc, on Linux. (I have not tested or even built the Windows code.)