-
Notifications
You must be signed in to change notification settings - Fork 221
Conversation
std::Vec
.std::Vec
.
Codecov Report
@@ Coverage Diff @@
## main #385 +/- ##
==========================================
- Coverage 80.64% 80.03% -0.62%
==========================================
Files 372 371 -1
Lines 22691 22709 +18
==========================================
- Hits 18300 18175 -125
- Misses 4391 4534 +143
Continue to review full report at Codecov.
|
Added bench for addition of two arrays,
other arithmetics benches show no differences. |
How should I read this? Is positive change longer runtime? |
Great cleanup! My thoughts:
|
IMO having the same alignment as the std vec is really worth something. This allows zero copy moving in an out of arrow, improvin interop with the rust landcape. I also did some benchmarks on polars and the db-benchmark groupby questions (this is with mimalloc):
So the runtime difference seem to be negligible for that benchmark. The compilation times are getting better. 🙂 |
My suggestion above still would be to use a For some use cases (like DataFusion) there might be a good argument to remove the 128 byte alignment, e.g. to remove some memory overhead for small batches. |
You mean, keep the same allocator generic and default to the standard one? As I understand it, deallocating with a different alignment than the allocation is UB. This would indeed not happen if we are generic over the allocator, same as Vec. |
I think @Dandandan is suggesting switching to Vec and, in nightly (or in the future), offer an implementation of the trait std::Allocator in case people would like to use the 64/128 byte allocation. In other words, there is no loss of generality here for nightly if we offer the 128 byte implementation of |
Check! 👍 |
Yes, the proposal in that case would be to use e.g. : struct ArrowAllocator;
impl Allocator for ArrowAllocator {
fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
// round up to desired alignment
let custom_layout = Layout::from_size_align(layout.size, round_up_to_128(layout.align))?;
// use global allocator to actually allocate the data, with changed aligment
let alloc = alloc::alloc(custom_layout);
// return as slice
Ok(from_raw_parts(..))
}
// other methods
...
}
type ArrowVec<T> = Vec<T, ArrowAllocator>; You can now use the same API for this |
I think it'd be better to have In the setup you propose a |
Yes, good suggestion. |
I just found out about this one from @ritchie46. This is a very nice clean up. |
It works because struct Buffer<T> {
data: Arc<Bytes<T>>
offset: usize
}
/// Mode of deallocating memory regions
pub enum Deallocation {
/// Native deallocation
Native(usize),
// Foreign interface, via a callback
Foreign(Arc<ffi::ArrowArray>),
}
pub struct Bytes<T: NativeType> {
ptr: NonNull<T>,
len: usize,
deallocation: Deallocation,
}
impl<T: NativeType> Drop for Bytes<T> {
#[inline]
fn drop(&mut self) {
match &self.deallocation {
Deallocation::Native(capacity) => {
unsafe { Vec::from_raw_parts(self.ptr, self.len, *capacity) };
}
// foreign interface knows how to deallocate itself.
Deallocation::Foreign(_) => (),
}
}
} i.e. replace What we lose are allocations along cache lines (e.g. 64 bytes). There are indications that, when adding two arrays, there is a penalty in not having them aligned along cache lines. |
I see... you are creating a vector here Deallocation::Native(capacity) => {
unsafe { Vec::from_raw_parts(self.ptr, self.len, *capacity) };
} and then you let rust drop it. I'll dig deeper later once it has been merged into master. Nice work indeed |
06473b1
to
7fb641f
Compare
I have taken an alternative, less drastic approach in the latest push: made the aligned allocation feature gated under Essentially:
|
9927ebe
to
47e143c
Compare
With MIRI on CI for both branches, we are ready to merge. If anyone has any objection, go ahead :P |
This PR moves the custom allocator to a feature gate
cache_aligned
, replacing it bystd::Vec
in thedefault
,full
, etc.This allows users to create
Buffer
andMutableBuffer
directly from aVec
at zero cost (MutableBuffer
becomes a thin wrapper ofVec
). This opens a whole range of possibilities related to crates that assume that the backing container is aVec
.Note that this is fully compatible with arrow spec; we just do not follow the recommendation that the allocations follow 64 bytes (which neither this nor arrow-rs was following, since we have been using 128 byte alignments in
x86_64
). I was unable to observe any performance difference between 128-byte, 64-byte and std's (i.e. aligned withT
) allocations.This is backward incompatible, see #449 for details. Closes #449.