-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Remove unsafe use of from_raw_parts in Parquet decoder #549
Conversation
@@ -455,23 +455,21 @@ macro_rules! make_int_variant_impl { | |||
|
|||
while num - i >= 32 { | |||
unsafe { | |||
let in_slice = std::slice::from_raw_parts(in_ptr, 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsafe because there is no guarantee that the buffer is properly aligned for u32 type (in_ptr is defined earier as let mut in_ptr = &src_data[src_offset..] as *const [u8] as *const u8 as *const u32;
).
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #549 +/- ##
=============================================
- Coverage 54.78% 34.26% -20.53%
- Complexity 795 803 +8
=============================================
Files 103 106 +3
Lines 9690 38557 +28867
Branches 1844 8572 +6728
=============================================
+ Hits 5309 13212 +7903
- Misses 3427 22589 +19162
- Partials 954 2756 +1802 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we investigate why we have unaligned memory in the first place?
let mut i = 0; | ||
let mut in_ptr = &src_data[src_offset..] as *const [u8] as *const u8 as *const u32; | ||
|
||
while num - i >= 32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this loop is effectively not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I combined the two loops into one. I explained more in #549 (comment)
The input is a slice of
The original code seems to be copying data over using two loops. The first loop was copying in slices of 32 items and then the second loop copied any remaining items. I can only assume that this was some sort of optimization, but it relied on undefined behavior, which is now caught by Rust 1.78. I simplified the code to use a single loop and removed the unsafe optimization. The code is now almost identical to the decode method for other types such as this one, which I based it on: impl PlainDecoding for Int32To64Type {
fn decode(src: &mut PlainDecoderInner, dst: &mut ParquetMutableVector, num: usize) {
let src_ptr = src.data.as_ptr() as *const i32;
let dst_ptr = dst.value_buffer.as_mut_ptr() as *mut i64;
unsafe {
for i in 0..num {
dst_ptr
.add(dst.num_values + i)
.write_unaligned(src_ptr.add(src.offset + i).read_unaligned() as i64);
}
}
src.offset += 4 * num;
} I don't know why we were trying to do the optimization for some cases and not others. Perhaps @sunchao can remember. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct. I also forgot exactly why the code is written in the original way: maybe it is for some sort of loop unrolling. You could check the internal git blame
results to see if there is some hint.
Thanks for the review @sunchao! |
What I meant was that we expect the source buffer to be aligned on a word boundary. Subsequently, all offsets applied to the starting address are at least a multiple of size of word so every input should be aligned. |
…#549) * refactor make_int_variant_impl to remove unsafe use of from_raw_parts * fix * refactor * smaller change * remove commented out code * remove in_ptr * simplify
Which issue does this PR close?
Part of #507
Rationale for this change
The documentation for
from_raw_parts
states that the the memory must be "properly aligned" and Rust 1.78 added debug assertions that fail if the memory is not properly aligned. We are seeing those assertions fail when we run with Rust 1.78.https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html
What changes are included in this PR?
Refactor
make_int_variant_impl
to follow the same pattern as other decode methods and useread_unaligned
andwrite_unaligned
instead of incorrect use offrom_raw_parts
.How are these changes tested?
Existing tests.