-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Spark-4.0 widening type support #604
feat: Spark-4.0 widening type support #604
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This is ready for review @andygrove @comphead @huaxingao @viirya |
if dst_scale > src_scale { | ||
let mul = (10 as $dst_type).pow(dst_scale - src_scale); | ||
for i in 0..num { | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add SAFETY comments for the unsafe blocks that explain how/why they are actually safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on this area but the changes seem reasonable to me. I think it would be good practice for us to document any unsafe code that we are adding.
{ | ||
let mut offset = src.offset; | ||
for _ in 0..num { | ||
let v = &src_data[offset..offset + byte_width] as *const [u8] as *const u8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we have to use as *const [u8] as *const u8 as *const i32
rather than .as_ptr() as *const i32
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
btw, this is really a |
native/core/benches/parquet_read.rs
Outdated
@@ -54,7 +54,7 @@ fn bench(c: &mut Criterion) { | |||
); | |||
b.iter(|| { | |||
let cd = ColumnDescriptor::new(t.clone(), 0, 0, ColumnPath::from(Vec::new())); | |||
let promition_info = TypePromotionInfo::new(PhysicalType::INT32, -1, -1); | |||
let promition_info = TypePromotionInfo::new(PhysicalType::INT32, -1, -1, 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let promition_info = TypePromotionInfo::new(PhysicalType::INT32, -1, -1, 32); | |
let promotion_info = TypePromotionInfo::new(PhysicalType::INT32, -1, -1, 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@@ -48,6 +52,7 @@ make_type!(FLBADecimal32Type); | |||
make_type!(FLBADecimal64Type); | |||
make_type!(FLBAType); | |||
make_type!(Int32DateType); | |||
make_type!(Int32TimestampMicrosType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have Int32TimestampMillisType
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a usage for mills for now I think
make_int_variant_dict_impl!(Int32To64Type, i32, i64); | ||
make_int_variant_dict_impl!(Int32ToDecimal64Type, i32, i64); | ||
make_int_variant_dict_impl!(Int32ToDoubleType, i32, f64); | ||
make_int_variant_dict_impl!(Int32TimestampMicrosType, i32, i64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Int32 Millis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a usage for mills for now I think
|
||
// TODO: optimize this further as checking value one by one is not very efficient | ||
unsafe { | ||
if unlikely(v.read_unaligned() < JULIAN_GREGORIAN_SWITCH_OFF_DAY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use unlikely as its rust nightly feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have our own unlikely
https://github.com/apache/datafusion-comet/blob/main/native/core/src/lib.rs#L126
if unlikely(v.read_unaligned() < JULIAN_GREGORIAN_SWITCH_OFF_DAY) { | ||
panic!( | ||
"Encountered timestamp value {}, which is before 1582-10-15 (counting \ | ||
backwards from Unix eopch date 1970-01-01), and could be ambigous \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backwards from Unix eopch date 1970-01-01), and could be ambigous \ | |
backwards from Unix epoch date 1970-01-01), and could be ambiguous \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
generate_cast_to_signed!(copy_f32_to_f64, f32, f64); | ||
|
||
fn copy_i64_to_i64(src: &[u8], dst: &mut [u8], num: usize) { | ||
debug_assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering can it be a single check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is debug_assert!
so should be fine to check every time
"Destination slice is too small" | ||
); | ||
|
||
bit::memcpy_value(src, 8 * num, dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8 here is std::mem::size_of:: ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kazuyukitanimura I'm wondering should be have tests to read those types?
updated |
Thanks @comphead |
@parthchandra @andygrove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Except one comment (mostly for my understanding)
Merged, thank you @parthchandra @andygrove @comphead |
## Rationale for this change To be ready for Spark 4.0 ## What changes are included in this PR? This PR adds type widening feature support introduced in Spark-4.0 ## How are these changes tested? Enabled ParquetTypeWideningSuite
Which issue does this PR close?
Part of #372 and #551
Rationale for this change
To be ready for Spark 4.0
What changes are included in this PR?
This PR adds type widening feature support introduced in Spark-4.0
How are these changes tested?
Enabled ParquetTypeWideningSuite