-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: get_prune_stats
returns homogenous ArrayRef
#1413
Conversation
48821e4
to
1cd87b5
Compare
Test table `issue_1374` was created by hand to have 2 data files where only one file has the `min_values` for the statistics in the `checkpoint.parquet` file set to null in order to trigger the bug. There is no other significance to the table other than to demonstrate issue delta-io#1374. ``` internal error: entered unreachable code thread 'test_issue_1374' panicked at 'internal error: entered unreachable code', /Users/cole/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-common-24.0.0/src/scalar.rs:2472:26 ```
1cd87b5
to
e37f6a3
Compare
Looks like this change causes this test to begin failing and I am unsure of what the correct behavior should be here.
|
Thoughts on this diff? My concern is there is no explicit restriction on the types, not sure if that matters though diff --git a/rust/src/delta_datafusion.rs b/rust/src/delta_datafusion.rs
index 43aed52..6ce3f1e 100644
--- a/rust/src/delta_datafusion.rs
+++ b/rust/src/delta_datafusion.rs
@@ -590,11 +590,14 @@ pub(crate) fn get_null_of_arrow_type(t: &ArrowDataType) -> DeltaResult<ScalarVal
TimeUnit::Nanosecond => ScalarValue::TimestampNanosecond(None, tz),
})
}
+ ArrowDataType::Dictionary(k, v) => Ok(ScalarValue::Dictionary(
+ k.clone(),
+ Box::new(get_null_of_arrow_type(v).unwrap()),
+ )),
//Unsupported types...
ArrowDataType::Float16
| ArrowDataType::Decimal256(_, _)
| ArrowDataType::Union(_, _)
- | ArrowDataType::Dictionary(_, _)
| ArrowDataType::LargeList(_)
| ArrowDataType::Struct(_)
| ArrowDataType::List(_) |
Switch the `get_prune_stats` functions to use `None` to represent `null` instead of `ScalarValue::Null` as `ArrayRef` must be of all the same type. https://github.com/apache/arrow-datafusion/blob/dd5e1dbbfd20539b40ae65acb8883f7e392cba92/datafusion/core/src/physical_optimizer/pruning.rs#L54-L72
e37f6a3
to
1e4c57f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective returning the appropriate nullsy type for the column (which is how I understand this) makes sense. It does feel like some Arrow type system wackiness that there's a Null type and also a nullsy boolean, etc 😛
# Description Switch the `get_prune_stats` functions to use `None` to represent `null` instead of `ScalarValue::Null` as `ArrayRef` must be of all the same type. # Related Issue(s) - closes delta-io#1374 # Documentation https://github.com/apache/arrow-datafusion/blob/dd5e1dbbfd20539b40ae65acb8883f7e392cba92/datafusion/core/src/physical_optimizer/pruning.rs#L54-L72 --------- Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
Description
Switch the
get_prune_stats
functions to useNone
to representnull
instead of
ScalarValue::Null
asArrayRef
must be of all the sametype.
Related Issue(s)
Documentation
https://github.com/apache/arrow-datafusion/blob/dd5e1dbbfd20539b40ae65acb8883f7e392cba92/datafusion/core/src/physical_optimizer/pruning.rs#L54-L72