-
Notifications
You must be signed in to change notification settings - Fork 853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support casting from binary to dictionary of binary #3482
Conversation
fn pack_binary_to_dictionary<K>( | ||
array: &ArrayRef, | ||
cast_options: &CastOptions, | ||
) -> Result<ArrayRef, ArrowError> | ||
where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can have a generic one for both string and binary, I may try it in a follow up work.
82e3b8d
to
8f21b9a
Compare
@@ -268,7 +268,7 @@ where | |||
let keys = self.keys_builder.finish(); | |||
|
|||
let data_type = | |||
DataType::Dictionary(Box::new(K::DATA_TYPE), Box::new(DataType::Utf8)); | |||
DataType::Dictionary(Box::new(K::DATA_TYPE), Box::new(T::DATA_TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eep
f71a2d1
to
6a4ca6e
Compare
@@ -3285,6 +3285,8 @@ fn cast_to_dictionary<K: ArrowDictionaryKeyType>( | |||
), | |||
Utf8 => pack_string_to_dictionary::<K>(array, cast_options), | |||
LargeUtf8 => pack_string_to_dictionary::<K>(array, cast_options), | |||
Binary => pack_binary_to_dictionary::<K>(array, cast_options), | |||
LargeBinary => pack_binary_to_dictionary::<K>(array, cast_options), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method assumes BinaryArray, so I'm not sure how this works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like pack_string_to_dictionary
, they both perform casting first to get StringArray
/BinaryArray
which is taken to build dictionary array. I think we can simplify this by removing first casting. I will do it in a follow up pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is also problematic as it bat truncate offsets, would be good to fix
Dictionary(Box::new(DataType::UInt32), Box::new(DataType::Utf8)), | ||
Dictionary(Box::new(DataType::UInt32), Box::new(DataType::Binary)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale for this choice of dictionary keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just picked it following other existing types. Maybe as a improvement, I can refactor this in a separate pr to iterate all possible dictionary key types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay
Benchmark runs are scheduled for baseline = 95cf030 and contender = 07fd434. 07fd434 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Part of #1179.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?