-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add dictionary binary to shuffle writer #111
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -292,6 +292,12 @@ pub fn create_hashes<'a>( | |
DataType::LargeUtf8 => { | ||
hash_array!(LargeStringArray, col, hashes_buffer); | ||
} | ||
DataType::Binary => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to confirm, this is for equal/hash contract for Spark side's sort operations? I noticed some error when testing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I remember |
||
hash_array!(BinaryArray, col, hashes_buffer); | ||
} | ||
DataType::LargeBinary => { | ||
hash_array!(LargeBinaryArray, col, hashes_buffer); | ||
} | ||
DataType::FixedSizeBinary(_) => { | ||
hash_array!(FixedSizeBinaryArray, col, hashes_buffer); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug fixes? It looks like we may lack some unit test coverage in the rust side.
Maybe we should add more unit tests in the rust side in the follow-up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it is a typo. More test coverage is good. As there are too many combinations, we probably miss some in current test coverage.