-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix arrow groupby na #60777
base: main
Are you sure you want to change the base?
Fix arrow groupby na #60777
Conversation
0e61611
to
66330ee
Compare
273d85f
to
226ae6c
Compare
f973800
to
72beba9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, merging main should make the CI green.
4ffc234
to
4d82464
Compare
The CI is now green. @rhshadrach can you please merge this PR. |
encoded = data | ||
if null_encoding == "encode": | ||
# dictionary encode does nothing if an already encoded array is given | ||
data = data.cast(data.type.value_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use dictionary_decode()
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dictionary_decode()
works on the DictionaryArray
while the input to factorize()
function is a ChunkedArray
. So the chunked array first needs to be combined into a dictionary array and then the dictionary_decode()
can be used on it which i think is more inefficient as compared to simply casting the array to its underlying data type.
4d82464
to
273e998
Compare
4952817
to
59a4668
Compare
…d backwards compatibility maintained
59a4668
to
6ca56d0
Compare
doc/source/whatsnew/v3.0.0.rst
file for the bug fix.