GH-45175: [Python] Honor the strings_to_categorical keyword in to_pandas for string view type #45176
Thanks @jorisvandenbossche
I've learnt today that `Table.__getitem__` returns a ChunkedArray.
I am going to merge this. Were you expecting this to go into 19.0.0? cc @amoeba
…das for string view type (#45176)

### Rationale for this change

Currently this keyword works for string or large string:

```python
>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.string())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    category
dtype: object

>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.large_string())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    category
dtype: object
```

but not for string view:

```python
>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.string_view())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    object
dtype: object
```

For consistency we should make that keyword check for string view columns as well, I think

From https://github.com/apache/arrow/pull/44195/files#r1901831460

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes, when using the `strings_to_categorical=True` keyword and having a string_view type, this column will now be converted to a pandas Categorical

* GitHub Issue: #45175

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 2c5ae51. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
Currently this keyword works for string or large string columns (the resulting pandas dtype is `category`), but not for string view columns (the dtype stays `object`).
For consistency we should make that keyword check for string view columns as well, I think
From https://github.com/apache/arrow/pull/44195/files#r1901831460
Are these changes tested?
Yes
Are there any user-facing changes?
Yes, when using the `strings_to_categorical=True` keyword and having a string_view type, this column will now be converted to a pandas Categorical.
* GitHub Issue: #45175