-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) #59376
String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) #59376
Conversation
…(remove pyarrow_numpy)
@@ -1305,7 +1304,6 @@ def string_storage(request): | |||
|
|||
* 'python' | |||
* 'pyarrow' | |||
* 'pyarrow_numpy' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the string_storage
fixture will no longer cover the "pyarrow + NaN" variant of the string dtype directly with this change, but:
- Most tests using this fixture are inside the
pandas/tests/io
submodule, which had an override of this fixture to not include "pyarrow_numpy" anyway (and I removed this override) - I checked/updated the other tests and none of them really need this.
if value == "pyarrow_numpy": | ||
# TODO: we can remove extra message after 3.0 | ||
msg += ( | ||
". 'pyarrow_numpy' was specified, but this option should be " | ||
"enabled using pandas.options.future.infer_string instead" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this custom message to the error in case someone actually did end up using pd.options.mode.string_storage = "pyarrow_numpy"
to point them to the right option to enable.
def test_astype_to_string_dtype_not_modifying_input(any_string_dtype, val): | ||
# GH#51073 - variant of the above test with explicit dtype instances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this variant because the test above with the string_storage option and astype("string")
will now only test the NA-variants of the string dtype.
with cf.config_prefix("mode"): | ||
cf.register_option( | ||
"string_storage", | ||
"python", | ||
string_storage_doc, | ||
validator=is_one_of_factory(["python", "pyarrow", "pyarrow_numpy"]), | ||
# validator=is_one_of_factory(["python", "pyarrow"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# validator=is_one_of_factory(["python", "pyarrow"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to leave this line in so it's easier to go back to that, given that this extra message is only meant to stay for 2.3, and for 3.0 we should already remove it again (because the infer_string option will be enabled by default in 3.0, so then there is no point to let users set that manually)
Thanks @jorisvandenbossche |
…(remove pyarrow_numpy) (#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (pandas-dev#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
…(remove pyarrow_numpy) (#59376) * String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) * add type annotation
We allowed to set
"pyarrow_numpy"
as a value to themode.string_storage
option.I assume this was mostly done for testing purposes, because we explicitly documented that this option would be ignored when enabling the
future.infer_string
option (i.e. when "pyarrow_numpy" is being used by default).In context of the new string dtype (PDEP-14), I think we should remove that option again (since we will generally deprecate/remove "pyarrow_numpy"). Given the above (not documented), I opted for directly removing it from the option instead of first deprecating it (for the StringDtype constructor / string alias, we will first deprecate it). But I did add a clarification to the option error message in case someone actually did use it.
xref #54792