API/ENH: Accept nan-likes in StringArray constructor #40839

Closed
lithomas1 opened this issue Apr 8, 2021 · 3 comments
Labels
Enhancement; NA - MaskedArrays (Related to pd.NA and nullable extension arrays); Strings (String extension data type and string data)

@lithomas1
Member

lithomas1 commented Apr 8, 2021

Is your feature request related to a problem?

Currently, StringArray can only be instantiated directly from an ndarray whose elements are strings or pd.NA. The only way to build a StringArray from data that uses other missing value indicators (such as np.nan and None) is pandas.array, which has the side effect of casting non-string elements to strings instead of raising an error.
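
To make the two routes concrete, here is a minimal sketch. The sample values are made up for illustration, and what the direct constructor does with np.nan/None depends on the installed pandas version, so the code reports the outcome instead of assuming one.

```python
import numpy as np
import pandas as pd

# Illustrative input: one string plus two non-pd.NA missing value indicators.
values = np.array(["a", np.nan, None], dtype=object)

# Route 1: pandas.array converts np.nan / None to pd.NA.
via_array = pd.array(values, dtype="string")
print(via_array.isna())

# The same route does not reject non-strings; per this issue it casts them.
mixed = pd.array(np.array(["a", 1], dtype=object), dtype="string")
print(list(mixed))

# Route 2: the direct constructor. Version-dependent, so catch the error
# rather than assume whether np.nan / None are accepted.
try:
    direct = pd.arrays.StringArray(values)
    print("constructor accepted nan-likes:", direct)
except ValueError as exc:
    print("constructor rejected nan-likes:", exc)
```

The point of the comparison is that neither route gives both strict string validation and lenient handling of missing values in one step.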

The proposed solution would allow instantiating a StringArray from a numpy array containing np.nan/None without casting non-strings. This is useful if you want the StringArray constructor to validate that inputs are strings while also accepting missing values other than pd.NA. At the very least, it should support np.nan, since StringArray is created from a numpy array and np.nan is the missing value indicator for numpy.

Describe the solution you'd like

Either accept nan-likes in the constructor directly (a breaking change) or add a constructor parameter that lists additional accepted missing values, perhaps something like the na_values parameter of read_csv.
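
As a rough illustration of the second option, the sketch below shows the semantics such a parameter might have. It is plain Python, not pandas code: `validate_strings`, the `NA` sentinel (a stand-in for pd.NA), and the exact meaning of `na_values` here are all hypothetical names chosen for this example.

```python
import math

NA = object()  # hypothetical stand-in for pd.NA


def _is_nan(value):
    """True only for float NaN (NaN never compares equal to itself)."""
    return isinstance(value, float) and math.isnan(value)


def validate_strings(values, na_values=()):
    """Keep strings, map accepted missing markers to NA, reject anything else.

    With the default empty `na_values`, only the NA sentinel counts as
    missing, mirroring the strict behaviour the issue describes.
    """
    accept_nan = any(_is_nan(v) for v in na_values)
    other_markers = [v for v in na_values if not _is_nan(v)]

    result = []
    for position, value in enumerate(values):
        if isinstance(value, str):
            result.append(value)
        elif (
            value is NA
            or (accept_nan and _is_nan(value))
            or any(value is marker for marker in other_markers)
        ):
            result.append(NA)
        else:
            # Non-strings raise instead of being cast to strings.
            raise ValueError(
                f"element {position} is neither a string nor an accepted "
                f"missing value: {value!r}"
            )
    return result


cleaned = validate_strings(
    ["a", None, float("nan")], na_values=(None, float("nan"))
)
```

Keeping the default strict means existing callers would see no change, which is the non-breaking variant of the proposal; accepting nan-likes unconditionally would correspond to making them part of the default.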

API breaking implications

Either breaking change or new parameter.

Describe alternatives you've considered

You'd have to do the validation yourself, and validating the input yourself only to have StringArray validate it again is bad for performance.

cc @jorisvandenbossche

@lithomas1 lithomas1 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 8, 2021
@jbrockmendel jbrockmendel added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Apr 12, 2021
@lithomas1 lithomas1 changed the title API/ENH: Accept np.nan/None in StringArray constructor API/ENH: Accept nan-likes in StringArray constructor May 10, 2021
@jreback jreback added this to the 1.3 milestone May 21, 2021
@lithomas1 lithomas1 removed the Needs Triage Issue that has not been reviewed by a pandas team member label May 21, 2021
@simonjayhawkins
Member

removing milestone for now, can add back later

@lithomas1
Member Author

#45168.

@lithomas1 lithomas1 added this to the 1.5 milestone Jan 17, 2022
@shortorian

@lithomas1 created a pull request for this, and it looks like it was very close to complete, but I think it's now closed for inactivity. Is there a chance this issue will be reopened at some point? I could really use this feature.


6 participants