CLN/TST: delegate StringArray.fillna() to parent class + add tests #37987

Merged
merged 19 commits into from
Nov 26, 2020

arw2019
Member

@arw2019 arw2019 commented Nov 21, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@arw2019 arw2019 self-assigned this Nov 21, 2020
@arw2019 arw2019 added ExtensionArray Extending pandas with custom dtypes or arrays. Strings String extension data type and string data Error Reporting Incorrect or improved errors from pandas and removed ExtensionArray Extending pandas with custom dtypes or arrays. labels Nov 21, 2020
@@ -283,7 +283,8 @@ def __setitem__(self, key, value):
        super().__setitem__(key, value)

    def fillna(self, value=None, method=None, limit=None):
        # TODO: validate dtype
        if not isinstance(value, str):
Member

np.str_?

It's a weird usage, but there's no reason why a user couldn't pass an NA fill value.

Member Author

added both, including the missing value

Member

np.str_ is already covered by str (it subclasses str):

In [13]: arr = np.array(["a"])

In [14]: type(arr[0])
Out[14]: numpy.str_

In [15]: isinstance(arr[0], str)
Out[15]: True

Member Author

Ah right. Got rid of the np.str_ check

@arw2019 arw2019 force-pushed the string-validate-fillna-args branch from 0a85704 to 34e3d82 Compare November 21, 2020 05:48
@arw2019 arw2019 force-pushed the string-validate-fillna-args branch from c5da53a to 616a6c7 Compare November 21, 2020 05:50
@arw2019
Member Author

arw2019 commented Nov 21, 2020

Does this need a whatsnew?

@jorisvandenbossche
Member

> Does this need a whatsnew?

I don't think it is necessarily needed; it's only a change in the error message, I think?

Contributor

@jreback jreback left a comment

@@ -283,7 +283,10 @@ def __setitem__(self, key, value):
        super().__setitem__(key, value)

    def fillna(self, value=None, method=None, limit=None):
        # TODO: validate dtype
        if value is not None and not (
Contributor

You should just be able to call _validate_setitem_value(value); maybe we can add it in the base class.

Member

I would like this, since it would allow us to implement a correct ExtensionBlock._can_hold_element (xref #36226).

Member Author

> You should just be able to call _validate_setitem_value(value); maybe we can add it in the base class.

Gave that a go - is the last commit what you meant? @jreback @jbrockmendel

Member

@jorisvandenbossche jorisvandenbossche Nov 24, 2020

I am -1 on adding it in this PR (or at least on adding it to the base class). If we add it to the base class, it needs to be part of the EA interface, documented that way, have a base extension test, have a fallback implementation, etc. (and we should first discuss whether we actually want it).

Also, we actually don't need to explicitly call the setitem validation here, since fillna is already raising that exception because it uses the setitem implementation under the hood.

So we could also simply add the test to ensure fillna raises the proper error.
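
The argument above (fillna already raises because it goes through the setitem path) can be illustrated with a toy model. This is a minimal sketch in plain Python, not pandas code: `ToyBaseArray`, `ToyStringArray`, the use of `None` as the missing marker, and the error text are all hypothetical stand-ins chosen for illustration.

```python
class ToyBaseArray:
    """Hypothetical stand-in for a base extension array (not pandas code)."""

    def __init__(self, values):
        self._data = list(values)

    def __setitem__(self, key, value):
        self._data[key] = value

    def fillna(self, value):
        # Fill missing entries (None here) by assigning into a copy, so any
        # validation done in __setitem__ also applies to the fill value.
        new = type(self)(self._data)
        for i, item in enumerate(self._data):
            if item is None:
                new[i] = value
        return new


class ToyStringArray(ToyBaseArray):
    """Hypothetical string-only array that validates in __setitem__ only."""

    def __setitem__(self, key, value):
        if value is not None and not isinstance(value, str):
            raise ValueError(f"Cannot set non-string value {value!r}")
        super().__setitem__(key, value)


arr = ToyStringArray(["a", None])
filled = arr.fillna("b")  # a valid fill value passes through __setitem__

try:
    arr.fillna(1)  # an invalid fill value is rejected by __setitem__
except ValueError as err:
    error_message = str(err)
```

In this sketch the subclass never overrides `fillna`, yet an invalid fill value is still rejected, which is why a test asserting the error can be enough without extra validation code.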

Member Author

> I am -1 on adding it in this PR (or at least on adding it to the base class). If we add it to the base class, it needs to be part of the EA interface, documented that way, have a base extension test, have a fallback implementation, etc. (and we should first discuss whether we actually want it).

Ok! I'll open an issue

> Also, we actually don't need to explicitly call the setitem validation here, since fillna is already raising that exception because it uses the setitem implementation under the hood.

> So we could also simply add the test to ensure fillna raises the proper error.

There was an existing test, and I added more test cases here.


        super().__setitem__(key, value)

    def fillna(self, value=None, method=None, limit=None):
        # TODO: validate dtype
        if value is not None:
            value = self._validate_setitem_value(value)
Member

This is actually not needed, as the setitem under the hood of fillna already calls it. So calling it here as well means we validate the value twice.
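
The double validation described here can be made visible by counting calls. The following is a rough sketch under stated assumptions, not the pandas implementation: `CountingArray`, `_validate`, and both `fillna_*` methods are hypothetical names, and the toy fills in place, one element at a time.

```python
class CountingArray:
    """Hypothetical array that counts how often a fill value is validated."""

    def __init__(self, values):
        self._data = list(values)
        self.validations = 0

    def _validate(self, value):
        self.validations += 1
        if not isinstance(value, str):
            raise ValueError(f"Cannot set non-string value {value!r}")
        return value

    def __setitem__(self, key, value):
        # Assignment always validates the incoming value.
        self._data[key] = self._validate(value)

    def _fill(self, value):
        for i, item in enumerate(self._data):
            if item is None:
                self[i] = value

    def fillna_with_extra_check(self, value):
        value = self._validate(value)  # explicit up-front check
        self._fill(value)              # __setitem__ validates again

    def fillna_plain(self, value):
        self._fill(value)              # relies on __setitem__ alone


checked_twice = CountingArray(["x", None])
checked_twice.fillna_with_extra_check("y")

checked_once = CountingArray(["x", None])
checked_once.fillna_plain("y")
```

With a single missing entry, the variant with the explicit check validates the same value twice, while the plain variant validates it once and produces the same result.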

Member Author

right, reverted all the changes here

@arw2019 arw2019 closed this Nov 25, 2020
@arw2019 arw2019 reopened this Nov 25, 2020
@arw2019
Member Author

arw2019 commented Nov 25, 2020

I pared down this PR to just the fillna test for strings. Discussion on _validate_setitem_value is housed in #36226 (happy to work on that in a follow-on unless somebody else wants to)

@jorisvandenbossche jorisvandenbossche changed the title ENH: validate StringArray fillna value arg TST: validate StringArray fillna value arg Nov 25, 2020
@@ -283,7 +283,6 @@ def __setitem__(self, key, value):
        super().__setitem__(key, value)

    def fillna(self, value=None, method=None, limit=None):
        # TODO: validate dtype
        return super().fillna(value, method, limit)
Member

Does this method need to be overridden here at all?

Member Author

it doesn't

Member Author

pushed that now
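
Why an override that only forwards to the parent can be dropped is sketched below. The classes are hypothetical and only mirror the `fillna(value, method, limit)` signature shown in the diff; the tuple returned by `Base.fillna` is a placeholder so the two variants can be compared.

```python
class Base:
    """Hypothetical parent class with the fillna signature from the diff."""

    def fillna(self, value=None, method=None, limit=None):
        return ("Base.fillna", value, method, limit)


class WithOverride(Base):
    # An override that only forwards its arguments adds no behaviour.
    def fillna(self, value=None, method=None, limit=None):
        return super().fillna(value, method, limit)


class WithoutOverride(Base):
    # No override: attribute lookup falls through to Base.fillna.
    pass


same_result = WithOverride().fillna("b") == WithoutOverride().fillna("b")
inherited_directly = WithoutOverride.fillna is Base.fillna
```

Both subclasses return the same result for the same call, so deleting the forwarding method changes nothing observable in this sketch.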

@arw2019 arw2019 changed the title TST: validate StringArray fillna value arg CLN/TST: delegate StringArray.fillna() to parent class + add tests Nov 25, 2020
@jreback jreback added this to the 1.2 milestone Nov 26, 2020
@jreback jreback merged commit 94179cd into pandas-dev:master Nov 26, 2020
@jreback
Contributor

jreback commented Nov 26, 2020

thanks @arw2019

Labels
Error Reporting Incorrect or improved errors from pandas Strings String extension data type and string data
4 participants