TYP overload fillna #40737 #40887

Merged
merged 15 commits into from
Apr 15, 2021

Conversation

LarWong
Contributor

@LarWong LarWong commented Apr 11, 2021

@LarWong LarWong marked this pull request as draft April 11, 2021 23:57
@LarWong LarWong marked this pull request as ready for review April 12, 2021 19:38
Member

@MarcoGorelli MarcoGorelli left a comment

You don't need all these overloads - you just need:

  • one for Literal[False]=...
  • one for bool=...
  • one for each combination of default arguments which precede inplace

that should work out to 10 in total
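
As a rough sketch of that pattern, here is a toy `Frame` class with only one defaulted argument (`value`) before `inplace`, so it needs 2**1 + 2 = 4 overloads instead of the 2**3 + 2 = 10 that fillna needs. The class, its `data` attribute, and the `limit` handling are illustrative assumptions, not the pandas signatures from this PR.

```python
from __future__ import annotations

from typing import Literal, overload


class Frame:
    """Toy stand-in for DataFrame (hypothetical, not pandas code)."""

    def __init__(self, data: list[float | None]) -> None:
        self.data = data

    # 1) inplace defaults to / is literally False: a new object comes back.
    @overload
    def fillna(
        self, value: float = ..., inplace: Literal[False] = ..., limit: int | None = ...
    ) -> Frame: ...

    # 2) inplace=True with the preceding argument supplied.
    @overload
    def fillna(
        self, value: float, inplace: Literal[True], limit: int | None = ...
    ) -> None: ...

    # 3) inplace=True with the preceding argument left at its default.
    @overload
    def fillna(self, *, inplace: Literal[True], limit: int | None = ...) -> None: ...

    # 4) inplace is only known to be a bool: either result is possible.
    @overload
    def fillna(
        self, value: float = ..., inplace: bool = ..., limit: int | None = ...
    ) -> Frame | None: ...

    def fillna(
        self, value: float = 0.0, inplace: bool = False, limit: int | None = None
    ) -> Frame | None:
        filled: list[float | None] = []
        remaining = limit
        for x in self.data:
            if x is None and (remaining is None or remaining > 0):
                filled.append(value)
                if remaining is not None:
                    remaining -= 1
            else:
                filled.append(x)
        if inplace:
            self.data = filled
            return None
        return Frame(filled)
```

Every additional defaulted argument before `inplace` doubles the number of `Literal[True]` overloads, which is where the total of 10 comes from.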

@LarWong
Contributor Author

LarWong commented Apr 12, 2021

@MarcoGorelli Oh ok, I was under the impression that I needed an overload for each combination. I'll change it.

@LarWong LarWong marked this pull request as draft April 12, 2021 20:32
@MarcoGorelli
Member

Nice! And great that this makes a couple of casts redundant

Could you include the output from running something like what was done in #40860 (comment)? I think you might need to always include =... for default arguments which come after inplace for it to work, but we'll see.

@LarWong
Contributor Author

LarWong commented Apr 12, 2021

Sure, I'll post it soon

@LarWong
Contributor Author

LarWong commented Apr 12, 2021

@MarcoGorelli Yes, you were correct. =... was needed for it to work
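
One plausible reading of why `=...` matters, offered as an illustration and not taken from the PR diff: an overload that omits the default on a trailing parameter declares that parameter as required, so a call that leaves it out cannot match that overload and falls through to a later one. The runtime analogue below uses `inspect.signature(...).bind` to mimic that matching. The names `with_default`, `without_default`, and `matches` are hypothetical.

```python
import inspect


def with_default(self, value, inplace, limit=...): ...


def without_default(self, value, inplace, limit): ...


def matches(func, *args, **kwargs) -> bool:
    """Return True if the call arguments fit func's signature."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# The call omits limit, so only the signature with a default accepts it.
print(matches(with_default, None, value=0, inplace=True))     # True
print(matches(without_default, None, value=0, inplace=True))  # False
```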

@LarWong
Contributor Author

LarWong commented Apr 12, 2021

Here are the inputs to and the outputs of mypy:

# DataFrame_t.py
import pandas as pd

inplace : bool

reveal_type(pd.DataFrame([1,2,3]).fillna(inplace=False))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, inplace=False))
reveal_type(pd.DataFrame([1,2,3]).fillna(method='pad', inplace=False))
reveal_type(pd.DataFrame([1,2,3]).fillna(axis=0, inplace=False))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, method='pad', inplace=False))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, axis=0, inplace=False))
reveal_type(pd.DataFrame([1,2,3]).fillna(method='pad', axis=0, inplace=False))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, method='pad', axis=0, inplace=False))

reveal_type(pd.DataFrame([1,2,3]).fillna(inplace=True))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).fillna(method='pad', inplace=True))
reveal_type(pd.DataFrame([1,2,3]).fillna(axis=0, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, method='pad', inplace=True))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, axis=0, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).fillna(method='pad', axis=0, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, method='pad', axis=0, inplace=True))

reveal_type(pd.DataFrame([1,2,3]).fillna(inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).fillna(method='pad', inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).fillna(axis=0, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, method='pad', inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, axis=0, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).fillna(method='pad', axis=0, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).fillna(value=0, method='pad', axis=0, inplace=inplace))

Output:

DataFrame_t.py:5: note: Revealed type is 'pandas.core.frame.DataFrame'
DataFrame_t.py:6: note: Revealed type is 'pandas.core.frame.DataFrame'
DataFrame_t.py:7: note: Revealed type is 'pandas.core.frame.DataFrame'
DataFrame_t.py:8: note: Revealed type is 'pandas.core.frame.DataFrame'
DataFrame_t.py:9: note: Revealed type is 'pandas.core.frame.DataFrame'
DataFrame_t.py:10: note: Revealed type is 'pandas.core.frame.DataFrame'
DataFrame_t.py:11: note: Revealed type is 'pandas.core.frame.DataFrame'
DataFrame_t.py:12: note: Revealed type is 'pandas.core.frame.DataFrame'
DataFrame_t.py:14: note: Revealed type is 'None'
DataFrame_t.py:15: note: Revealed type is 'None'
DataFrame_t.py:16: note: Revealed type is 'None'
DataFrame_t.py:17: note: Revealed type is 'None'
DataFrame_t.py:18: note: Revealed type is 'None'
DataFrame_t.py:19: note: Revealed type is 'None'
DataFrame_t.py:20: note: Revealed type is 'None'
DataFrame_t.py:21: note: Revealed type is 'None'
DataFrame_t.py:23: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
DataFrame_t.py:24: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
DataFrame_t.py:25: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
DataFrame_t.py:26: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
DataFrame_t.py:27: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
DataFrame_t.py:28: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
DataFrame_t.py:29: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
DataFrame_t.py:30: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'

# Series_t.py
import pandas as pd

inplace : bool

reveal_type(pd.Series([1,2,3]).fillna(inplace=False))
reveal_type(pd.Series([1,2,3]).fillna(value=0, inplace=False))
reveal_type(pd.Series([1,2,3]).fillna(method='pad', inplace=False))
reveal_type(pd.Series([1,2,3]).fillna(axis=0, inplace=False))
reveal_type(pd.Series([1,2,3]).fillna(value=0, method='pad', inplace=False))
reveal_type(pd.Series([1,2,3]).fillna(value=0, axis=0, inplace=False))
reveal_type(pd.Series([1,2,3]).fillna(method='pad', axis=0, inplace=False))
reveal_type(pd.Series([1,2,3]).fillna(value=0, method='pad', axis=0, inplace=False))

reveal_type(pd.Series([1,2,3]).fillna(inplace=True))
reveal_type(pd.Series([1,2,3]).fillna(value=0, inplace=True))
reveal_type(pd.Series([1,2,3]).fillna(method='pad', inplace=True))
reveal_type(pd.Series([1,2,3]).fillna(axis=0, inplace=True))
reveal_type(pd.Series([1,2,3]).fillna(value=0, method='pad', inplace=True))
reveal_type(pd.Series([1,2,3]).fillna(value=0, axis=0, inplace=True))
reveal_type(pd.Series([1,2,3]).fillna(method='pad', axis=0, inplace=True))
reveal_type(pd.Series([1,2,3]).fillna(value=0, method='pad', axis=0, inplace=True))

reveal_type(pd.Series([1,2,3]).fillna(inplace=inplace))
reveal_type(pd.Series([1,2,3]).fillna(value=0, inplace=inplace))
reveal_type(pd.Series([1,2,3]).fillna(method='pad', inplace=inplace))
reveal_type(pd.Series([1,2,3]).fillna(axis=0, inplace=inplace))
reveal_type(pd.Series([1,2,3]).fillna(value=0, method='pad', inplace=inplace))
reveal_type(pd.Series([1,2,3]).fillna(value=0, axis=0, inplace=inplace))
reveal_type(pd.Series([1,2,3]).fillna(method='pad', axis=0, inplace=inplace))
reveal_type(pd.Series([1,2,3]).fillna(value=0, method='pad', axis=0, inplace=inplace))

Output:

Series_t.py:5: note: Revealed type is 'pandas.core.series.Series'
Series_t.py:6: note: Revealed type is 'pandas.core.series.Series'
Series_t.py:7: note: Revealed type is 'pandas.core.series.Series'
Series_t.py:8: note: Revealed type is 'pandas.core.series.Series'
Series_t.py:9: note: Revealed type is 'pandas.core.series.Series'
Series_t.py:10: note: Revealed type is 'pandas.core.series.Series'
Series_t.py:11: note: Revealed type is 'pandas.core.series.Series'
Series_t.py:12: note: Revealed type is 'pandas.core.series.Series'
Series_t.py:14: note: Revealed type is 'None'
Series_t.py:15: note: Revealed type is 'None'
Series_t.py:16: note: Revealed type is 'None'
Series_t.py:17: note: Revealed type is 'None'
Series_t.py:18: note: Revealed type is 'None'
Series_t.py:19: note: Revealed type is 'None'
Series_t.py:20: note: Revealed type is 'None'
Series_t.py:21: note: Revealed type is 'None'
Series_t.py:23: note: Revealed type is 'Union[pandas.core.series.Series, None]'
Series_t.py:24: note: Revealed type is 'Union[pandas.core.series.Series, None]'
Series_t.py:25: note: Revealed type is 'Union[pandas.core.series.Series, None]'
Series_t.py:26: note: Revealed type is 'Union[pandas.core.series.Series, None]'
Series_t.py:27: note: Revealed type is 'Union[pandas.core.series.Series, None]'
Series_t.py:28: note: Revealed type is 'Union[pandas.core.series.Series, None]'
Series_t.py:29: note: Revealed type is 'Union[pandas.core.series.Series, None]'
Series_t.py:30: note: Revealed type is 'Union[pandas.core.series.Series, None]'

@LarWong LarWong marked this pull request as ready for review April 12, 2021 22:59
@MarcoGorelli MarcoGorelli self-requested a review April 13, 2021 09:33
Member

@MarcoGorelli MarcoGorelli left a comment

Thanks @LarWong ! Looks good, I just have a question about:

value: Scalar | dict | Series | DataFrame | None

Where did you get this from? Was it from inspection? cc @simonjayhawkins is this correct?

    @overload
    def fillna(
        self,
        value: Scalar | dict | Series | DataFrame | None = ...,
Member

you've typed value in the overloads, perhaps let's type it in the function signature too? (line 5128)

Contributor Author

I can modify it, but let's wait until someone has confirmed the typing of value first.

Member

@LarWong perhaps let's revert typing value here, that can be done separately (and we'll probably need to be more precise than dict), the rest looks good

Contributor Author

@LarWong LarWong Apr 14, 2021

@MarcoGorelli Sorry, but I'm not sure what you mean by revert typing. Do you mean:

    @overload
    def fillna(
        self,
        value,

As in not typing value in the overloads?

Member

Exactly

Contributor Author

@MarcoGorelli Ok done! The outputs from mypy are the same after modifications (see above).

@jreback jreback added the Typing type annotations, mypy/pyright type checking label Apr 13, 2021
    @overload
    def fillna(
        self,
        value: Scalar | dict | Series | DataFrame | None = ...,
Contributor

can we make an alias for this

@@ -5007,6 +5007,121 @@ def rename(
errors=errors,
)

@overload
Contributor

can we not put these in .pyi?

Contributor Author

I was told to add them directly to these files since existing overloads were already there

Member

@jreback yes, that would be great! Thing is though, the .pyi files require you to define all methods of a module

Given the sheer number of methods this module has, I'd suggest taking this PR with the overloads here, and then moving the overloads (along with annotations for all other methods) to a pandas/core/frame.pyi file

Member

@LarWong I'll get back to you on typing value

Member

can we not put these in .pyi?

I don't think we want to go the route of using stubs for python files

@jreback yes, that would be great! Thing is though, the .pyi files require you to define all methods of a module

not that I think it should be done here, but it is possible to partially type a module using stubs.

https://github.com/python/typeshed/blob/master/CONTRIBUTING.md#incomplete-stubs

Partial modules (i.e. modules that are missing some or all classes, functions, or attributes) must include a top-level __getattr__() function marked with an # incomplete comment (see example below).

Member

Thanks @simonjayhawkins , didn't know that was possible. Why not use partial stubs for overloads? Because some methods, like Series.drop

pandas/pandas/core/series.py

Lines 4478 to 4487 in 84d9c5e

    def drop(
        self,
        labels=None,
        axis=0,
        index=None,
        columns=None,
        level=None,
        inplace=False,
        errors="raise",
    ) -> Series:

have 5 (five!) arguments with defaults before inplace, leading to...34 overloads 🤯 ! And even if we disallowed labels being passed as None in that one, that would still leave us with 18 overloads!
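
The arithmetic behind those figures can be sketched as follows, assuming the rule from earlier in this thread (one overload per combination of defaulted arguments before inplace, plus the Literal[False] and bool overloads). `overload_count` is a hypothetical helper, not part of pandas.

```python
def overload_count(n_defaults_before_inplace: int) -> int:
    # One Literal[True] overload per subset of the preceding defaulted
    # arguments, plus one Literal[False] overload and one bool overload.
    return 2 ** n_defaults_before_inplace + 2


print(overload_count(3))  # fillna (value, method, axis): 10
print(overload_count(5))  # Series.drop: 34
print(overload_count(4))  # Series.drop if labels could not be None: 18
```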


For typing value here, do you think Scalar | Mapping[Hashable, Scalar] | Series | DataFrame | None would be correct?

Member

@simonjayhawkins simonjayhawkins Apr 14, 2021

The problem is that if a stub file is present it takes precedence over the python file. so we cannot ensure internal consistency and need to duplicate the type annotations to be able to check the functions in the module itself.

Member

have 5 (five!) arguments with defaults before inplace, leading to...34 overloads

I think we are planning to drop the inplace argument, hopefully in pandas 2.0, and then we won't need all these overloads. #16529

@jreback jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Apr 13, 2021
@LarWong
Contributor Author

LarWong commented Apr 13, 2021

Thanks @LarWong ! Looks good, I just have a question about:

value: Scalar | dict | Series | DataFrame | None

Where did you get this from? Was it from inspection? cc @simonjayhawkins is this correct?

@MarcoGorelli For value, those were the types suggested by the documentation.

@MarcoGorelli MarcoGorelli self-requested a review April 15, 2021 17:06
Member

@MarcoGorelli MarcoGorelli left a comment

Thanks @LarWong !


Will try tomorrow to either move the overloads to .pyi files and find some way of keeping them in sync, or turn off the black formatter and put some noqas in.

@MarcoGorelli MarcoGorelli added this to the 1.3 milestone Apr 15, 2021
@MarcoGorelli MarcoGorelli merged commit 8fd8c8b into pandas-dev:master Apr 15, 2021
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request Apr 21, 2021
* TYP: Added overloads for fillna() in frame.py and series.py

* TYP: Added overloads for fillna() in frame.py and series.py pandas-dev#40737

* TYP: Added fillna() overloads to generic.py pandas-dev#40727

* TYP: removed generic overloads pandas-dev#40737

* fixed redundant cast error

* reverting prior changes

* remove cast again

* removed unnecessary overloads in frame.py and series.py

* fixed overloads

* reverted value typing

* remove extra types (lets keep this to overloads)

Co-authored-by: Marco Gorelli <marcogorelli@protonmail.com>
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request May 6, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Typing type annotations, mypy/pyright type checking
Development

Successfully merging this pull request may close these issues.

TYP overload fillna
4 participants