Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate downcast keyword for fillna #40988

Closed
phofl opened this issue Apr 16, 2021 · 16 comments · Fixed by #53671
Closed

DEPR: Deprecate downcast keyword for fillna #40988

phofl opened this issue Apr 16, 2021 · 16 comments · Fixed by #53671
Assignees
Labels
Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Milestone

Comments

@phofl
Copy link
Member

phofl commented Apr 16, 2021

This came up recently in #40861
Depending on the way forward this would allow us to get downcast out from Block or be stricter about it

Thougts?

cc @jbrockmendel @jreback

@phofl phofl added Dtype Conversions Unexpected or buggy dtype conversions Deprecate Functionality to remove in pandas labels Apr 16, 2021
@jbrockmendel
Copy link
Member

The more I look at this, the wonkier it seems. #40861 implemented a test case:

df = pd.DataFrame({"col1": [1, np.nan]})

df.fillna({"col1": 2}, downcast={"col1": "int64"})

But similar calls still raise:

>>> df.fillna({"col1": 2}, downcast={"col1": np.int64})
[...]
TypeError: <class 'numpy.int64'>

>>> df.fillna(2, downcast={"col1": "int64"})
[...]
AssertionError: dtypes as dict is not supported yet

>>> df.fillna(2, downcast="int64")
[...]
ValueError: downcast must have a dictionary or 'infer' as its argument

Moreover, there are several places in NDFrame.fillna and Block.fillna where we ignore the 'downcast' keyword so end up using None. There's also a couple of layers of special-casing in Block._maybe_downcast and Block.downcast.

Notwithstanding the places where the downcast keyword is ignored, the status quo is roughly:

if downcast is None:
    if self.dtype == object:
        return datetime_centric_special_case()
    elif self.dtype.kind in ["f", "m", "M"]:
        return no_op()
    elif self.ndim == 1:
        downcast  = "infer"
        return do_inference()
    else:
        return no_op()

else:
    if self.dtype == object:
        return no_op()
    elif self.ndim == 1:
        return one_case_where_we_actually_respect_downcast_keyword()
    elif downcast != "infer":
          raise ValueErrorOrAssertionError
     else:
        return do_inference()

Some of this we can improve by actually respecting the keyword in more places and call that a bugfix. But the rest of the special-casing (object dtype, ndim==1 vs ndim==2) will likely change things and need a deprecation.

AFAICT the useful cases for downcasting after fillna (and interpolate, the other places from which _maybe_downcast is called) are 1) float->int and 2) object->not_object.

If this were greenfield, my inclination would be to make downcast take a bool, with False meaning no downcasting and True meaning "infer". (we could also take a dict but it would get unpacked before making it to the Block methods)

@jbrockmendel
Copy link
Member

xref #11537

@jbrockmendel
Copy link
Member

More questionable behavior: 1) we do the downcast even if the fillna is a no-op, 2) we ignore inplace=True and downcast anyway.

@jbrockmendel
Copy link
Member

Based on a few days of trying to figure this out, along with #11537, I propose we deprecate the 'downcast' keyword in 'fillna', 'interpolate', 'ffill', 'bfill', with the future behavior being to always downcast for object dtype and never for anything else (the only relevant case for "anything else" being floats)

@jreback
Copy link
Contributor

jreback commented Dec 23, 2021

sounds ok to me

@jbrockmendel
Copy link
Member

Having some trouble with the "do X instead" part because we don't have a method for downcasting floats (do we?)

@jreback
Copy link
Contributor

jreback commented Dec 24, 2021

we don't downcast floats as a general rule

@jbrockmendel
Copy link
Member

update: deprecating the argument seems increasingly invasive. Leaning towards chipping away at the inconsistencies and calling them bugfixes.

@rhshadrach rhshadrach added Bug and removed Deprecate Functionality to remove in pandas labels Jun 25, 2022
@rhshadrach
Copy link
Member

@jbrockmendel - I removed the deprecate tag; do you know if this issue remain open for tracking or if it is now duplicative of other issues?

@jbrockmendel
Copy link
Member

I've completely lost track

@rhshadrach
Copy link
Member

rhshadrach commented Jul 18, 2022

For the four examples in the top of #40988 (comment), the first remains okay, the last is now fixed, and the other two still raise. A quick search for downcast and fillna gives some issues, but none of the nature here.

@jbrockmendel
Copy link
Member

@phofl what are your current thoughts here?

@phofl
Copy link
Member Author

phofl commented Jan 26, 2023

Ideally, we would get rid of downcast in fillna and others and provide a proper down casting method?

@jbrockmendel
Copy link
Member

Sounds good to me. Want to take point on this?

@phofl
Copy link
Member Author

phofl commented Jan 27, 2023

Yep, can take care of this.

Would add deprecation note and new method for 2.1, so that we can immediately point users to an alternative.

@jbrockmendel
Copy link
Member

I got annoyed enough by this to spend some more time on it. Thinking of deprecating in steps:

  1. Deprecate any "downcast" other than [None, False, "infer"] (or dict with each value being one of these)
  2. Deprecate behavior of downcasting that casts round-floats to ints (once enforced this would allow us to reuse Block.convert instead of Block._downcast_2d)
  3. Deprecate downcast keyword altogether

@rhshadrach rhshadrach added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Deprecate Functionality to remove in pandas and removed Bug labels Jun 29, 2023
@rhshadrach rhshadrach added this to the 2.1 milestone Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants