API/DEPR: int downcasting in DataFrame.where



`Block.where` has special downcasting logic that splits blocks differently from every other Block method. I would like to deprecate and eventually remove this bespoke logic.

AFAICT the relevant logic is only reached when we have a non-int64 integer dtype, an integer `other` too big for that dtype, AND the passed `cond` has all-`True` columns.

(Identifying the affected behavior is difficult in part because it relies on `can_hold_element` incorrectly returning `True` in these cases.)
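To make the trigger conditions concrete, here is a minimal NumPy-only sketch that checks each of them for a small example. The helper `fits_in_int_dtype` is hypothetical and written just for illustration; it is not pandas' `can_hold_element` and does not reproduce its behavior.

```python
import numpy as np


def fits_in_int_dtype(value: int, dtype) -> bool:
    """Hypothetical helper: True if `value` is representable in integer `dtype`."""
    info = np.iinfo(dtype)
    return info.min <= value <= info.max


arr = np.arange(6).astype(np.int16).reshape(3, 2)
cond = np.zeros(arr.shape, dtype=bool)
cond[:, 0] = True  # column 0 is all-True, column 1 is all-False
other = 2**17

# Condition 1: integer dtype other than int64
is_non_int64 = arr.dtype.kind == "i" and arr.dtype != np.int64

# Condition 2: `other` does not fit in the array's dtype
other_too_big = not fits_in_int_dtype(other, arr.dtype)

# Condition 3: which columns of `cond` are all-True
all_true_cols = cond.all(axis=0)
```

With these inputs all three conditions hold, so this is the kind of call that would reach the special-cased path described above.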

```
import numpy as np
import pandas as pd

arr = np.arange(6).astype(np.int16).reshape(3, 2)
df = pd.DataFrame(arr)

# cond is all-True in column 0 and all-False in column 1
mask = np.zeros(arr.shape, dtype=bool)
mask[:, 0] = True

# 2**17 does not fit in int16
res = df.where(mask, 2**17)

print(res.dtypes)
# 0    int16
# 1    int32
# dtype: object
```

The simplest thing to do would be to not do any downcasting in these cases, in which case we would end up with all-int32.  The next simplest would be to downcast column-wise, which would give the same end result but with less consolidation.
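A rough sketch of what these two options would produce for the example above, emulated outside of pandas internals. This is not how `Block.where` is implemented; `smallest_signed_dtype` and `downcast_column` are hypothetical helpers, and the upcast target is chosen by hand so the result does not depend on the pandas or NumPy version.

```python
import numpy as np
import pandas as pd


def smallest_signed_dtype(value: int) -> np.dtype:
    """Hypothetical helper: smallest signed integer dtype that can hold `value`."""
    for dt in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dt)
        if info.min <= value <= info.max:
            return np.dtype(dt)
    raise OverflowError(value)


def downcast_column(col: pd.Series, dtype) -> pd.Series:
    """Hypothetical helper: cast `col` back to `dtype` if every value fits."""
    info = np.iinfo(dtype)
    if col.min() >= info.min and col.max() <= info.max:
        return col.astype(dtype)
    return col


arr = np.arange(6).astype(np.int16).reshape(3, 2)
mask = np.zeros(arr.shape, dtype=bool)
mask[:, 0] = True
other = 2**17

# Option 1: upcast once, no downcasting afterwards -> every column is int32
target = np.promote_types(arr.dtype, smallest_signed_dtype(other))
values = np.where(mask, arr.astype(target), target.type(other))
no_downcast = pd.DataFrame(values)

# Option 2: downcast each column independently back to the original dtype
col_wise = pd.concat(
    [downcast_column(no_downcast[c], arr.dtype) for c in no_downcast.columns],
    axis=1,
)
```

Under these assumptions `no_downcast` has two int32 columns, while `col_wise` has an int16 column 0 (its values were never replaced) and an int32 column 1.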

We do not have any test cases that fail if I disable this downcasting (after I fix a problem with an `expressions.where` call that the downcasting somehow makes irrelevant). This makes me think the current behavior is not intentional, or at least not a priority.

Any objection to deprecating the integer downcasting entirely?


API/DEPR: int downcasting in DataFrame.where #44597
