API: dtype_backend and constructors long-term #51846

jbrockmendel · 2023-03-08T20:16:16Z

On today's dev call we discussed the dtype_backend global option (decided to revert it for 2.0) and the use_nullable_dtypes keyword in IO methods (decided to rename it to dtype_backend, except where it already exists in 1.5, where it will be deprecated in favor of dtype_backend).

Some of the reasoning centered around the fact that constructors do not currently recognize the dtype_backend option, and there was an offhand reference to adding that as a keyword to the constructors. I think we should avoid adding keywords/options when there are viable alternatives. Going one step further: long term I think we need neither a dtype_backend option nor keyword anywhere.

For IO functions, the "engine" keyword (where applicable) should determine what kind of dtypes you get back. The most relevant case is engine="pyarrow". If you want something else, you can use obj.convert_dtypes(...).
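To illustrate the "read first, convert afterwards" pattern, here is a minimal sketch. The in-memory CSV and the variable names are made up for the example, and it uses the default parser rather than engine="pyarrow" so that it runs without pyarrow installed.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file.
csv_text = "a,b\n1,1.5\n2,2.5\n"

# The default parser returns NumPy-backed dtypes (int64, float64).
df = pd.read_csv(io.StringIO(csv_text))

# Opt in to pandas' nullable extension dtypes after the read,
# instead of passing an extra keyword to the reader.
converted = df.convert_dtypes()

print(df.dtypes)
print(converted.dtypes)
```

In pandas 2.0 and later, convert_dtypes also accepts dtype_backend="pyarrow" (which requires pyarrow), so the same pattern covers Arrow-backed results.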

For constructors, the "dtype" keyword should be sufficient in most cases. The two cases where it is not are:

a) dtype=None
The natural thing to do is infer based on the data. If the data is already a pandas object we retain the dtype. If it is a numpy object we use a numpy dtype. If it is a pyarrow object we use pd.ArrowDtype. That leaves cases where it is e.g. a list. For that we could plausibly use a global option, but it'd be simpler to just have a sensible default and tell users to use convert_dtypes (or pass a keyword) if desired.
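A small sketch of how this inference looks with dtype=None, restricted to the pandas, NumPy and list cases. The variable names are illustrative only, and the pyarrow case is left out because the paragraph above describes it as a proposal rather than as guaranteed behavior.

```python
import numpy as np
import pandas as pd

# pandas input: the existing nullable dtype is retained.
from_pandas = pd.Series(pd.array([1, 2, None], dtype="Int64"))

# NumPy input: the NumPy dtype is retained.
from_numpy = pd.Series(np.array([1, 2, 3], dtype=np.int64))

# Plain list: there is no backend information, so a default applies
# (currently NumPy float64 for Python floats).
from_list = pd.Series([1.5, 2.5])

# Users who want nullable dtypes can convert afterwards.
converted = from_list.convert_dtypes()

for s in (from_pandas, from_numpy, from_list, converted):
    print(s.dtype)
```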

b) dtype=int|"int"|"int64"|...
Here we could plausibly default to either np.int64 or pd.ArrowDtype(pa.int64()). As above, we could have a global option, but it would be better to just have a sensible default and tell users to be more specific or use convert_dtypes.
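As a rough illustration of what "be more specific" can look like today, the sketch below requests each backend explicitly. It assumes pandas 1.5 or later for the "int64[pyarrow]" string alias, and the Arrow-backed line is skipped when pyarrow is not installed.

```python
import pandas as pd

# "int64" currently resolves to the NumPy dtype.
numpy_backed = pd.Series([1, 2, 3], dtype="int64")

# Capitalized "Int64" requests the nullable (masked) extension dtype.
nullable = pd.Series([1, 2, 3], dtype="Int64")

# An Arrow-backed integer has to be requested explicitly.
try:
    import pyarrow  # noqa: F401  (optional dependency)

    arrow_backed = pd.Series([1, 2, 3], dtype="int64[pyarrow]")
except ImportError:
    arrow_backed = None  # pyarrow is not available

print(numpy_backed.dtype, nullable.dtype)
if arrow_backed is not None:
    print(arrow_backed.dtype)
```

Each spelling is unambiguous, so none of them depends on a global option or an extra constructor keyword.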

jbrockmendel added the Bug and Needs Triage labels Mar 8, 2023

ivirshup mentioned this issue Mar 8, 2023

Support for arrow arrays in pandas 2.0 scverse/anndata#948

Open

This was referenced Mar 10, 2023

BUG: ArrowCSVParser does not support dtype_backend="numpy_nullable" #51852

Closed

BUG (2.0rc0): groupby apply with a UDF changes dtype unexpectedly from double[pyarrow] to float64 #51991

Open

lithomas1 added the API Design and Arrow labels and removed the Bug and Needs Triage labels May 8, 2023

jbrockmendel mentioned this issue Jun 28, 2023

Add SQL Support for ADBC Drivers #53869

Merged

phofl mentioned this issue Aug 7, 2023

ENH: allow opt-in to inferring pyarrow strings #54430

Merged

lithomas1 mentioned this issue Aug 12, 2023

BUG: DataFrame.to_parquet does not round-trip index dtype #54000

Open

jbrockmendel mentioned this issue Apr 29, 2024

PDEP-13: The pandas Logical Type System #58455

Open
