API: dtype_backend and constructors long-term #51846

Open
jbrockmendel opened this issue Mar 8, 2023 · 0 comments
Labels
API Design Arrow pyarrow functionality

jbrockmendel commented Mar 8, 2023

On today's dev call we discussed the dtype_backend global option (decided to revert for 2.0) and the use_nullable_dtypes keyword in IO methods (decided to change to dtype_backend, except where it already exists in 1.5 where it will be deprecated in favor of dtype_backend).

Some of the reasoning centered around the fact that constructors do not currently recognize the dtype_backend option, and there was an offhand reference to adding that as a keyword to the constructors. I think we should avoid adding keywords/options when there are viable alternatives. Going one step further: long term I think we need neither a dtype_backend option nor keyword anywhere.

For IO functions, the "engine" keyword (where applicable) should determine what kind of dtypes you get back. The most relevant case is engine="pyarrow". If you want something else, you can use obj.convert_dtypes(...).
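To illustrate the "read first, convert afterwards" route, here is a minimal sketch. It assumes pandas 2.0 or later (for the dtype_backend argument of convert_dtypes) and treats pyarrow as optional. The CSV content and variable names are made up for illustration. Note that it shows only the convert_dtypes step as it exists today; having engine="pyarrow" itself decide the returned dtypes is the proposal here, not something this snippet relies on.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV, used only for illustration.
csv_text = "a,b\n1,x\n2,y\n"

# Default engine: integer columns come back as NumPy-backed int64.
df = pd.read_csv(io.StringIO(csv_text))

# Opt in to nullable extension dtypes after reading.
converted = df.convert_dtypes()

# Opt in to pyarrow-backed dtypes, if pyarrow is installed.
try:
    import pyarrow  # noqa: F401

    arrow_df = df.convert_dtypes(dtype_backend="pyarrow")
except ImportError:
    arrow_df = None

print(df.dtypes)
print(converted.dtypes)
if arrow_df is not None:
    print(arrow_df.dtypes)
```

The point of the sketch is that the user picks the dtype family explicitly after the read, with no global option and no extra keyword on the reader.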

For constructors, the "dtype" keyword should be sufficient in most cases. The two cases where it is not are:

a) dtype=None
The natural thing to do is infer based on the data. If the data is already a pandas object we retain the dtype. If it is a numpy object we use a numpy dtype. If it is a pyarrow object we use pd.ArrowDtype. That leaves cases where it is e.g. a list. For that we could plausibly use a global option, but it'd be simpler to just have a sensible default and tell users to use convert_dtypes (or pass a keyword) if desired.

b) dtype=int|"int"|"int64"|...
Here we could plausibly default to either np.int64 or pa.int64. As above, we could have a global option, but it would be better to just have a sensible default and tell users to be more specific or use convert_dtypes.
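For reference, a rough sketch of how the two constructor cases behave today, and how being "more specific" resolves the ambiguity. Variable names are illustrative only, and the pyarrow-backed case is left out so the snippet runs without pyarrow; with pyarrow installed, the analogous explicit spelling would be dtype="int64[pyarrow]".

```python
import numpy as np
import pandas as pd

# Case (a): dtype=None, so the dtype is inferred from the data.
from_list = pd.Series([1, 2, 3])                               # list -> NumPy int64 today
from_numpy = pd.Series(np.array([1, 2, 3], dtype="int32"))     # NumPy dtype is retained
from_pandas = pd.Series(pd.array([1, None], dtype="Int64"))    # pandas dtype is retained

# Case (b): an ambiguous spelling resolves to the NumPy dtype today.
ambiguous = pd.Series([1, 2, 3], dtype="int64")

# Being more specific removes the ambiguity without any option or keyword.
explicit_nullable = pd.Series([1, 2, 3], dtype="Int64")

# Alternatively, convert after construction.
converted = from_list.convert_dtypes()

for name, ser in [
    ("from_list", from_list),
    ("from_numpy", from_numpy),
    ("from_pandas", from_pandas),
    ("ambiguous", ambiguous),
    ("explicit_nullable", explicit_nullable),
    ("converted", converted),
]:
    print(name, ser.dtype)
```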
