On today's dev call we discussed the dtype_backend global option (decided to revert for 2.0) and the use_nullable_dtypes keyword in IO methods (decided to rename it to dtype_backend; where use_nullable_dtypes already exists in 1.5, it will be deprecated in favor of dtype_backend).
Some of the reasoning centered on the fact that constructors do not currently recognize the dtype_backend option, and there was an offhand reference to adding it as a keyword to the constructors. I think we should avoid adding keywords/options when there are viable alternatives. Going one step further: long term, I think we need neither a dtype_backend option nor a dtype_backend keyword anywhere.
For IO functions, the "engine" keyword (where applicable) should determine what kind of dtypes you get back. The most relevant case is engine="pyarrow". If you want something else, you can use obj.convert_dtypes(...).
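A short sketch of that workflow (the sample CSV here is just illustrative; the dtype_backend keyword on convert_dtypes assumes pandas >= 2.0, and the pyarrow variants assume pyarrow is installed, so those are only shown in comments):

```python
import io

import pandas as pd

# Default engine -> numpy dtypes
df = pd.read_csv(io.StringIO("a,b\n1,x\n2,y\n"))
print(df.dtypes["a"])  # int64

# Opting into nullable extension dtypes after the fact, no extra option needed
nullable = df.convert_dtypes()
print(nullable.dtypes["a"])  # Int64 (pandas nullable)

# With pyarrow installed, the engine keyword alone could drive the dtype
# choice, e.g. pd.read_csv(path, engine="pyarrow"), and anyone wanting
# arrow-backed dtypes could use df.convert_dtypes(dtype_backend="pyarrow").
```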
For constructors, the "dtype" keyword should be sufficient in most cases. The two cases where it is not are:
a) dtype=None
The natural thing to do is infer based on the data. If the data is already a pandas object we retain the dtype. If it is a numpy object we use a numpy dtype. If it is a pyarrow object we use pd.ArrowDtype. That leaves cases where it is e.g. a list. For that we could plausibly use a global option, but it'd be simpler to just have a sensible default and tell users to use convert_dtypes (or pass a keyword) if desired.
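To illustrate the inference rules that already hold today (pandas input retaining its extension dtype, numpy input mapping to a numpy dtype, a plain list getting a sensible numpy default); the pd.ArrowDtype mapping for pyarrow input is the proposed piece, so it is left out here:

```python
import numpy as np
import pandas as pd

# pandas input: the existing dtype is retained
s = pd.Series(pd.array([1, 2, None], dtype="Int64"))
print(s.dtype)  # Int64

# numpy input: a numpy dtype
print(pd.Series(np.array([1.0, 2.0])).dtype)  # float64

# plain Python list: sensible numpy default; users who want something
# else convert explicitly rather than flipping a global option
s3 = pd.Series([1, 2, 3])
print(s3.dtype)                   # int64
print(s3.convert_dtypes().dtype)  # Int64
```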
b) dtype=int|"int"|"int64"|... where we could plausibly default to either np.int64 or pa.int64. As above, we could have a global option, but it is better to just have a sensible default and tell users to be more specific or use convert_dtypes.
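For example (the "int64[pyarrow]" spelling assumes pandas >= 2.0 with pyarrow installed, so it appears only in a comment):

```python
import numpy as np
import pandas as pd

# The ambiguous spec resolves to the numpy default today
s = pd.Series([1, 2, 3], dtype="int64")
print(s.dtype == np.dtype("int64"))  # True

# Being more specific removes the ambiguity without any global option:
s_nullable = pd.Series([1, 2, 3], dtype="Int64")  # pandas nullable
print(s_nullable.dtype)  # Int64
# arrow-backed: pd.Series([1, 2, 3], dtype="int64[pyarrow]")
#            or pd.Series([1, 2, 3], dtype=pd.ArrowDtype(pa.int64()))
```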