I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
This fails with pyarrow types but works with legacy types:
>>> pd.Series([2.3, 2.5], dtype='float64[pyarrow]').astype('int64[pyarrow]')
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
Cell In[100], line 1
----> 1 pd.Series([2.3, 2.5], dtype='float64[pyarrow]').astype('int64[pyarrow]')

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/generic.py:6637, in NDFrame.astype(self, dtype, copy, errors)
   6631     results = [
   6632         ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
   6633     ]
   6635 else:
   6636     # else, only a single dtype is given
-> 6637     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6638     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6639     return res.__finalize__(self, method="astype")

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/internals/managers.py:431, in BaseBlockManager.astype(self, dtype, copy, errors)
    428 elif using_copy_on_write():
    429     copy = False
--> 431 return self.apply(
    432     "astype",
    433     dtype=dtype,
    434     copy=copy,
    435     errors=errors,
    436     using_cow=using_copy_on_write(),
    437 )

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/internals/managers.py:364, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    362         applied = b.apply(f, **kwargs)
    363     else:
--> 364         applied = getattr(b, f)(**kwargs)
    365     result_blocks = extend_blocks(applied, result_blocks)
    367 out = type(self).from_blocks(result_blocks, self.axes)

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/internals/blocks.py:754, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
    751         raise ValueError("Can not squeeze with more than one column.")
    752     values = values[0, :]  # type: ignore[call-overload]
--> 754 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    756 new_values = maybe_coerce_values(new_values)
    758 refs = None

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:237, in astype_array_safe(values, dtype, copy, errors)
    234     dtype = dtype.numpy_dtype
    236 try:
--> 237     new_values = astype_array(values, dtype, copy=copy)
    238 except (ValueError, TypeError):
    239     # e.g. _astype_nansafe can fail on object-dtype of strings
    240     # trying to convert to float
    241     if errors == "ignore":

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:179, in astype_array(values, dtype, copy)
    175     return values
    177 if not isinstance(values, np.ndarray):
    178     # i.e. ExtensionArray
--> 179     values = values.astype(dtype, copy=copy)
    181 else:
    182     values = _astype_nansafe(values, dtype, copy=copy)

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/arrays/base.py:709, in ExtensionArray.astype(self, dtype, copy)
    707 if isinstance(dtype, ExtensionDtype):
    708     cls = dtype.construct_array_type()
--> 709     return cls._from_sequence(self, dtype=dtype, copy=copy)
    711 elif lib.is_np_dtype(dtype, "M"):
    712     from pandas.core.arrays import DatetimeArray

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/arrays/arrow/array.py:281, in ArrowExtensionArray._from_sequence(cls, scalars, dtype, copy)
    277 """
    278 Construct a new ExtensionArray from a sequence of scalars.
    279 """
    280 pa_type = to_pyarrow_type(dtype)
--> 281 pa_array = cls._box_pa_array(scalars, pa_type=pa_type, copy=copy)
    282 arr = cls(pa_array)
    283 return arr

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/arrays/arrow/array.py:497, in ArrowExtensionArray._box_pa_array(cls, value, pa_type, copy)
    495 else:
    496     try:
--> 497         pa_array = pa_array.cast(pa_type)
    498     except (
    499         pa.ArrowInvalid,
    500         pa.ArrowTypeError,
    501         pa.ArrowNotImplementedError,
    502     ):
    503         if pa.types.is_string(pa_array.type) or pa.types.is_large_string(
    504             pa_array.type
    505         ):
    506             # TODO: Move logic in _from_sequence_of_strings into
    507             # _box_pa_array

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/table.pxi:565, in pyarrow.lib.ChunkedArray.cast()

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/compute.py:404, in cast(arr, target_type, safe, options, memory_pool)
    402 else:
    403     options = CastOptions.safe(target_type)
--> 404 return call_function("cast", [arr], options, memory_pool)

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/_compute.pyx:590, in pyarrow._compute.call_function()

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/_compute.pyx:385, in pyarrow._compute.Function.call()

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()

ArrowInvalid: Float value 2.3 was truncated converting to int64
Converting pyarrow floats to integers requires passing through legacy types
I would expect the same behavior as with legacy pandas types.
commit : d4c8d82
python : 3.11.6.final.0
python-bits : 64
OS : Darwin
OS-release : 23.2.0
Version : Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : en_US.UTF-8
LANG : None
LOCALE : en_US.UTF-8
pandas : 2.2.0rc0
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.7
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.4
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.19.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : 0.58.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
Hm, I think Arrow casts are safe by default.
Maybe this can be solved by adding a `safe` keyword to `astype`.
I'll also add that there's an existing issue for turning on "safe" casting by default - even for numpy, #45588.