BUG: Can't cast pyarrow floats to ints #56673

mattharrison · 2023-12-29T00:07:04Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

This fails with pyarrow types but works with legacy types:

>>> pd.Series([2.3, 2.5], dtype='float64[pyarrow]').astype('int64[pyarrow]')
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
Cell In[100], line 1
----> 1 pd.Series([2.3, 2.5], dtype='float64[pyarrow]').astype('int64[pyarrow]')

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/generic.py:6637, in NDFrame.astype(self, dtype, copy, errors)
   6631     results = [
   6632         ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
   6633     ]
   6635 else:
   6636     # else, only a single dtype is given
-> 6637     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6638     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6639     return res.__finalize__(self, method="astype")

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/internals/managers.py:431, in BaseBlockManager.astype(self, dtype, copy, errors)
    428 elif using_copy_on_write():
    429     copy = False
--> 431 return self.apply(
    432     "astype",
    433     dtype=dtype,
    434     copy=copy,
    435     errors=errors,
    436     using_cow=using_copy_on_write(),
    437 )

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/internals/managers.py:364, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    362         applied = b.apply(f, **kwargs)
    363     else:
--> 364         applied = getattr(b, f)(**kwargs)
    365     result_blocks = extend_blocks(applied, result_blocks)
    367 out = type(self).from_blocks(result_blocks, self.axes)

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/internals/blocks.py:754, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
    751         raise ValueError("Can not squeeze with more than one column.")
    752     values = values[0, :]  # type: ignore[call-overload]
--> 754 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    756 new_values = maybe_coerce_values(new_values)
    758 refs = None

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:237, in astype_array_safe(values, dtype, copy, errors)
    234     dtype = dtype.numpy_dtype
    236 try:
--> 237     new_values = astype_array(values, dtype, copy=copy)
    238 except (ValueError, TypeError):
    239     # e.g. _astype_nansafe can fail on object-dtype of strings
    240     #  trying to convert to float
    241     if errors == "ignore":

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:179, in astype_array(values, dtype, copy)
    175     return values
    177 if not isinstance(values, np.ndarray):
    178     # i.e. ExtensionArray
--> 179     values = values.astype(dtype, copy=copy)
    181 else:
    182     values = _astype_nansafe(values, dtype, copy=copy)

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/arrays/base.py:709, in ExtensionArray.astype(self, dtype, copy)
    707 if isinstance(dtype, ExtensionDtype):
    708     cls = dtype.construct_array_type()
--> 709     return cls._from_sequence(self, dtype=dtype, copy=copy)
    711 elif lib.is_np_dtype(dtype, "M"):
    712     from pandas.core.arrays import DatetimeArray

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/arrays/arrow/array.py:281, in ArrowExtensionArray._from_sequence(cls, scalars, dtype, copy)
    277 """
    278 Construct a new ExtensionArray from a sequence of scalars.
    279 """
    280 pa_type = to_pyarrow_type(dtype)
--> 281 pa_array = cls._box_pa_array(scalars, pa_type=pa_type, copy=copy)
    282 arr = cls(pa_array)
    283 return arr

File ~/.envs/pd22rc/lib/python3.11/site-packages/pandas/core/arrays/arrow/array.py:497, in ArrowExtensionArray._box_pa_array(cls, value, pa_type, copy)
    495 else:
    496     try:
--> 497         pa_array = pa_array.cast(pa_type)
    498     except (
    499         pa.ArrowInvalid,
    500         pa.ArrowTypeError,
    501         pa.ArrowNotImplementedError,
    502     ):
    503         if pa.types.is_string(pa_array.type) or pa.types.is_large_string(
    504             pa_array.type
    505         ):
    506             # TODO: Move logic in _from_sequence_of_strings into
    507             # _box_pa_array

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/table.pxi:565, in pyarrow.lib.ChunkedArray.cast()

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/compute.py:404, in cast(arr, target_type, safe, options, memory_pool)
    402     else:
    403         options = CastOptions.safe(target_type)
--> 404 return call_function("cast", [arr], options, memory_pool)

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/_compute.pyx:590, in pyarrow._compute.call_function()

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/_compute.pyx:385, in pyarrow._compute.Function.call()

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File ~/.envs/pd22rc/lib/python3.11/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()

ArrowInvalid: Float value 2.3 was truncated converting to int64

Issue Description

Converting pyarrow floats to integers currently requires passing through legacy types, because the direct cast raises ArrowInvalid when a value has a fractional part.

Expected Behavior

I would expect the same behavior as with legacy pandas types, where the cast succeeds and the fractional part is dropped.

Installed Versions

commit : d4c8d82
python : 3.11.6.final.0
python-bits : 64
OS : Darwin
OS-release : 23.2.0
Version : Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : en_US.UTF-8
LANG : None
LOCALE : en_US.UTF-8

pandas : 2.2.0rc0
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.7
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.4
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.19.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : 0.58.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None


lithomas1 · 2024-03-22T04:19:19Z

Hm, I think Arrow casts are safe by default.

Maybe this can be solved by adding a safe keyword to astype.

I'll also add that there's an existing issue for turning on "safe" casting by default, even for numpy: #45588.

mattharrison added the Bug and Needs Triage labels on Dec 29, 2023

lithomas1 added the API - Consistency, Arrow, and Astype labels and removed the Bug and Needs Triage labels on Mar 22, 2024
