Commit 676b773

API: This fixes a number of inconsistencies and API issues w.r.t. dtype conversions.

This is a reprise of pandas-dev#14145 & pandas-dev#16408. It removes some code from the core structures and pushes it to internals, where the primitives are made more consistent. This should allow us to be a bit more consistent for pandas2-type things.

closes pandas-dev#16402
supersedes pandas-dev#14145
closes pandas-dev#14001

CLN: remove unneeded code in internals; use split_and_operate when possible
1 parent: f19966e

23 files changed: +841, -609 lines

doc/source/whatsnew/v0.21.0.txt (+60, -8)

@@ -127,6 +127,65 @@ the target. Now, a ``ValueError`` will be raised when such an input is passed in
    ...
    ValueError: Cannot operate inplace if there is no assignment
 
+.. _whatsnew_0210.dtype_conversions:
+
+Dtype Conversions
+^^^^^^^^^^^^^^^^^
+
+- Previously assignments, ``.where()`` and ``.fillna()`` with a ``bool`` assignment, would coerce to
+  the same type (e.g. int / float), or raise for datetimelikes. These will now preserve the bools with ``object`` dtype (:issue:`16821`).
+
+  .. ipython:: python
+
+     s = Series([1, 2, 3])
+
+  .. code-block:: python
+
+     In [5]: s[1] = True
+
+     In [6]: s
+     Out[6]:
+     0    1
+     1    1
+     2    3
+     dtype: int64
+
+  New Behavior
+
+  .. ipython:: python
+
+     s[1] = True
+     s
+
+- Previously an assignment to a datetimelike with a non-datetimelike would coerce the
+  non-datetime-like item being assigned (:issue:`14145`).
+
+  .. ipython:: python
+
+     s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])
+
+  .. code-block:: python
+
+     In [1]: s[1] = 1
+
+     In [2]: s
+     Out[2]:
+     0   2011-01-01 00:00:00.000000000
+     1   1970-01-01 00:00:00.000000001
+     dtype: datetime64[ns]
+
+  These now coerce to ``object`` dtype.
+
+  .. ipython:: python
+
+     s[1] = 1
+     s
+
+- Additional bug fixes w.r.t. dtype conversions.
+
+  - Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`)
+  - Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
+
 .. _whatsnew_0210.api:
 
 Other API Changes
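The ``.. ipython::`` directives above only render when the docs are built. As a rough, hand-written illustration of the new behavior described in this hunk (the commented outputs are what is expected under the change, not copied from the rendered docs), both assignments should now coerce to ``object`` dtype:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
s[1] = True           # previously coerced True -> 1 and stayed int64
print(s.dtype)        # expected under the new behavior: object
print(s.tolist())     # expected: [1, True, 3]

t = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])
t[1] = 1              # previously coerced 1 -> a Timestamp one nanosecond after the epoch
print(t.dtype)        # expected under the new behavior: object
print(t.tolist())     # expected: [Timestamp('2011-01-01 00:00:00'), 1]
```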
@@ -142,13 +201,6 @@ Other API Changes
 - Compression defaults in HDF stores now follow pytable standards. Default is no compression and if ``complib`` is missing and ``complevel`` > 0 ``zlib`` is used (:issue:`15943`)
 - ``Index.get_indexer_non_unique()`` now returns a ndarray indexer rather than an ``Index``; this is consistent with ``Index.get_indexer()`` (:issue:`16819`)
 - Removed the ``@slow`` decorator from ``pandas.util.testing``, which caused issues for some downstream packages' test suites. Use ``@pytest.mark.slow`` instead, which achieves the same thing (:issue:`16850`)
-
-
-.. _whatsnew_0210.api:
-
-Other API Changes
-^^^^^^^^^^^^^^^^^
-
 - Moved definition of ``MergeError`` to the ``pandas.errors`` module.
 
 
@@ -192,7 +244,7 @@ Bug Fixes
 Conversion
 ^^^^^^^^^^
 
-- Bug in assignment against datetime-like data with ``int`` may incorrectly converted to datetime-like (:issue:`14145`)
+- Bug in assignment against datetime-like data with ``int`` may incorrectly convert to datetime-like (:issue:`14145`)
 - Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
 
 
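As a hedged sketch of the second Conversion fix (:issue:`14001`, assigning a ``float64`` ``np.ndarray`` into ``int64`` data), the assignment below should now upcast rather than silently keep ``int64``; the commented results describe the expected outcome, not output captured from the test suite:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], dtype='int64')
s[:2] = np.array([0.5, 1.5], dtype='float64')

# before the fix the Series could stay int64 (silently truncating 0.5 -> 0);
# after the fix the assignment should upcast the column to float64
print(s.dtype)     # expected: float64
print(s.tolist())  # expected: [0.5, 1.5, 3.0]
```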
pandas/_libs/index.pyx (+20, -6)

@@ -19,6 +19,7 @@ cimport tslib
 from hashtable cimport *
 from pandas._libs import tslib, algos, hashtable as _hash
 from pandas._libs.tslib import Timestamp, Timedelta
+from datetime import datetime, timedelta
 
 from datetime cimport (get_datetime64_value, _pydatetime_to_dts,
                        pandas_datetimestruct)

@@ -507,24 +508,37 @@ cdef class TimedeltaEngine(DatetimeEngine):
         return 'm8[ns]'
 
 cpdef convert_scalar(ndarray arr, object value):
+    # we don't turn integers
+    # into datetimes/timedeltas
+
+    # we don't turn bools into int/float/complex
+
     if arr.descr.type_num == NPY_DATETIME:
         if isinstance(value, np.ndarray):
             pass
-        elif isinstance(value, Timestamp):
-            return value.value
+        elif isinstance(value, datetime):
+            return Timestamp(value).value
         elif value is None or value != value:
             return iNaT
-        else:
+        elif util.is_string_object(value):
             return Timestamp(value).value
+        raise ValueError("cannot set a Timestamp with a non-timestamp")
+
     elif arr.descr.type_num == NPY_TIMEDELTA:
         if isinstance(value, np.ndarray):
             pass
-        elif isinstance(value, Timedelta):
-            return value.value
+        elif isinstance(value, timedelta):
+            return Timedelta(value).value
         elif value is None or value != value:
             return iNaT
-        else:
+        elif util.is_string_object(value):
             return Timedelta(value).value
+        raise ValueError("cannot set a Timedelta with a non-timedelta")
+
+    if (issubclass(arr.dtype.type, (np.integer, np.floating, np.complex)) and
+            not issubclass(arr.dtype.type, np.bool_)):
+        if util.is_bool_object(value):
+            raise ValueError('Cannot assign bool to float/integer series')
 
     if issubclass(arr.dtype.type, (np.integer, np.bool_)):
         if util.is_float_object(value) and value != value:
pandas/_libs/tslib.pyx (+2, -1)

@@ -14,6 +14,7 @@ cdef bint PY3 = (sys.version_info[0] >= 3)
 from cpython cimport (
     PyTypeObject,
     PyFloat_Check,
+    PyComplex_Check,
     PyLong_Check,
     PyObject_RichCompareBool,
     PyObject_RichCompare,

@@ -902,7 +903,7 @@ cdef inline bint _checknull_with_nat(object val):
 cdef inline bint _check_all_nulls(object val):
     """ utility to check if a value is any type of null """
     cdef bint res
-    if PyFloat_Check(val):
+    if PyFloat_Check(val) or PyComplex_Check(val):
         res = val != val
     elif val is NaT:
         res = 1
pandas/core/algorithms.py (+6)

@@ -151,6 +151,12 @@ def _reconstruct_data(values, dtype, original):
         pass
     elif is_datetime64tz_dtype(dtype) or is_period_dtype(dtype):
         values = Index(original)._shallow_copy(values, name=None)
+    elif is_bool_dtype(dtype):
+        values = values.astype(dtype)
+
+        # we only support object dtypes bool Index
+        if isinstance(original, Index):
+            values = values.astype(object)
     elif dtype is not None:
         values = values.astype(dtype)
 
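Since pandas has no boolean Index class, boolean results reconstructed from an Index are pushed back to ``object`` dtype by the new branch above. A hedged illustration of the intended effect (the commented output is the expected result, not taken from the PR's tests):

```python
import pandas as pd

# an Index of booleans is stored as object dtype (there is no bool Index)
idx = pd.Index([True, False, True])
print(idx.dtype)     # object

# value-based algorithms that rebuild their result from an Index should
# therefore also hand back object dtype rather than numpy bool
print(idx.unique())  # expected: Index([True, False], dtype='object')
```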
pandas/core/dtypes/cast.py (+50, -8)

@@ -273,7 +273,7 @@ def maybe_promote(dtype, fill_value=np.nan):
     else:
         if issubclass(dtype.type, np.datetime64):
             try:
-                fill_value = Timestamp(fill_value).value
+                fill_value = tslib.Timestamp(fill_value).value
             except:
                 # the proper thing to do here would probably be to upcast
                 # to object (but numpy 1.6.1 doesn't do this properly)

@@ -334,6 +334,23 @@ def maybe_promote(dtype, fill_value=np.nan):
     return dtype, fill_value
 
 
+def infer_dtype_from(val, pandas_dtype=False):
+    """
+    interpret the dtype from a scalar or array. This is a convenience
+    routines to infer dtype from a scalar or an array
+
+    Parameters
+    ----------
+    pandas_dtype : bool, default False
+        whether to infer dtype including pandas extension types.
+        If False, scalar/array belongs to pandas extension types is inferred as
+        object
+    """
+    if is_scalar(val):
+        return infer_dtype_from_scalar(val, pandas_dtype=pandas_dtype)
+    return infer_dtype_from_array(val, pandas_dtype=pandas_dtype)
+
+
 def infer_dtype_from_scalar(val, pandas_dtype=False):
     """
     interpret the dtype from a scalar

@@ -409,24 +426,31 @@ def infer_dtype_from_scalar(val, pandas_dtype=False):
     return dtype, val
 
 
-def infer_dtype_from_array(arr):
+def infer_dtype_from_array(arr, pandas_dtype=False):
     """
     infer the dtype from a scalar or array
 
     Parameters
     ----------
     arr : scalar or array
+    pandas_dtype : bool, default False
+        whether to infer dtype including pandas extension types.
+        If False, array belongs to pandas extension types
+        is inferred as object
 
     Returns
     -------
-    tuple (numpy-compat dtype, array)
+    tuple (numpy-compat/pandas-compat dtype, array)
 
     Notes
    -----
-    These infer to numpy dtypes exactly
-    with the exception that mixed / object dtypes
+    if pandas_dtype=False. these infer to numpy dtypes
+    exactly with the exception that mixed / object dtypes
     are not coerced by stringifying or conversion
 
+    if pandas_dtype=True. datetime64tz-aware/categorical
+    types will retain there character.
+
     Examples
     --------
     >>> np.asarray([1, '1'])

@@ -443,6 +467,12 @@ def infer_dtype_from_array(arr):
     if not is_list_like(arr):
         arr = [arr]
 
+    if pandas_dtype and is_extension_type(arr):
+        return arr.dtype, arr
+
+    elif isinstance(arr, ABCSeries):
+        return arr.dtype, np.asarray(arr)
+
     # don't force numpy coerce with nan's
     inferred = lib.infer_dtype(arr)
     if inferred in ['string', 'bytes', 'unicode',

@@ -553,7 +583,7 @@ def conv(r, dtype):
         if isnull(r):
             pass
         elif dtype == _NS_DTYPE:
-            r = Timestamp(r)
+            r = tslib.Timestamp(r)
         elif dtype == _TD_DTYPE:
             r = _coerce_scalar_to_timedelta_type(r)
         elif dtype == np.bool_:

@@ -1029,13 +1059,25 @@ def find_common_type(types):
     return np.find_common_type(types, [])
 
 
-def _cast_scalar_to_array(shape, value, dtype=None):
+def cast_scalar_to_array(shape, value, dtype=None):
     """
     create np.ndarray of specified shape and dtype, filled with values
+
+    Parameters
+    ----------
+    shape : tuple
+    value : scalar value
+    dtype : np.dtype, optional
+        dtype to coerce
+
+    Returns
+    -------
+    ndarray of shape, filled with value, of specified / inferred dtype
+
     """
 
     if dtype is None:
-        dtype, fill_value = _infer_dtype_from_scalar(value)
+        dtype, fill_value = infer_dtype_from_scalar(value)
     else:
         fill_value = value
 
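A hedged usage sketch of the new ``infer_dtype_from`` helper: it is imported here from the private ``pandas.core.dtypes.cast`` module (an assumption about where it remains accessible), and the commented outputs describe the expected shape of the result rather than exact reprs:

```python
import numpy as np
import pandas as pd
from pandas.core.dtypes.cast import infer_dtype_from

# a scalar dispatches to infer_dtype_from_scalar,
# an array-like dispatches to infer_dtype_from_array
print(infer_dtype_from(1))                       # an integer dtype, plus the scalar
print(infer_dtype_from(np.array([1.0, 2.0])))    # float64, plus the array

# pandas_dtype=True keeps pandas extension dtypes,
# pandas_dtype=False falls back to object for them
ts = pd.Timestamp('2011-01-01', tz='US/Eastern')
print(infer_dtype_from(ts, pandas_dtype=True))   # expected: datetime64[ns, US/Eastern]
print(infer_dtype_from(ts, pandas_dtype=False))  # expected: object
```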
pandas/core/dtypes/common.py (+12, -1)

@@ -11,7 +11,8 @@
                      ExtensionDtype)
 from .generic import (ABCCategorical, ABCPeriodIndex,
                       ABCDatetimeIndex, ABCSeries,
-                      ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex)
+                      ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex,
+                      ABCIndexClass)
 from .inference import is_string_like
 from .inference import * # noqa
 

@@ -1545,6 +1546,16 @@ def is_bool_dtype(arr_or_dtype):
     except ValueError:
         # this isn't even a dtype
         return False
+
+    if isinstance(arr_or_dtype, ABCIndexClass):
+
+        # TODO(jreback)
+        # we don't have a boolean Index class
+        # so its object, we need to infer to
+        # guess this
+        return (arr_or_dtype.is_object and
+                arr_or_dtype.inferred_type == 'boolean')
+
     return issubclass(tipo, np.bool_)
 
 
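A hedged example of what the new ``ABCIndexClass`` branch enables; ``is_bool_dtype`` is imported from the public ``pandas.api.types`` namespace here, and the commented results are the expected outcome rather than captured output:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype

# boolean ndarrays (and Series) carry a real bool dtype
print(is_bool_dtype(np.array([True, False])))  # True

# an Index of booleans is object dtype, since there is no bool Index class,
# so the new branch infers the values instead of trusting the dtype
idx = pd.Index([True, False])
print(idx.dtype)           # object
print(is_bool_dtype(idx))  # expected: True
```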