BUG: get_loc / get_indexer with NaT and tz-aware DatetimeIndex #32572

Open
kernc opened this issue Mar 10, 2020 · 5 comments
Assignees
Labels
Bug; Datetime (Datetime data dtype); Error Reporting (Incorrect or improved errors from pandas); Indexing (Related to indexing on series/frames, not to indexes themselves); Timezones (Timezone data dtype)

Comments

@kernc
Contributor

kernc commented Mar 10, 2020

Code Sample, a copy-pastable example if possible

>>> pd.date_range('2020', 'now').get_loc(pd.NaT, method='nearest')
0
    # Ok? NaT would be better to propagate.

>>> pd.date_range('2020', 'now', tz='US/Central').get_loc(pd.NaT, method='nearest')
-----------------------------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "pandas/core/indexes/datetimes.py", line 582, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "pandas/core/indexes/base.py", line 2869, in get_loc
    indexer = self.get_indexer([key], method=method, tolerance=tolerance)
  File "pandas/core/indexes/base.py", line 2951, in get_indexer
    target, method=method, limit=limit, tolerance=tolerance
  File "pandas/core/indexes/base.py", line 2962, in get_indexer
    indexer = self._get_nearest_indexer(target, limit, tolerance)
  File "pandas/core/indexes/base.py", line 3046, in _get_nearest_indexer
    left_distances = np.abs(self[left_indexer] - target)
  File "pandas/core/indexes/base.py", line 2361, in __sub__
    return Index(np.array(self) - other)
  File "pandas/core/indexes/base.py", line 2367, in __rsub__
    return Index(other - Series(self))
  File "pandas/core/series.py", line 646, in __array_ufunc__
    self, ufunc, method, *inputs, **kwargs
  File "pandas/_libs/ops_dispatch.pyx", line 91, in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op
  File "pandas/core/ops/common.py", line 63, in new_method
    return method(self, other)
  File "pandas/core/ops/__init__.py", line 500, in wrapper
    result = arithmetic_op(lvalues, rvalues, op, str_rep)
  File "pandas/core/ops/array_ops.py", line 218, in arithmetic_op
    res_values = dispatch_to_extension_op(op, lvalues, rvalues)
  File "pandas/core/ops/dispatch.py", line 125, in dispatch_to_extension_op
    res_values = op(left, right)
  File "pandas/core/ops/roperator.py", line 13, in rsub
    return right - left
  File "pandas/core/arrays/datetimelike.py", line 1428, in __rsub__
    f"cannot subtract {type(self).__name__} from {type(other).__name__}"
TypeError: cannot subtract DatetimeArray from ndarray

Problem description

pd.NaT is NaT regardless of timezone.

Expected Output

>>> pd.date_range('2020', 'now').get_loc(pd.NaT, method='nearest')
NaT

>>> pd.date_range('2020', 'now', tz='US/Central').get_loc(pd.NaT, method='nearest')
NaT

Output of pd.show_versions()

pandas 1.1.0.dev0+725.gae79bb23c

@jorisvandenbossche
Member

I suppose the title is wrong?

@jorisvandenbossche
Member

Ah, sorry, I see that it is the message in the error (but still, that's not the actual issue I think). Previously in 0.25.0, there was a different (but also not good) error: "TypeError: bad operand type for abs(): 'NaTType'"

@jorisvandenbossche jorisvandenbossche added Bug Datetime Datetime data dtype labels Mar 10, 2020
@jorisvandenbossche
Member

It seems that somewhere in the code, the datetime index is converted to object dtype, which leads to having an object dtype array with timestamps (and this gives the error about not being able to subtract a DatetimeArray from an ndarray).

This happens here:

if not is_dtype_equal(self.dtype, target.dtype):
    this = self.astype(object)
    target = target.astype(object)
    return this.get_indexer(
        target, method=method, limit=limit, tolerance=tolerance
    )

and we end up there because the dtype of the index is not equal to the dtype of the target (datetime64[ns, tz] vs datetime64[ns]).
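To make the dtype mismatch concrete, here is a minimal sketch. It assumes a fixed five-day range instead of 'now' so the result is reproducible, and the variable names are only illustrative. It shows that a target built from a bare NaT is tz-naive, so its dtype differs from that of a tz-aware index, while passing the index's dtype explicitly makes the two agree.

```python
import pandas as pd

# Fixed, tz-aware daily index (assumption: a short fixed range instead of 'now').
dti = pd.date_range("2020-01-01", periods=5, tz="US/Central")

# A target built from a bare NaT carries no timezone information.
naive_target = pd.DatetimeIndex([pd.NaT])

# Passing the index's dtype explicitly yields a tz-aware target.
aware_target = pd.DatetimeIndex([pd.NaT], dtype=dti.dtype)

print(dti.dtype == naive_target.dtype)  # dtypes differ: tz-aware vs tz-naive
print(dti.dtype == aware_target.dtype)  # dtypes match
```

With matching dtypes the object-dtype fallback quoted above is not entered, at least as far as that one check is concerned.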

@jorisvandenbossche jorisvandenbossche changed the title cannot subtract DatetimeArray from ndarray BUG: get_loc / get_indexer with NaT and tz-aware DatetimeIndex Mar 10, 2020
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Mar 10, 2020
@letapxad

take

@jbrockmendel jbrockmendel added the Indexing Related to indexing on series/frames, not to indexes themselves label Mar 10, 2020
hoangbn added a commit to CSCD01-team14/pandas that referenced this issue Mar 11, 2020
letapxad added a commit to CSCD01-team14/pandas that referenced this issue Mar 11, 2020
letapxad added a commit to CSCD01-team14/pandas that referenced this issue Mar 11, 2020
@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas Timezones Timezone data dtype labels Jul 30, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Member

NaT is not a sensible return type for get_loc/get_indexer. These methods return integers, masks, or slices that can be used in positional indexing.

dti = pd.date_range('2020', 'now', tz='US/Central')
target = pd.DatetimeIndex([pd.NaT], dtype=dti.dtype)

>>> dti.get_indexer(target)
array([-1])

>>> dti.get_indexer(target, method="nearest")
array([1301])

The 1301 seems weird to me
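Until the "nearest" behaviour for NaT is settled, one user-side option is to mask missing targets before the lookup and report them as -1, matching what plain get_indexer returns above. The following is only a sketch of that workaround, not pandas' own behaviour: the helper name `nearest_indexer_skipna` is hypothetical, and it assumes the target already has the same dtype as the index.

```python
import numpy as np
import pandas as pd


def nearest_indexer_skipna(index, target):
    """Hypothetical helper: nearest-match positions, with -1 for NaT targets."""
    result = np.full(len(target), -1, dtype=np.intp)
    valid = ~target.isna()  # boolean mask of non-missing targets
    if valid.any():
        # Only non-missing values are passed to the nearest lookup.
        result[valid] = index.get_indexer(target[valid], method="nearest")
    return result


# Fixed range instead of 'now' so the example is reproducible (assumption).
dti = pd.date_range("2020-01-01", periods=5, tz="US/Central")
target = pd.DatetimeIndex(
    [pd.NaT, dti[2] + pd.Timedelta(hours=10)], dtype=dti.dtype
)

result = nearest_indexer_skipna(dti, target)
print(result)
```

Here the second target is 10 hours after the third index entry and 14 hours before the fourth, so it resolves to position 2, while the NaT entry stays at -1.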

5 participants