API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (#22644)
mroeschke authored and jreback committed Oct 25, 2018
1 parent 6b8e5e8 commit 0a2d501
Showing 10 changed files with 330 additions and 65 deletions.
32 changes: 32 additions & 0 deletions doc/source/timeseries.rst
@@ -2357,6 +2357,38 @@ constructor as well as ``tz_localize``.
# tz_convert(None) is identical with tz_convert('UTC').tz_localize(None)
didx.tz_convert('UCT').tz_localize(None)
.. _timeseries.timezone_nonexistent:

Nonexistent Times when Localizing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A DST transition may also shift the local time ahead by 1 hour, creating nonexistent
local times. The behavior of localizing a time series that contains nonexistent times
is controlled by the ``nonexistent`` argument. The following options are available:

* ``raise``: Raises a ``pytz.NonExistentTimeError`` (the default behavior)
* ``NaT``: Replaces nonexistent times with ``NaT``
* ``shift``: Shifts nonexistent times forward to the closest real time

.. ipython:: python
dti = date_range(start='2015-03-29 01:30:00', periods=3, freq='H')
# 2:30 is a nonexistent time
Localization of nonexistent times will raise an error by default.

.. code-block:: ipython
In [2]: dti.tz_localize('Europe/Warsaw')
NonExistentTimeError: 2015-03-29 02:30:00
Nonexistent times can instead be converted to ``NaT`` or shifted forward to the closest real time.

.. ipython:: python
dti
dti.tz_localize('Europe/Warsaw', nonexistent='shift')
dti.tz_localize('Europe/Warsaw', nonexistent='NaT')
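
The three options described above can be sketched without pandas. The following is a minimal illustration, not the pandas implementation: it hard-codes a single DST gap (the Europe/Warsaw jump from 02:00 to 03:00 local time on 2015-03-29), and the names ``localize``, ``GAP_START``, and ``GAP_END`` are hypothetical. ``None`` stands in for ``NaT`` and ``ValueError`` stands in for ``pytz.NonExistentTimeError``.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a hand-written model of one transition only. On 2015-03-29 the
# wall clock in this model jumps from 02:00 (UTC+1) straight to 03:00 (UTC+2).
CET = timezone(timedelta(hours=1))
CEST = timezone(timedelta(hours=2))
GAP_START = datetime(2015, 3, 29, 2, 0)  # first wall time that does not exist
GAP_END = datetime(2015, 3, 29, 3, 0)    # first wall time that exists again


def localize(naive, nonexistent="raise"):
    """Attach a UTC offset to a naive wall time near the modelled gap."""
    if naive < GAP_START:
        return naive.replace(tzinfo=CET)
    if naive >= GAP_END:
        return naive.replace(tzinfo=CEST)
    # The wall time lies inside the gap, so it never occurred.
    if nonexistent == "shift":
        # Move forward to the first wall time after the gap.
        return GAP_END.replace(tzinfo=CEST)
    if nonexistent == "NaT":
        return None  # stand-in for NaT
    if nonexistent == "raise":
        raise ValueError("nonexistent time: {}".format(naive))
    raise ValueError("nonexistent must be one of 'raise', 'NaT' or 'shift'")


# Same three wall times as the documentation example: 01:30, 02:30, 03:30.
times = [datetime(2015, 3, 29, 1, 30) + timedelta(hours=i) for i in range(3)]
shifted = [localize(t, "shift") for t in times]
filled = [localize(t, "NaT") for t in times]
```

In this sketch only the middle value (02:30) is affected: ``shifted[1]`` becomes 03:00 at UTC+2 and ``filled[1]`` becomes ``None``, while 01:30 and 03:30 simply receive their offsets.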
.. _timeseries.timezone_series:

TZ Aware Dtypes
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -205,6 +205,7 @@ Other Enhancements
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`)

.. _whatsnew_0240.api_breaking:

@@ -912,6 +913,7 @@ Deprecations
- :meth:`FrozenNDArray.searchsorted` has deprecated the ``v`` parameter in favor of ``value`` (:issue:`14645`)
- :func:`DatetimeIndex.shift` and :func:`PeriodIndex.shift` now accept ``periods`` argument instead of ``n`` for consistency with :func:`Index.shift` and :func:`Series.shift`. Using ``n`` throws a deprecation warning (:issue:`22458`, :issue:`22912`)
- The ``fastpath`` keyword of the different Index constructors is deprecated (:issue:`23110`).
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)

.. _whatsnew_0240.prior_deprecations:

82 changes: 48 additions & 34 deletions pandas/_libs/tslibs/conversion.pyx
@@ -1,5 +1,4 @@
# -*- coding: utf-8 -*-

import cython
from cython import Py_ssize_t

@@ -44,6 +43,7 @@ from nattype cimport NPY_NAT, checknull_with_nat
# Constants

cdef int64_t DAY_NS = 86400000000000LL
cdef int64_t HOURS_NS = 3600000000000
NS_DTYPE = np.dtype('M8[ns]')
TD_DTYPE = np.dtype('m8[ns]')

@@ -458,8 +458,7 @@ cdef _TSObject convert_str_to_tsobject(object ts, object tz, object unit,
if tz is not None:
# shift for localize_tso
ts = tz_localize_to_utc(np.array([ts], dtype='i8'), tz,
ambiguous='raise',
errors='raise')[0]
ambiguous='raise')[0]

except OutOfBoundsDatetime:
# GH#19382 for just-barely-OutOfBounds falling back to dateutil
@@ -826,7 +825,7 @@ def tz_convert(int64_t[:] vals, object tz1, object tz2):
@cython.boundscheck(False)
@cython.wraparound(False)
def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
object errors='raise'):
object nonexistent=None):
"""
Localize tzinfo-naive i8 to given time zone (using pytz). If
there are ambiguities in the values, raise AmbiguousTimeError.
@@ -837,7 +836,10 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
tz : tzinfo or None
ambiguous : str, bool, or arraylike
If arraylike, must have the same length as vals
errors : {"raise", "coerce"}, default "raise"
nonexistent : {None, "raise", "NaT", "shift"}, default None
How to handle times that do not exist in ``tz``; None behaves like "raise"
.. versionadded:: 0.24.0
Returns
-------
@@ -849,16 +851,13 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
ndarray ambiguous_array
Py_ssize_t i, idx, pos, ntrans, n = len(vals)
int64_t *tdata
int64_t v, left, right
int64_t v, left, right, val, v_left, v_right
ndarray[int64_t] result, result_a, result_b, dst_hours
npy_datetimestruct dts
bint infer_dst = False, is_dst = False, fill = False
bint is_coerce = errors == 'coerce', is_raise = errors == 'raise'
bint shift = False, fill_nonexist = False

# Vectorized version of DstTzInfo.localize

assert is_coerce or is_raise

if tz == UTC or tz is None:
return vals

@@ -888,39 +887,45 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
"the same size as vals")
ambiguous_array = np.asarray(ambiguous)

if nonexistent == 'NaT':
fill_nonexist = True
elif nonexistent == 'shift':
shift = True
else:
assert nonexistent in ('raise', None), ("nonexistent must be one of"
" {'NaT', 'raise', 'shift'}")

trans, deltas, typ = get_dst_info(tz)

tdata = <int64_t*> cnp.PyArray_DATA(trans)
ntrans = len(trans)

# Determine whether each date lies left of the DST transition (store in
# result_a) or right of the DST transition (store in result_b)
result_a = np.empty(n, dtype=np.int64)
result_b = np.empty(n, dtype=np.int64)
result_a.fill(NPY_NAT)
result_b.fill(NPY_NAT)

# left side
idx_shifted = (np.maximum(0, trans.searchsorted(
idx_shifted_left = (np.maximum(0, trans.searchsorted(
vals - DAY_NS, side='right') - 1)).astype(np.int64)

for i in range(n):
v = vals[i] - deltas[idx_shifted[i]]
pos = bisect_right_i8(tdata, v, ntrans) - 1

# timestamp falls to the left side of the DST transition
if v + deltas[pos] == vals[i]:
result_a[i] = v

# right side
idx_shifted = (np.maximum(0, trans.searchsorted(
idx_shifted_right = (np.maximum(0, trans.searchsorted(
vals + DAY_NS, side='right') - 1)).astype(np.int64)

for i in range(n):
v = vals[i] - deltas[idx_shifted[i]]
pos = bisect_right_i8(tdata, v, ntrans) - 1
val = vals[i]
v_left = val - deltas[idx_shifted_left[i]]
pos_left = bisect_right_i8(tdata, v_left, ntrans) - 1
# timestamp falls to the left side of the DST transition
if v_left + deltas[pos_left] == val:
result_a[i] = v_left

v_right = val - deltas[idx_shifted_right[i]]
pos_right = bisect_right_i8(tdata, v_right, ntrans) - 1
# timestamp falls to the right side of the DST transition
if v + deltas[pos] == vals[i]:
result_b[i] = v
if v_right + deltas[pos_right] == val:
result_b[i] = v_right

if infer_dst:
dst_hours = np.empty(n, dtype=np.int64)
@@ -935,7 +940,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
stamp = _render_tstamp(vals[trans_idx])
raise pytz.AmbiguousTimeError(
"Cannot infer dst time from {} as there "
"are no repeated times".format(stamp))
# Split the array into contiguous chunks (where the difference between
# indices is 1). These are effectively dst transitions in different
# years which is useful for checking that there is not an ambiguous
@@ -960,18 +965,19 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
if switch_idx.size > 1:
raise pytz.AmbiguousTimeError(
"There are {} dst switches when "
"there should only be 1.".format(switch_idx.size))
switch_idx = switch_idx[0] + 1
# Pull the only index and adjust
a_idx = grp[:switch_idx]
b_idx = grp[switch_idx:]
dst_hours[grp] = np.hstack((result_a[a_idx], result_b[b_idx]))

for i in range(n):
val = vals[i]
left = result_a[i]
right = result_b[i]
if vals[i] == NPY_NAT:
result[i] = vals[i]
if val == NPY_NAT:
result[i] = val
elif left != NPY_NAT and right != NPY_NAT:
if left == right:
result[i] = left
@@ -986,19 +992,27 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
elif fill:
result[i] = NPY_NAT
else:
stamp = _render_tstamp(vals[i])
stamp = _render_tstamp(val)
raise pytz.AmbiguousTimeError(
"Cannot infer dst time from {!r}, try using the "
"'ambiguous' argument".format(stamp))
elif left != NPY_NAT:
result[i] = left
elif right != NPY_NAT:
result[i] = right
else:
if is_coerce:
# Handle nonexistent times
if shift:
# Shift the nonexistent time forward to the closest existing
# time
remaining_minutes = val % HOURS_NS
new_local = val + (HOURS_NS - remaining_minutes)
delta_idx = trans.searchsorted(new_local, side='right') - 1
result[i] = new_local - deltas[delta_idx]
elif fill_nonexist:
result[i] = NPY_NAT
else:
stamp = _render_tstamp(vals[i])
stamp = _render_tstamp(val)
raise pytz.NonExistentTimeError(stamp)

return result
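
The left/right probing and the new ``shift`` branch above can be traced with plain integers. The sketch below is a simplified illustration under stated assumptions, not the Cython routine: ``TRANS`` and ``DELTAS`` are a hypothetical two-entry table in the spirit of what ``get_dst_info`` returns (UTC transition instants and the offsets in force after them), ambiguous-time handling is left out, ``None`` stands in for ``NPY_NAT``, and ``ValueError`` stands in for ``pytz.NonExistentTimeError``.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

HOURS_NS = 3_600_000_000_000   # nanoseconds per hour, as in the diff
DAY_NS = 24 * HOURS_NS
EPOCH = datetime(1970, 1, 1)


def to_ns(dt):
    """Convert a naive datetime to integer nanoseconds since the epoch."""
    return ((dt - EPOCH) // timedelta(seconds=1)) * 1_000_000_000


# Assumption: one transition only. At 2015-03-29 01:00 UTC the offset changes
# from UTC+1 to UTC+2 (the Europe/Warsaw example from the documentation).
TRANS = [-(2 ** 62), to_ns(datetime(2015, 3, 29, 1, 0))]
DELTAS = [1 * HOURS_NS, 2 * HOURS_NS]


def offset_index(value):
    """Index of the last transition at or before value."""
    return max(0, bisect_right(TRANS, value) - 1)


def localize_to_utc(val, nonexistent="raise"):
    """Map a local wall time (ns) to UTC (ns), handling nonexistent times."""
    # Probe with the offset in force a day earlier (left side of a
    # transition), then a day later (right side). A candidate is accepted
    # only if converting it back to local time reproduces val.
    for probe in (val - DAY_NS, val + DAY_NS):
        utc = val - DELTAS[offset_index(probe)]
        if utc + DELTAS[offset_index(utc)] == val:
            return utc
    # Neither side round-trips: the wall time does not exist.
    if nonexistent == "shift":
        # Round up to the next whole hour, then look up that hour's offset.
        new_local = val + (HOURS_NS - val % HOURS_NS)
        return new_local - DELTAS[offset_index(new_local)]
    if nonexistent == "NaT":
        return None  # stand-in for NPY_NAT
    raise ValueError("nonexistent time")


gap = to_ns(datetime(2015, 3, 29, 2, 30))
shifted_utc = localize_to_utc(gap, "shift")
```

For the 02:30 wall time, neither probe round-trips, so the ``shift`` branch rounds up to 03:00 local and subtracts the two-hour offset, giving 01:00 UTC. Rounding to the next whole hour is sufficient here because the modelled gap is exactly one hour long and ends on an hour boundary.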
20 changes: 16 additions & 4 deletions pandas/_libs/tslibs/nattype.pyx
@@ -564,14 +564,26 @@ class NaTType(_NaT):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time
errors : 'raise', 'coerce', default 'raise'
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times
.. versionadded:: 0.24.0
errors : 'raise', 'coerce', default None
- 'raise' will raise a NonExistentTimeError if a timestamp is not
valid in the specified timezone (e.g. due to a transition from
or to DST time)
or to DST time). Use ``nonexistent='raise'`` instead.
- 'coerce' will return NaT if the timestamp can not be converted
into the specified timezone
into the specified timezone. Use ``nonexistent='NaT'`` instead.
.. versionadded:: 0.19.0
.. deprecated:: 0.24.0
Returns
-------
43 changes: 37 additions & 6 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -961,7 +961,8 @@ class Timestamp(_Timestamp):
def is_leap_year(self):
return bool(ccalendar.is_leapyear(self.year))

def tz_localize(self, tz, ambiguous='raise', errors='raise'):
def tz_localize(self, tz, ambiguous='raise', nonexistent='raise',
errors=None):
"""
Convert naive Timestamp to local time zone, or remove
timezone from tz-aware Timestamp.
@@ -978,14 +979,26 @@ class Timestamp(_Timestamp):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time
errors : 'raise', 'coerce', default 'raise'
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times
.. versionadded:: 0.24.0
errors : 'raise', 'coerce', default None
- 'raise' will raise a NonExistentTimeError if a timestamp is not
valid in the specified timezone (e.g. due to a transition from
or to DST time)
or to DST time). Use ``nonexistent='raise'`` instead.
- 'coerce' will return NaT if the timestamp can not be converted
into the specified timezone
into the specified timezone. Use ``nonexistent='NaT'`` instead.
.. versionadded:: 0.19.0
.. deprecated:: 0.24.0
Returns
-------
@@ -999,13 +1012,31 @@ class Timestamp(_Timestamp):
if ambiguous == 'infer':
raise ValueError('Cannot infer offset with only one time.')

if errors is not None:
warnings.warn("The errors argument is deprecated and will be "
"removed in a future release. Use "
"nonexistent='NaT' or nonexistent='raise' "
"instead.", FutureWarning)
if errors == 'coerce':
nonexistent = 'NaT'
elif errors == 'raise':
nonexistent = 'raise'
else:
raise ValueError("The errors argument must be either 'coerce' "
"or 'raise'.")

if nonexistent not in ('raise', 'NaT', 'shift'):
raise ValueError("The nonexistent argument must be one of 'raise',"
" 'NaT' or 'shift'")

if self.tzinfo is None:
# tz naive, localize
tz = maybe_get_tz(tz)
if not is_string_object(ambiguous):
ambiguous = [ambiguous]
value = tz_localize_to_utc(np.array([self.value], dtype='i8'), tz,
ambiguous=ambiguous, errors=errors)[0]
ambiguous=ambiguous,
nonexistent=nonexistent)[0]
return Timestamp(value, tz=tz)
else:
if tz is None:
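
The deprecation path added to ``Timestamp.tz_localize`` in this hunk maps the old ``errors`` values onto the new ``nonexistent`` keyword before validating it. A standalone sketch of that mapping follows; ``resolve_nonexistent`` is a hypothetical helper name used only for illustration, and ``stacklevel=2`` is an addition that the diff does not have.

```python
import warnings


def resolve_nonexistent(nonexistent="raise", errors=None):
    """Return the effective ``nonexistent`` value, honoring legacy ``errors``."""
    if errors is not None:
        # Assumption: stacklevel=2 points the warning at the caller; the
        # diff itself calls warnings.warn without a stacklevel.
        warnings.warn("The errors argument is deprecated and will be "
                      "removed in a future release. Use "
                      "nonexistent='NaT' or nonexistent='raise' "
                      "instead.", FutureWarning, stacklevel=2)
        if errors == "coerce":
            nonexistent = "NaT"
        elif errors == "raise":
            nonexistent = "raise"
        else:
            raise ValueError("The errors argument must be either 'coerce' "
                             "or 'raise'.")
    if nonexistent not in ("raise", "NaT", "shift"):
        raise ValueError("The nonexistent argument must be one of 'raise',"
                         " 'NaT' or 'shift'")
    return nonexistent


# Legacy call style: errors='coerce' is translated and a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = resolve_nonexistent(errors="coerce")
```

As in the diff, a non-``None`` ``errors`` value overrides whatever was passed as ``nonexistent``, so callers mixing both keywords get the legacy behavior plus a ``FutureWarning``.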