API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option #22644

Merged
merged 64 commits into from
Oct 25, 2018

mroeschke
Member

@mroeschke mroeschke commented Sep 9, 2018

Currently, users have no control over how nonexistent datetimes are handled when calling tz_localize, unlike ambiguous times. This adds a new keyword, nonexistent, to tz_localize so that users can choose one of:

'raise': Raise an error (default)
'NaT': Replace nonexistent times with 'NaT'
'shift': Shift nonexistent times forward to the closest existing time
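
For illustration, the three options above might behave roughly as follows on a single Timestamp. This is a sketch, not code from the PR. It assumes a pandas release in which the forward-shift option is spelled 'shift_forward' (this PR names it 'shift'), and it catches a broad exception because the exact error class can depend on the pandas version.

```python
import pandas as pd

# 2015-03-29 02:30 never occurred in Europe/Warsaw: clocks jumped
# from 02:00 straight to 03:00 at the start of DST.
naive = pd.Timestamp("2015-03-29 02:30:00")

# Default ('raise'): localizing a nonexistent time fails. The exact
# exception class depends on the pandas version, so catch broadly.
try:
    naive.tz_localize("Europe/Warsaw")
    raised = False
except Exception:
    raised = True

# 'NaT': the nonexistent time is replaced with NaT.
as_nat = naive.tz_localize("Europe/Warsaw", nonexistent="NaT")

# Forward shift. Assumption: spelled 'shift_forward' in released pandas,
# whereas this PR calls the option 'shift'.
shifted = naive.tz_localize("Europe/Warsaw", nonexistent="shift_forward")

print(raised, as_nat, shifted)
```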

@pep8speaks

Hello @mroeschke! Thanks for submitting the PR.

@mroeschke mroeschke added Enhancement Timezones Timezone data dtype labels Sep 9, 2018
@codecov

codecov bot commented Sep 9, 2018

Codecov Report

Merging #22644 into master will increase coverage by <.01%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           master   #22644      +/-   ##
==========================================
+ Coverage   92.22%   92.22%   +<.01%     
==========================================
  Files         169      169              
  Lines       50911    50922      +11     
==========================================
+ Hits        46954    46965      +11     
  Misses       3957     3957

| Flag | Coverage Δ |
|---|---|
| #multiple | 90.65% <94.11%> (ø) ⬆️ |
| #single | 42.28% <23.52%> (-0.01%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/arrays/datetimes.py | 97.45% <100%> (+0.05%) ⬆️ |
| pandas/core/generic.py | 96.79% <83.33%> (-0.05%) ⬇️ |
| pandas/util/testing.py | 86.73% <0%> (+0.09%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c26375...8cf16e2. Read the comment docs.

@mroeschke mroeschke added this to the 0.24.0 milestone Sep 11, 2018
@mroeschke
Member Author

mroeschke commented Sep 19, 2018

@jreback re: overlap between errors and nonexistent

  1. The original issue, Timestamp.tz_localize() NonExistentTimeError handling #8917 (comment), asks for the ability to control ambiguous times and nonexistent times independently.
    1a) In this PR, nonexistent and ambiguous each handle their own errors independently.
  2. This PR adds the ability to shift a nonexistent time to a real time (like how ambiguous can take True/False).

So I would propose that we eventually deprecate errors and keep both ambiguous and nonexistent.
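
A small sketch of what independent handling could look like in practice: one index holds both a nonexistent and an ambiguous wall time, and each keyword deals only with its own case. The dates and the 'shift_forward' spelling (this PR's 'shift') are assumptions made for the example, not taken from the PR's tests.

```python
import pandas as pd

# Europe/Warsaw in 2015:
#   2015-03-29 02:30 does not exist (clocks jump 02:00 -> 03:00)
#   2015-10-25 02:30 is ambiguous   (clocks fall back 03:00 -> 02:00)
idx = pd.DatetimeIndex(["2015-03-29 02:30:00", "2015-10-25 02:30:00"])

# Each keyword only affects its own kind of problematic time.
# Assumption: 'shift_forward' is the released spelling of this PR's 'shift'.
result = idx.tz_localize(
    "Europe/Warsaw",
    ambiguous="NaT",              # ambiguous entry becomes NaT
    nonexistent="shift_forward",  # nonexistent entry moves to 03:00
)

# On a single Timestamp, ambiguous can also take True/False to pick
# the DST (True) or non-DST (False) reading of the wall time.
dst = pd.Timestamp("2015-10-25 02:30:00").tz_localize(
    "Europe/Warsaw", ambiguous=True
)
std = pd.Timestamp("2015-10-25 02:30:00").tz_localize(
    "Europe/Warsaw", ambiguous=False
)

print(result)
print(dst, std)
```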

@jreback
Contributor

jreback commented Oct 14, 2018

lgtm @jorisvandenbossche any more comments?

@@ -565,6 +565,8 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift' will shift the nonexistent time forward to the closest
Member

Sorry, rst formatting nitpick: there needs to be a blank line between the first sentences, and the start of this list ... (getting rst right can be annoying ..)

@mroeschke
Member Author

Thanks @jorisvandenbossche. Added those blank lines for rendering.
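
To make the formatting point concrete, here is a sketch of a docstring with the blank line in place between the description and the bullet list. The function `tz_localize_stub` is hypothetical and exists only to carry the docstring. The 'NaT' and 'raise' bullets are paraphrased from the PR description rather than copied from the diff.

```python
import inspect


def tz_localize_stub(tz, nonexistent="raise"):
    """
    Hypothetical stub used only to illustrate docstring layout.

    Parameters
    ----------
    tz : str
        Time zone name.
    nonexistent : 'shift', 'NaT', default 'raise'
        A nonexistent time does not exist in a particular timezone
        where clocks moved forward due to DST.

        - 'shift' will shift the nonexistent time forward to the closest
          existing time
        - 'NaT' will return NaT where there are nonexistent times
        - 'raise' will raise an error if there are nonexistent times
    """


# The blank line before the first bullet is what lets rst render the list.
doc_lines = inspect.cleandoc(tz_localize_stub.__doc__).splitlines()
print("\n".join(doc_lines))
```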

@pytest.mark.parametrize('tz', ['Europe/Warsaw', 'dateutil/Europe/Warsaw'])
@pytest.mark.parametrize('method, exp', [
['shift', '2015-03-29 03:00:00'],
['NaT', pd.NaT],
Contributor

do you have tests that exercise the assertion when you pass a nonexistent keyword that is invalid?

Member Author

Just added a test for an invalid nonexistent keyword.
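
A check of that kind might look roughly like the sketch below. It is not the test added in this PR. It assumes that an unrecognized nonexistent value is rejected with a ValueError, and the helper name `rejects_invalid_nonexistent` is made up for the example.

```python
import pandas as pd


def rejects_invalid_nonexistent(value):
    """Return True if tz_localize raises ValueError for this `nonexistent` value."""
    # Use an ordinary wall time that exists, so any error comes from
    # argument validation rather than from DST handling.
    ts = pd.Timestamp("2015-06-01 12:00:00")
    try:
        ts.tz_localize("Europe/Warsaw", nonexistent=value)
    except ValueError:
        return True
    return False


print(rejects_invalid_nonexistent("foo"))
print(rejects_invalid_nonexistent("NaT"))
```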

@@ -978,14 +979,26 @@ class Timestamp(_Timestamp):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

errors : 'raise', 'coerce', default 'raise'
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone
Member

Suggested change
A nonexistent time doesn't not exist in a particular timezone
A nonexistent time does not exist in a particular timezone

@@ -639,15 +639,27 @@ def tz_localize(self, tz, ambiguous='raise', errors='raise'):
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times

errors : {'raise', 'coerce'}, default 'raise'
nonexistent : 'shift', 'NaT' default 'raise'
A nonexistent time doesn't not exist in a particular timezone
Member

Suggested change
A nonexistent time doesn't not exist in a particular timezone
A nonexistent time does not exist in a particular timezone

@@ -8659,6 +8659,17 @@ def tz_localize(self, tz, axis=0, level=None, copy=True,
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone
Member

Suggested change
A nonexistent time doesn't not exist in a particular timezone
A nonexistent time does not exist in a particular timezone

@mroeschke
Member Author

Thanks for catching that typo @jorisvandenbossche

@jreback
Contributor

jreback commented Oct 24, 2018

one more rebase and I think ok to go

@jreback
Contributor

jreback commented Oct 25, 2018

thanks @mroeschke

@mroeschke mroeschke deleted the normalize_tz branch October 25, 2018 15:48
thoo added a commit to thoo/pandas that referenced this pull request Oct 27, 2018
…ndas

* repo_org/master: (23 commits)
  DOC: Add docstring validations for "See Also" section (pandas-dev#23143)
  TST: Fix test assertion (pandas-dev#23357)
  BUG: Handle Period in combine (pandas-dev#23350)
  REF: SparseArray imports (pandas-dev#23329)
  CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992)
  DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051)
  Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341)
  TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304)
  API: Add sparse Acessor (pandas-dev#23183)
  PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
  fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302)
  BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317)
  TST: re-enable gbq tests (pandas-dev#23303)
  Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311)
  isort imports-io (pandas-dev#23332)
  DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279)
  REF: Make PeriodArray an ExtensionArray (pandas-dev#22862)
  DOC: Added Examples for Series max (pandas-dev#23298)
  API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644)
  BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291)
  ...
thoo added a commit to thoo/pandas that referenced this pull request Oct 27, 2018
…xamples

* repo_org/master: (83 commits)
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Development

Successfully merging this pull request may close these issues.

Timestamp.tz_localize() NonExistentTimeError handling
4 participants