Skip to content

ENH: Added stride/offset aliases in to_datetime #26631

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

timcera
Copy link
Contributor

@timcera timcera commented Jun 3, 2019

Added support for stride and offset aliases to the 'unit' keyword in
'to_datetime'. Documented support for additional 'unit' intervals that
were available.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Added support for stride and offset aliases to the 'unit' keyword in
'to_datetime'.  Documented support for additional 'unit' intervals that
were available.
@codecov
Copy link

codecov bot commented Jun 3, 2019

Codecov Report

Merging #26631 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26631      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50692    50692              
==========================================
- Hits        46576    46572       -4     
- Misses       4116     4120       +4
Flag Coverage Δ
#multiple 90.41% <ø> (ø) ⬆️
#single 41.8% <ø> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/core/tools/datetimes.py 85.05% <ø> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d124ea...9c67f70. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 3, 2019

Codecov Report

Merging #26631 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26631      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50692    50692              
==========================================
- Hits        46576    46572       -4     
- Misses       4116     4120       +4
Flag Coverage Δ
#multiple 90.41% <ø> (ø) ⬆️
#single 41.8% <ø> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/core/tools/datetimes.py 85.05% <ø> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d124ea...14e987b. Read the comment docs.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this close an existing issue? Also tests are critical to make sure this actually works

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yearly and monthly offsets were deprecated recently. #16344.

Week offsets are on the chopping block too so it's best not to advertise them as well. #14024 (comment)

@timcera
Copy link
Contributor Author

timcera commented Jun 4, 2019

Does this close an existing issue? Also tests are critical to make sure this actually works

I couldn't find an existing issue. I had an issue that I didn't report, however. I was trying to use to_datetime to make an hourly series and blindly used to_datetime([1, 2], unit='H') didn't work, which was puzzling. So looked at the code finding that to_datetime([1, 2], unit='h') did work, but 'h' is not an offset alias. Other strange things where, 'm' meant minutely, and 'M' meant monthly.

After working with the code found it easy to add stride.

Passed current tests, so I didn't break old behaviors.

Where should I put new tests?

@timcera
Copy link
Contributor Author

timcera commented Jun 4, 2019

Yearly and monthly offsets were deprecated recently. #16344.

Week offsets are on the chopping block too so it's best not to advertise them as well. #14024 (comment)

There isn't a deprecation warning in to_datetime(). Will add deprecation warning for 'Y', 'A', 'M', and 'W'.

@gfyoung gfyoung added Enhancement Datetime Datetime data dtype labels Jun 5, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a bunch of tests for this already in pandas/tests/indexes/timedeltas/test_tools.py

so I suppose some of these were not caught if you actually need to do this?

also would need to do an asv run on appropriate benchmarks for this change

unit = 'L'
elif unit == 'us':
unit = 'U'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this an else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. I actually want "unit" to be unchanged if not one of the tested values so what would an "else" do? Wouldn't it force all "unit" values to be "U"? I have in the past used a dictionary instead to do this same thing if that is a better and more readable approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else:
    unit = unit.upper()

unit : string, default is 'N'
The unit of the arg. Uses a subset of the pandas offset aliases.

- 'Y', 'A' for yearly (long term average of 365.2425 days)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove Y, A, M here I think we deprecated these anyhow

Copy link
Contributor Author

@timcera timcera Jun 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already my working version is a deprecation warning for "Y", "A", "M" and "W". The "W" deprecation is discussed earlier in this conversation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so don't mention the deprecated ones

@timcera
Copy link
Contributor Author

timcera commented Jun 10, 2019

we have a bunch of tests for this already in pandas/tests/indexes/timedeltas/test_tools.py

No test for the added stride, also almost all of the tests are only for unit='D' and unit='s'.

so I suppose some of these were not caught if you actually need to do this?

I think reasonable to add tests for stride and for other accepted units. Working on it.

also would need to do an asv run on appropriate benchmarks for this change

I have not done this before. Will look around on how to make that happen.

unit = 'L'
elif unit == 'us':
unit = 'U'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else:
    unit = unit.upper()

unit : string, default is 'N'
The unit of the arg. Uses a subset of the pandas offset aliases.

- 'Y', 'A' for yearly (long term average of 365.2425 days)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so don't mention the deprecated ones


unit, stride = _base_and_stride(unit)

# Normalize old or lowercase codes to standard offset aliases.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure we are actually testing all of these, can you see where we are and validate

@jreback
Copy link
Contributor

jreback commented Jul 3, 2019

can you merge master and respond to comments

@timcera
Copy link
Contributor Author

timcera commented Jul 9, 2019

Almost done.

Ran into this numpy/numpy#8325, which affects several of the tests and for some time thought it was my code failing instead. Tried to resolve/fix but just spun wheels instead.

Ran the benchmarks multiple times and got different answers each time. For example, see the attached files of the results summaries from two benchmark runs. Both done overnight while my Chromebook was not used for anything else.
summary_results_2.txt
summary_results_1.txt
Don't know what to do about that. Copying pandas working directory to a beefier machine and see what happens.

Should I squash all commits into one?

@jreback
Copy link
Contributor

jreback commented Jul 9, 2019

Almost done.

Ran into this numpy/numpy#8325, which affects several of the tests and for some time thought it was my code failing instead. Tried to resolve/fix but just spun wheels instead.

Ran the benchmarks multiple times and got different answers each time. For example, see the attached files of the results summaries from two benchmark runs. Both done overnight while my Chromebook was not used for anything else.
summary_results_2.txt
summary_results_1.txt
Don't know what to do about that. Copying pandas working directory to a beefier machine and see what happens.

Should I squash all commits into one?

we don't use numpy directly for (very little) nan operations, and not for timedeltas so not sure where you are hitting this.

@timcera
Copy link
Contributor Author

timcera commented Jul 10, 2019

we don't use numpy directly for (very little) nan operations, and not for timedeltas so not sure where you are hitting this.

Initial bug report is #26964 that shows the problem in pandas. Closed it when I found that numpy had the same bug and thought, maybe incorrectly, that pandas was using numpy and therefore getting the same behavior.

timcera added 3 commits July 28, 2019 14:16
Added support for stride and offset aliases to the 'unit' keyword in
'to_datetime'.  Documented support for additional 'unit' intervals that
were available.
The 'origin' in to_datetime can be any resolution.
New tests for origin resolution and strided unit codes.
Deprecated warning for to_datetime units: 'A', 'Y', 'M', 'W'.
Edited docstring to reflect strided unit codes and origin resolution.
Replaced re test with simple lstrip.  Moved pattern to only other file where it is used.
Added 'd' to represent daily, but default should be 'D'. Changed 'd' to 'D' in tests.
@TomAugspurger
Copy link
Contributor

Looks like a git merge issue. Let us know if you need help fixing.

timcera added 3 commits July 30, 2019 07:14
Added support for stride and offset aliases to the 'unit' keyword in
'to_datetime'.  Documented support for additional 'unit' intervals that
were available.
The 'origin' in to_datetime can be any resolution.
New tests for origin resolution and strided unit codes.
Deprecated warning for to_datetime units: 'A', 'Y', 'M', 'W'.
Edited docstring to reflect strided unit codes and origin resolution.
Replaced re test with simple lstrip.  Moved pattern to only other file where it is used.
Added 'd' to represent daily, but default should be 'D'. Changed 'd' to 'D' in tests.
@timcera
Copy link
Contributor Author

timcera commented Jul 30, 2019

Yep. Need help with git. I though everything looked good here, but obviously it is a mess.

I could send the log of what I had done if that is helpful.

@TomAugspurger
Copy link
Contributor

In theory, a

git fetch upstream
git merge upstream/master

should get you straightened out.

@WillAyd
Copy link
Member

WillAyd commented Aug 28, 2019

@timcera can you merge master again? Looks like old builds are no longer available

@TomAugspurger
Copy link
Contributor

numpydev is failing. Does this PR introduce new warnings from Numpy? Running with pytest -wError may help.

@jreback
Copy link
Contributor

jreback commented Oct 6, 2019

closing as stale; if you want to continue working, pls merge upstream/master and ping to re-open.

@jreback jreback closed this Oct 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants