Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate conversion functions out from tslib #17875

Merged
merged 11 commits into from
Oct 31, 2017

Conversation

jbrockmendel
Copy link
Member

The functions moved are cut/paste with two exceptions: a flake8 whitespace fixup and the new helper function conversion._render_tstamp.

About half of the remaining _TSObject conversion funcs can be moved once #17805 is merged. The remainder can be isolated after #17793.

@gfyoung gfyoung added Clean Internals Related to non-user accessible pandas implementation labels Oct 15, 2017
@pep8speaks
Copy link

pep8speaks commented Oct 15, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 29, 2017 at 22:06 Hours UTC

@codecov
Copy link

codecov bot commented Oct 17, 2017

Codecov Report

Merging #17875 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17875      +/-   ##
==========================================
- Coverage   91.23%   91.21%   -0.02%     
==========================================
  Files         163      163              
  Lines       50102    50102              
==========================================
- Hits        45712    45703       -9     
- Misses       4390     4399       +9
Flag Coverage Δ
#multiple 89.03% <ø> (ø) ⬆️
#single 40.31% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 528fdaa...dfd4979. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 17, 2017

Codecov Report

Merging #17875 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17875      +/-   ##
==========================================
- Coverage   91.26%   91.22%   -0.05%     
==========================================
  Files         163      163              
  Lines       50099    50099              
==========================================
- Hits        45724    45703      -21     
- Misses       4375     4396      +21
Flag Coverage Δ
#multiple 89.03% <ø> (-0.03%) ⬇️
#single 40.24% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a355ed2...c4b84f9. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

rebase

@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

if we accep this PR, what then? this just moves the conversion functions out but doesn't actually simplify things AFAICT. what is the next step?

@jbrockmendel
Copy link
Member Author

this just moves the conversion functions out but doesn't actually simplify things AFAICT

You're absolutely right. Keep in mind that to make reviewing easier I'm trying to strictly separate cut/paste PRs from actual-changes PRs.

Three goals in mind:

  • Refactoring/Cleanup for its own sake. Separating out all the TSObject logic results in a ~1k line module, which is "about right" for my tastes. (Note we can't get all the TSObject logic out just now. The remaining chunks will come after Implement npy_dtime.pyx #17805 and a soon-to-be-opened follow-up to have _NaT subclass datetime directly #17793)

  • There is at least one bug in the tz_convert/tso_localize/... code; the hope is these will be easier to track down once this is isolated.

  • Part of the over-arching MO is to avoid the run-time import of frequencies.to_offset in tslib and period. I've mostly gotten this downstream, but not completely. Separating out the conversion functions helps solidify the dependency structure as a DAG.


@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline str _render_tstamp(int64_t val):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so instead of doing this, why don't you simply do

from pandas._libs.tslib import Timestamp
return str(Timestamp(val))

of course this is a local import and only occurs when you are raising anyhow.

much more obvious

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this and couldn't fight off the allure of self-contained-ness. Will change.

setup.py Outdated
@@ -493,6 +494,9 @@ def pxd(name):
'pxdfiles': ['_libs/src/util', '_libs/lib'],
'depends': tseries_depends,
'sources': npdt_srces},
'_libs.tslibs.conversion': {'pyxfile': '_libs/tslibs/conversion',
'depends': tseries_depends,
'sources': npdt_srces},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment from before, rename npdt_srces -> np_datetime_sources. Please don't abbreviate things, it makes it much harder to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, this change definitely made it into one or two of the ongoing PRs, hasn't caught up to all of them yet.


cdef inline bisect_right_i8(int64_t *data, int64_t val, Py_ssize_t n):
cdef Py_ssize_t pivot, left = 0, right = n

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an assert n >= 1 here

@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

needs a rebase

@jbrockmendel
Copy link
Member Author

Since this has already been reviewed in its current form, my preference is to close it out and then follow-up with to use e.g. dtstruct_to_dt64 and other improvements that have become available with recent merges.

@jreback jreback added this to the 0.22.0 milestone Oct 31, 2017
@jreback jreback merged commit 62dd2d3 into pandas-dev:master Oct 31, 2017
@jreback
Copy link
Contributor

jreback commented Oct 31, 2017

thanks!

@jbrockmendel
Copy link
Member Author

Awesome!

There's some accumulated cleanup that needs to be done. Quick question since micro-optimizations matter here. Do we need to care about the difference between:

int(get_utcoffset(tzinfo, ts)) * 1000000000
int(get_utcoffset(tzinfo, ts) * 1e9)

?

@jbrockmendel jbrockmendel deleted the tslibs-conversion5 branch October 31, 2017 00:59
jreback pushed a commit that referenced this pull request Oct 31, 2017
peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
GuessWhoSamFoo pushed a commit to GuessWhoSamFoo/pandas that referenced this pull request Nov 1, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants