Skip to content

move implementation of Timedelta to tslibs.timedeltas #18085

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Nov 8, 2017

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

# resolution in ns
Timedelta.min = Timedelta(np.iinfo(np.int64).min +1)
Timedelta.max = Timedelta(np.iinfo(np.int64).max)

cdef PyTypeObject* td_type = <PyTypeObject*> Timedelta


cdef inline bint is_timedelta(object o):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never used, could probably be removed (along with td_type above)

# ----------------------------------------------------------------------

cpdef int64_t _delta_to_nanoseconds(delta) except? -1:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_delta_to_nanoseconds is cut/paste

delta.microseconds) * 1000


cpdef convert_to_timedelta64(object ts, object unit):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert_to_timedelta64 is cut/paste

return ts.astype('timedelta64[ns]')


cpdef array_to_timedelta64(ndarray[object] values, unit='ns', errors='raise'):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_to_timedelta64 is cut/paste

# define a binary operation that only works if the other argument is
# timedelta like or an array of timedeltalike
def f(self, other):
if hasattr(other, 'delta') and not PyDelta_Check(other):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance(other, Timedelta) --> PyDelta_Check(other) effectively equivalent, should be more performant.

# We are implicitly requiring the canonical behavior to be
# defined by Timestamp methods.

elif PyDateTime_CheckExact(other):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance(other, datetime) and not isinstance(other, Timestamp) --> PyDateTime_CheckExact(other). Effectively equivalent, should be more performant, and doesn't require Timestamp in the namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 2 separate branches here that are the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historical accident, will fix.

cdef _to_py_int_float(v):
# Note: This used to be defined inside _timedelta_value_kwargs
# (and Timedelta.__new__ before that), but cython
# will not allow dynamically-defined functions nested that way.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be re-worded to clarify cython will not allow cdef functions to be defined dynamically.

Copy link
Member

@gfyoung gfyoung Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So...go and reword it? 😄

@codecov
Copy link

codecov bot commented Nov 3, 2017

Codecov Report

Merging #18085 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18085      +/-   ##
==========================================
- Coverage   91.27%   91.26%   -0.02%     
==========================================
  Files         163      163              
  Lines       50120    50120              
==========================================
- Hits        45749    45740       -9     
- Misses       4371     4380       +9
Flag Coverage Δ
#multiple 89.07% <ø> (ø) ⬆️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4375bd...60c9a9d. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 3, 2017

Codecov Report

Merging #18085 into master will decrease coverage by 0.01%.
The diff coverage is 86.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18085      +/-   ##
==========================================
- Coverage    91.4%   91.39%   -0.02%     
==========================================
  Files         163      163              
  Lines       50073    50134      +61     
==========================================
+ Hits        45769    45819      +50     
- Misses       4304     4315      +11
Flag Coverage Δ
#multiple 89.2% <86.66%> (ø) ⬆️
#single 40.33% <40%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 97.15% <100%> (+0.04%) ⬆️
pandas/core/indexes/datetimelike.py 97.11% <100%> (ø) ⬆️
pandas/core/resample.py 96.16% <100%> (ø) ⬆️
pandas/core/indexes/period.py 92.89% <66.66%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.41% <0%> (-0.1%) ⬇️
pandas/core/nanops.py 96.67% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93c755e...f5e8fe0. Read the comment docs.

@gfyoung gfyoung added Clean Internals Related to non-user accessible pandas implementation labels Nov 3, 2017
@gfyoung
Copy link
Member

gfyoung commented Nov 3, 2017

@jbrockmendel : Minor comments. Also, let's asv to make sure as usual.

from tslibs.timedeltas cimport parse_timedelta_string, cast_from_unit
from tslibs.timedeltas cimport cast_from_unit, _delta_to_nanoseconds
from tslibs.timedeltas import (Timedelta, convert_to_timedelta64,
_delta_to_nanoseconds, array_to_timedelta64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double import

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you not cimporting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only cimporting things that are used in tslib. Other imports are pass-through for other modules to find in tslib. Outside imports will be updated in a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_delta_to_nanoseconds cimport version is used in tslib, and python version is pass-through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean passthru? if its not used then remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only cimporting things that are used in tslib. Other imports are pass-through for other modules to find in tslib. Outside imports will be updated in a follow-up.

no let's fix this here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll de-privatize _delta_to_nanoseconds while I'm at it.


# Exposed for tslib, not intended for outside use.
cdef parse_timedelta_string(object ts)
cpdef int64_t cast_from_unit(object ts, object unit) except? -1
cpdef int64_t _delta_to_nanoseconds(delta) except? -1
cpdef convert_to_timedelta64(object ts, object unit)
cpdef array_to_timedelta64(ndarray[object] values, unit=*, errors=*)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so remove it

# We are implicitly requiring the canonical behavior to be
# defined by Timestamp methods.

elif PyDateTime_CheckExact(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 2 separate branches here that are the same?

@jbrockmendel
Copy link
Member Author

Several asv runs:

taskset 5 asv continuous -E virtualenv -f 1.1 master HEAD -b timeseries
[...]
       before           after         ratio
     [b4375bde]       [98e339a5]
+     19.7±0.04μs       22.0±0.1μs     1.12  timeseries.Offsets.time_timeseries_day_incr
-     16.5±0.06ms      14.9±0.08ms     0.90  timeseries.Iteration.time_iter_periodindex_preexit
-     7.62±0.01μs      6.81±0.01μs     0.89  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons

       before           after         ratio
     [b4375bde]       [98e339a5]
+     2.53±0.01ms      2.87±0.03ms     1.13  timeseries.AsOf.time_asof_nan_single
+      19.9±0.1μs       22.1±0.1μs     1.11  timeseries.Offsets.time_timeseries_day_incr
-      32.9±0.1μs       28.0±0.1μs     0.85  timeseries.Offsets.time_custom_bday_cal_decr
-        137±10ms         35.8±3ms     0.26  timeseries.DatetimeIndex.time_dti_tz_factorize
-        174±30ms         36.8±3ms     0.21  timeseries.DatetimeIndex.time_dti_factorize

       before           after         ratio
     [b4375bde]       [98e339a5]
+         429±6ms            2.56s     5.96  timeseries.AsOfDataFrame.time_asof_nan
+        408±10ms            2.30s     5.64  timeseries.AsOfDataFrame.time_asof
-     56.3±0.05μs      50.6±0.08μs     0.90  timeseries.SemiMonthOffset.time_begin_decr
-     7.26±0.08μs      6.51±0.02μs     0.90  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
-     21.9±0.09μs       18.7±0.1μs     0.86  timeseries.Offsets.time_custom_bday_apply_dt64
-        27.0±2ms       6.64±0.2ms     0.25  timeseries.DatetimeIndex.time_dti_tz_factorize
-        30.1±4ms       7.17±0.1ms     0.24  timeseries.DatetimeIndex.time_dti_factorize

       before           after         ratio
     [b4375bde]       [98e339a5]
+      17.1±0.2μs      20.1±0.08μs     1.18  timeseries.Offsets.time_custom_bday_apply_dt64
+      27.5±0.1μs       32.1±0.2μs     1.17  timeseries.Offsets.time_custom_bday_decr
+      18.2±0.2μs      20.1±0.06μs     1.11  timeseries.Offsets.time_timeseries_day_apply
-      20.6±0.1μs      18.7±0.06μs     0.91  timeseries.Offsets.time_custom_bday_incr
-     7.39±0.02μs      6.71±0.02μs     0.91  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
-     31.1±0.07μs       27.4±0.1μs     0.88  timeseries.Offsets.time_custom_bday_cal_incr_neg_n
-        2.77±0ms         1.68±0ms     0.61  timeseries.DatetimeIndex.time_add_timedelta
-         220±6ms       27.5±0.7ms     0.13  timeseries.AsOfDataFrame.time_asof_nan
-         205±3ms      24.2±0.05ms     0.12  timeseries.AsOfDataFrame.time_asof

      before           after         ratio
     [b4375bde]       [98e339a5]
+     2.72±0.01ms      3.31±0.09ms     1.22  timeseries.ToDatetime.time_iso8601_nosep
+     22.3±0.07μs       24.7±0.1μs     1.11  timeseries.AsOf.time_asof_single
+     2.50±0.03ms      2.75±0.02ms     1.10  timeseries.AsOf.time_asof_nan_single
-         443±3ms          397±2ms     0.90  timeseries.SemiMonthOffset.time_begin_incr_rng
-      8.65±0.1μs      7.72±0.04μs     0.89  timeseries.Offsets.time_timeseries_year_apply
-      7.80±0.3μs      6.95±0.03μs     0.89  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons

@jreback
Copy link
Contributor

jreback commented Nov 4, 2017

you have some wild variability. pls run for timedeltas too.

@jbrockmendel
Copy link
Member Author

you have some wild variability

This is a truthfact.

pls run for timedeltas too

asv continuous -f 1.1 -E virtualenv master HEAD -b timedelta
[...]
BENCHMARKS NOT SIGNIFICANTLY CHANGED

asv continuous -f 1.1 -E virtualenv master HEAD -b timedelta
[...]
       before           after         ratio
     [b4375bde]       [98e339a5]
+        59.2±7ms          120±6ms     2.03  inference.DtypeInfer.time_timedelta64_2

asv continuous -f 1.1 -E virtualenv master HEAD -b timedelta
[...]
BENCHMARKS NOT SIGNIFICANTLY CHANGED.

asv continuous -f 1.1 -E virtualenv master HEAD -b timedelta
[...]
       before           after         ratio
     [b4375bde]       [98e339a5]
-         106±2ms         84.4±2ms     0.80  inference.DtypeInfer.time_timedelta64_1

@jreback
Copy link
Contributor

jreback commented Nov 4, 2017

ok so it doesn't change perf, which is expected, good.

@jreback
Copy link
Contributor

jreback commented Nov 5, 2017

I think you need to add timedeltas.pxd to period in setup.py; I don't think you can remove anything from tslib def (but check). rebase.


from nattype import nat_strings
from nattype import nat_strings, NaT
from nattype cimport _checknull_with_nat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timedeltas should depend on nattype.pxd

# ----------------------------------------------------------------------
# Timedelta Construction

cdef _to_py_int_float(v):
# Note: This used to be defined inside _timedelta_value_kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should prob move to util.pyx (I know it doesn't exist, but should). though maybe not useful outside of this module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

util.pyx (I know it doesn't exist, but should).

Separate issue: the vast majority of usage of util is for is_foo_object, and those functions could be made pure cython (in C-equivalent ways as in #18059), i.e. put in a file without having dependencies on src, so we wouldn't have to futz around with include/sources (or in my opinion depends/pxdfiles since cython will take care of that for us).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree though I don't think its a big deal to simply include it. having too fine grain dependencies is a problem too.

"float.".format(type(v)))


cdef _timedelta_value_kwargs(dict kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you pulled these out in the previous PR, is it more clear to put them back (into the constructor)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK either way. It seemed like an appropriate scope for a function, as the __new__ method was a bit ungainly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change things like this now, later propose it first (I don't think its a good idea)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this back and i think can merge

elif PyDelta_Check(other):
ots = Timedelta(other)
else:
ndim = getattr(other, _NDIM_STRING, -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the only use?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of _NDIM_STRING? Here, yes. I think its used in Timestamp too. Not totally sure what the benefit is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this seems kind of silly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix this too

cdef _Timedelta td_base

if value is _no_input:
value = _timedelta_value_kwargs(kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. here

@jbrockmendel
Copy link
Member Author

If we can push this through we can port most of Timestamp to its own module. Along with #18086 we can then get the rest.

"float.".format(type(v)))


cdef _timedelta_value_kwargs(dict kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this back and i think can merge

elif PyDelta_Check(other):
ots = Timedelta(other)
else:
ndim = getattr(other, _NDIM_STRING, -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix this too

@jbrockmendel
Copy link
Member Author

fix this too

Get rid of the matching one in tslib while we're at it?

@@ -1,8 +1,8 @@
from numpy cimport ndarray, int64_t

from tslibs.conversion cimport convert_to_tsobject
from tslibs.timedeltas cimport convert_to_timedelta64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these shouldn’t be here
if something wants to import them they should directly cimport from conversion or timedelta and not tslib

@jreback jreback added this to the 0.22.0 milestone Nov 8, 2017
@jreback jreback merged commit 6b29671 into pandas-dev:master Nov 8, 2017
@jreback
Copy link
Contributor

jreback commented Nov 8, 2017

thanks @jbrockmendel making nice progress!

@jreback
Copy link
Contributor

jreback commented Nov 8, 2017

rebase any prs that touch this code

@jbrockmendel
Copy link
Member Author

Good teamwork; this was a big one.

@jbrockmendel jbrockmendel deleted the tslibs-timedeltas6 branch November 8, 2017 15:11
@@ -13,7 +13,7 @@
from pandas.tseries.frequencies import to_offset, is_subperiod, is_superperiod
from pandas.core.indexes.datetimes import DatetimeIndex, date_range
from pandas.core.indexes.timedeltas import TimedeltaIndex
from pandas.tseries.offsets import DateOffset, Tick, Day, _delta_to_nanoseconds
from pandas.tseries.offsets import DateOffset, Tick, Day, delta_to_nanoseconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.tseries.offsets is a public-ish module. In general I would not add non-underscored internal methods to that namespace like delta_to_nanoseconds

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, if nothing else delta_to_nanoseconds should be imported directly from tslibs.timedelta. I'll put something together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, do we need to obscure the name if it isn't in __all__?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, tab completion (= for me my interaction with the API) does not take into account __all__ I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants