Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate _TSObject into conversion #18060

Merged
merged 10 commits into from
Nov 2, 2017

Conversation

jbrockmendel
Copy link
Member

This is what we've been building towards folks. convert_to_tsobject, convert_str_to_tsobject, and convert_datetime_to_tsobject are moved to tslibs.conversion. Concerns regarding _TSObjectss can be considered separated.

This isn't quite cut/paste:

  • convert_to_tsobject calls is_timestamp which is not available in the namespace. Similarly _localize_pydatetime checks isinstance(dt, Timestamp). So this re-implements private versions:
  • conversion._is_timestamp checks that its input has type datetime. IIUC correctly this check should be as performant as tslib.is_timestamp. (moderate sized "if")
  • conversion._localize_pydatetime includes small optimizations that are available because we know the context in which it is being called. Specifically, 1) it can skip the Timestamp case since any nanos get dropped in pydatetime_to_dt64, and 2) the tz arg is typed as a tzinfo object, 3) it skips unnecessary None check.

Other notes:

  • _get_datetime64_nanos is called by convert_to_tsobject, so also had to be moved. It fits well with the "conversion" theme.

@codecov
Copy link

codecov bot commented Nov 1, 2017

Codecov Report

Merging #18060 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18060      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50115    50115              
==========================================
- Hits        45730    45721       -9     
- Misses       4385     4394       +9
Flag Coverage Δ
#multiple 89.04% <ø> (ø) ⬆️
#single 40.23% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 881ee30...3e8b102. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 1, 2017

Codecov Report

Merging #18060 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18060      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50120    50120              
==========================================
- Hits        45737    45728       -9     
- Misses       4383     4392       +9
Flag Coverage Δ
#multiple 89.04% <ø> (ø) ⬆️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0fd3bd7...c337c53. Read the comment docs.

@gfyoung gfyoung added Clean Internals Related to non-user accessible pandas implementation labels Nov 1, 2017
@gfyoung
Copy link
Member

gfyoung commented Nov 1, 2017

+1 for refactoring and logic decoupling, but let's make sure performance didn't take a hit by accident (FYI, if you do plan on doing anymore of these types of PR's, I would suggest you provide that from the get go).



# TODO: We can type the input as a np.datetime64 right?
# and the output as an int64_t?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a doc-string

@@ -48,13 +91,227 @@ cdef class _TSObject:
return self.value


# helper to extract datetime and int64 from several different possibilities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I would rather you integrate the comments with the doc-strings; in this case your comment is not useful as the doc-string is pretty good

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal, but the comment was cut/pasted from the original. Will change.

cdef _TSObject convert_str_to_tsobject(object ts, object tz, object unit,
bint dayfirst=False,
bint yearfirst=False):
""" ts must be a string """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update doc-string

if unit != PANDAS_FR_ns:
pandas_datetime_to_datetimestruct(ival, unit, &dts)
check_dts_bounds(&dts)
return dtstruct_to_dt64(&dts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    ival = dstruct_to_dt64(&dts)

return ival

# ----------------------------------------------------------------------
# Misc Helpers

cdef inline bint _is_timestamp(datetime obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this to make it clear

is_instance_timestamp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tihs is *onlyI used in that 1 place I would just in-line it directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tihs is *onlyI used in that 1 place I would just in-line it directly.

Makes sense.

Just realized: recall recently there was an issue with Timestamp classmethods returning Timestamp instead of cls instances. The conclusion was that this could not be fixed because is_timestamp would break on subclasses. If this implementation is in fact equally performant, that would solve the problem.

offset = get_utcoffset(obj.tzinfo, ts)
obj.value -= int(offset.total_seconds() * 1e9)

if _is_timestamp(ts):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment here

check_dts_bounds(&dts)
return dtstruct_to_dt64(&dts)
else:
return ival
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

de-privatize

note I am not only suggesting these things for general cleanliness, but also because these prevent accidental imports; IOW things will immediately break because of the new name unless they are fixed

write decent docstrings

de-private _get_datetime64_nanos

also, removed unnecessary unit input from convert_str_to_tsobject
@jbrockmendel
Copy link
Member Author

jbrockmendel commented Nov 1, 2017

Appveyor error looks like it can't import scipy; saw the same error in #18016 a few minutes ago.

Not sure if its related, but trying to run asv it keeps timing out, stalling at the "Creating environments" step. update not related

@jbrockmendel
Copy link
Member Author

Suggestions for the scipy import error on appveyor?

@gfyoung
Copy link
Member

gfyoung commented Nov 1, 2017

@jreback : This looks like a scipy issue to me, as I'm seeing that in my PR as well.

@jbrockmendel
Copy link
Member Author

Forgot to use affinity on the first run here:

asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
      before           after         ratio
     [881ee30b]       [b129a9e1]
+     8.95±0.06μs      11.0±0.07μs     1.23  timeseries.Offsets.time_timeseries_year_apply
+       163±0.2ms        187±0.4ms     1.15  timeseries.ToDatetime.time_iso8601_tz_spaceformat
-      34.5±0.1μs      31.1±0.07μs     0.90  timeseries.Offsets.time_custom_bday_cal_decr
-      59.1±0.4μs      53.1±0.04μs     0.90  timeseries.SemiMonthOffset.time_end_incr
-           1.51s          405±9ms     0.27  timeseries.AsOfDataFrame.time_asof
-           2.00s        421±0.8ms     0.21  timeseries.AsOfDataFrame.time_asof_nan

Note master got updated to a new commit between runs here.

taskset 5 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
       before           after         ratio
     [15fa4bd6]       [b129a9e1]
+       96.1±30ms            154ms     1.61  timeseries.DatetimeIndex.time_dti_tz_factorize
+           1.71s            2.06s     1.20  timeseries.AsOfDataFrame.time_asof_nan
-           2.17s       1.95±0.03s     0.90  timeseries.AsOfDataFrame.time_asof
-      29.0±0.1μs       24.8±0.1μs     0.86  timeseries.AsOf.time_asof_single
-          51.7ms       8.82±0.4ms     0.17  timeseries.DatetimeIndex.time_dti_factorize

taskset 5 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
      before           after         ratio
     [15fa4bd6]       [b129a9e1]
+        25.5±2ms         36.0±2ms     1.41  timeseries.DatetimeIndex.time_dti_tz_factorize
-      25.6±0.2μs       22.3±0.2μs     0.87  timeseries.Offsets.time_timeseries_day_apply
-     4.08±0.01ms      3.41±0.01ms     0.83  timeseries.ToDatetime.time_iso8601
-        509±50ms          413±1ms     0.81  timeseries.DatetimeIndex.time_add_offset_slow

taskset 6 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
       before           after         ratio
     [15fa4bd6]       [b129a9e1]
+      11.9±0.2μs       14.8±0.3μs     1.24  timeseries.Offsets.time_timeseries_year_incr
+       173±0.4ms          198±5ms     1.15  timeseries.ToDatetime.time_iso8601_tz_spaceformat
+     7.15±0.05ms       8.04±0.4ms     1.12  timeseries.DatetimeIndex.time_add_offset_fast
+      49.0±0.4μs         54.3±1μs     1.11  timeseries.SemiMonthOffset.time_begin_apply
-        539±20μs        477±0.4μs     0.89  timeseries.DatetimeIndex.time_reset_index_tz
-      444±0.01μs        387±0.3μs     0.87  timeseries.DatetimeIndex.time_reset_index
-      22.8±0.2μs      19.7±0.09μs     0.87  timeseries.Offsets.time_custom_bday_apply_dt64
-      2.21±0.2ms         1.72±0ms     0.78  timeseries.DatetimeIndex.time_add_offset_delta

Given the excess variability caused by my local entropy field, this looks about unchanged.

@jreback
Copy link
Contributor

jreback commented Nov 2, 2017

looks good. pls rebase

@jreback jreback added this to the 0.22.0 milestone Nov 2, 2017
@jreback jreback merged commit 427d283 into pandas-dev:master Nov 2, 2017
@jreback
Copy link
Contributor

jreback commented Nov 2, 2017

thanks!

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Nov 3, 2017
1kastner pushed a commit to 1kastner/pandas that referenced this pull request Nov 5, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
@jbrockmendel jbrockmendel deleted the tslibs-conversion10 branch December 8, 2017 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants