Implement BaseOffset in tslibs.offsets #18016

jbrockmendel · 2017-10-28T22:08:15Z

This moves a handful of DateOffset methods up into tslibs.offsets.BaseOffset. The focus for now is on arithmetic methods that are not overridden by subclasses. These use the self.__class__(..., **self.kwds) pattern that we eventually need to get rid of; the goal here is to isolate that pattern before suggesting alternatives.

The _BaseOffset class was intended to be a cdef class, but that leads to errors in test_pickle_v0_15_2 that I haven't figured out yet. Once that gets sorted out, we can make DateOffset immutable and see some real speedups via caching.

See other comments in-line.

jbrockmendel · 2017-10-28T22:09:14Z

pandas/_libs/tslibs/offsets.pyx

+    'hours', 'minutes', 'seconds', 'milliseconds', 'microseconds'
+    ])
+
+def _determine_offset(kwds):


At the moment this is a method of DateOffset that only gets called in __init__.

jbrockmendel · 2017-10-28T22:10:00Z

pandas/_libs/tslibs/offsets.pyx

@@ -206,3 +271,109 @@ class ApplyTypeError(TypeError):
 # TODO: unused.  remove?
 class CacheableOffset(object):
    _cacheable = True
+
+
+class BeginMixin(object):


BeginMixin and EndMixin are new; each has only one method. At the moment these methods are in DateOffset, but they are only used by a small handful of FooBegin and BarEnd subclasses.

jbrockmendel · 2017-10-28T22:11:47Z

pandas/_libs/tslibs/offsets.pyx

+    def __neg__(self):
+        # Note: we are defering directly to __mul__ instead of __rmul__, as
+        # that allows us to use methods that can go in a `cdef class`
+        return self * -1


In the status quo __neg__ is defined as return self.__class__(-self.n, normalize=self.normalize, **self.kwds). By deferring to __mul__, we move away from the self.kwds pattern. Ditto for copy.
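
To illustrate the point above, here is a minimal sketch of routing __neg__ and copy through __mul__ so that the self.__class__(..., **self.kwds) call lives in one place. SketchOffset and its constructor signature are hypothetical stand-ins, not the pandas classes, and the body of __mul__ is an assumption made for illustration.

```python
class SketchOffset(object):
    """Hypothetical stand-in for an offset with a multiple `n`."""

    def __init__(self, n=1, normalize=False, **kwds):
        self.n = n
        self.normalize = normalize
        self.kwds = kwds

    def __mul__(self, other):
        # The only place where a new instance is rebuilt from self.kwds.
        return self.__class__(n=other * self.n, normalize=self.normalize,
                              **self.kwds)

    def __neg__(self):
        # Defer to __mul__ instead of repeating the constructor call.
        return self * -1

    def copy(self):
        # Same idea: multiplying by 1 yields an equivalent new instance.
        return self * 1


off = SketchOffset(3, normalize=True, hours=2)
neg = -off
dup = off.copy()
print(neg.n, neg.normalize, neg.kwds)
```

With this layout, replacing the kwds-based reconstruction later would only require touching __mul__.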

jbrockmendel · 2017-10-28T22:15:05Z

pandas/tests/tseries/test_offsets.py

@@ -41,6 +41,8 @@
 from pandas.tseries.holiday import USFederalHolidayCalendar


+data_dir = tm.get_data_path()


Moving this call up here ensures that we get the same data_dir whether running the tests via pytest or interactively. Under the status quo, copy/pasting the pertinent test below into a REPL will fail because get_data_path will not behave as expected.

huh? we use this pattern everywhere, why are you changing this?

Because when I try to run these tests interactively and copy/paste the contents of a test function, tm.get_data_path returns unexpected results depending on os.getcwd(). AFAICT when run non-interactively it behaves as if cwd is pandas/tests/tseries.

what do you mean 'interactively'? you should simply be running

pytest pandas/tests/...... -k ... or whatever; that is the idiomatic way to run tests.

When a test fails and I want to figure out why, I run the contents of the test manually in the REPL.

Happy to revert this change; not that big a deal.

yes pls revert.

standard way to run tests is

pytest path/to/test -k optional_regex

lots of options, including --pdb to drop into the debugger

pls revert this is non-standard
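
For readers unfamiliar with the failure mode discussed in this thread, the sketch below shows in general terms how a path resolved against os.getcwd() changes with the working directory, while a path anchored to a directory fixed up front does not. Both helpers are hypothetical and do not reproduce how tm.get_data_path is implemented.

```python
import os
import tempfile


def cwd_relative_data_path():
    # Hypothetical helper: resolves "data" against the current working directory.
    return os.path.join(os.getcwd(), "data")


def anchored_data_path(base_dir):
    # Hypothetical helper: resolves "data" against a directory chosen up front.
    return os.path.join(base_dir, "data")


original_cwd = os.getcwd()
before = cwd_relative_data_path()

scratch = tempfile.mkdtemp()
try:
    os.chdir(scratch)
    after = cwd_relative_data_path()             # follows the new cwd
    anchored = anchored_data_path(original_cwd)  # unaffected by chdir
finally:
    os.chdir(original_cwd)
    os.rmdir(scratch)

print(before != after)
```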

codecov · 2017-10-29T05:02:26Z

Codecov Report

Merging #18016 into master will decrease coverage by 0.02%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master   #18016      +/-   ##
==========================================
- Coverage   91.23%   91.21%   -0.03%     
==========================================
  Files         163      163              
  Lines       50091    50032      -59     
==========================================
- Hits        45703    45636      -67     
- Misses       4388     4396       +8

Flag	Coverage Δ
#multiple	`89.02% <93.75%> (-0.02%)`	⬇️
#single	`40.22% <93.75%> (-0.08%)`	⬇️

Impacted Files	Coverage Δ
pandas/tseries/offsets.py	`97.11% <93.75%> (-0.05%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2d0d1b...8eb131e. Read the comment docs.

codecov · 2017-10-29T05:02:28Z

Codecov Report

Merging #18016 into master will increase coverage by 0.12%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18016      +/-   ##
==========================================
+ Coverage   91.28%   91.41%   +0.12%     
==========================================
  Files         163      163              
  Lines       50130    50073      -57     
==========================================
+ Hits        45761    45772      +11     
+ Misses       4369     4301      -68

Flag	Coverage Δ
#multiple	`89.21% <100%> (+0.14%)`	⬆️
#single	`40.32% <100%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/tseries/offsets.py	`97.11% <100%> (-0.05%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.05%)`	⬇️
pandas/core/internals.py	`94.54% <0%> (+0.07%)`	⬆️
pandas/io/formats/format.py	`96.01% <0%> (+0.07%)`	⬆️
pandas/core/panel.py	`97.28% <0%> (+0.28%)`	⬆️
pandas/core/common.py	`93% <0%> (+1.82%)`	⬆️
pandas/core/generic.py	`95.72% <0%> (+3.3%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2d0eed...61bf134. Read the comment docs.

jbrockmendel · 2017-10-29T05:51:25Z

taskset 4 asv continuous -E virtualenv -f 1.1 master HEAD -b timeseries
[...]
        before           after         ratio
     [b2d0d1bf]       [a728d995]
-     56.4±0.07μs       50.7±0.3μs     0.90  timeseries.SemiMonthOffset.time_begin_decr
-       152±0.7ms        135±0.5ms     0.89  timeseries.ToDatetime.time_iso8601_tz_spaceformat
-     1.79±0.01ms      1.54±0.01ms     0.86  frame_methods.frame_assign_timeseries_index.time_frame_assign_timeseries_index
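
As a quick sanity check on the table above, the ratio column can be recomputed as after / before. The sketch assumes that -f 1.1 means only benchmarks whose ratio falls above 1.1 or below 1/1.1 are listed; the timings are copied from the three rows shown.

```python
# (before, after) timings copied from the rows above; units cancel in the ratio.
results = {
    "timeseries.SemiMonthOffset.time_begin_decr": (56.4, 50.7),
    "timeseries.ToDatetime.time_iso8601_tz_spaceformat": (152.0, 135.0),
    "frame_methods.frame_assign_timeseries_index"
    ".time_frame_assign_timeseries_index": (1.79, 1.54),
}
factor = 1.1  # the value passed via -f

ratios = {name: after / before for name, (before, after) in results.items()}
formatted = {name: "{:.2f}".format(ratio) for name, ratio in ratios.items()}

for name in results:
    # A ratio below 1/factor counts as an improvement under the assumption above.
    print(formatted[name], ratios[name] < 1 / factor, name)
```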

jreback · 2017-10-29T20:04:55Z

pandas/_libs/tslibs/offsets.pyx

+# ---------------------------------------------------------------------
+# Base Classes
+
+class _BaseOffset(object):


why are you creating a base class here? what is the purpose?

IOW why not simply have 1 Base class (and not a _BaseOffset and a BaseOffset)

See comments about remaining cython/pickle issues.

You're absolutely right that in its current form having two separate classes accomplishes nothing. The idea is that _BaseOffset should be a cdef class, while BaseOffset should be a Python class. (__rfoo__ methods do not play nicely with cython classes).

ok that is fine.

I would probably leave this as a class for the moment. I am not convinced this actually needs to be a full c-extension class (e.g. it's not like we are inheriting from a python c-class here). I don't see the benefit and it has added complexity.

The main reason is to achieve immutability. That's the big roadblock between us and making __eq__, __ne__, __hash__ performance not-awful. (There's an issue somewhere about "scalar types immutable" or something like that)
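
The link between immutability and cheap __eq__/__hash__ can be sketched in pure Python: once attributes cannot change after construction, the hash can be computed once and reused. FrozenOffsetSketch, its attributes, and _key are hypothetical names for illustration only; this is not the pandas implementation and it does not use a cdef class.

```python
class FrozenOffsetSketch(object):
    """Hypothetical immutable offset-like object."""

    __slots__ = ("_n", "_normalize", "_hash")

    def __init__(self, n=1, normalize=False):
        # Bypass the __setattr__ guard below, at construction time only.
        object.__setattr__(self, "_n", n)
        object.__setattr__(self, "_normalize", normalize)
        object.__setattr__(self, "_hash", None)

    def __setattr__(self, name, value):
        raise AttributeError("FrozenOffsetSketch is immutable")

    @property
    def n(self):
        return self._n

    def _key(self):
        # Everything that defines equality, gathered in one tuple.
        return (type(self).__name__, self._n, self._normalize)

    def __eq__(self, other):
        if not isinstance(other, FrozenOffsetSketch):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Safe to cache because the key can never change after __init__.
        if self._hash is None:
            object.__setattr__(self, "_hash", hash(self._key()))
        return self._hash


a = FrozenOffsetSketch(2)
b = FrozenOffsetSketch(2)
print(a == b, hash(a) == hash(b))
```

If the object were mutable, a cached hash could go stale as soon as an attribute changed, which is why the caching only becomes safe once mutation is ruled out.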

jreback · 2017-10-29T20:05:36Z

pandas/tests/tseries/test_offsets.py

@@ -41,6 +41,8 @@
 from pandas.tseries.holiday import USFederalHolidayCalendar


+data_dir = tm.get_data_path()


huh? we use this pattern everywhere, why are you changing this?

jreback · 2017-10-29T20:06:12Z

pandas/tseries/offsets.py

    @classmethod
    def _from_name(cls, suffix=None):
        # default _from_name calls cls with no args
        if suffix:
-            raise ValueError("Bad freq suffix {suffix}".format(suffix=suffix))
+            raise ValueError("Bad freq suffix %s" % suffix)


revert, we are moving towards new style string formatting

Woops, copy/paste from an older version. Will revert.
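
For reference, a small sketch comparing the two formatting styles in this diff. Both produce the same message for a plain string; one well-known difference is that %-formatting treats a tuple on the right-hand side as multiple arguments. The suffix values here are made up for illustration.

```python
suffix = "XYZ"
old_style = "Bad freq suffix %s" % suffix
new_style = "Bad freq suffix {suffix}".format(suffix=suffix)

# A tuple is unpacked as separate arguments by the % operator.
tuple_suffix = ("X", "Y")
try:
    "Bad freq suffix %s" % tuple_suffix
    old_style_failed = False
except TypeError:
    old_style_failed = True

new_style_tuple = "Bad freq suffix {suffix}".format(suffix=tuple_suffix)
print(old_style == new_style, old_style_failed, new_style_tuple)
```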

jreback · 2017-10-29T21:39:53Z

pandas/_libs/tslibs/offsets.pyx

+    def _should_cache(self):
+        return self.isAnchored() and self._cacheable
+
+    def __repr__(self):


Side note: the repr is currently used for hashing; I think we should simply define __hash__ instead.

__hash__ is defined using _params() which is the god-awful slow thing we need to get rid of.

jreback · 2017-11-03T00:01:08Z

small comments, and rebase

jbrockmendel · 2017-11-07T01:20:03Z

@jreback For triaging purposes, this is the only one of my PRs that is blocking non-refactoring work.

jreback · 2017-11-07T13:16:55Z

pandas/_libs/tslibs/offsets.pyx

 from pandas._libs.tslib import pydt_to_i8

+from frequencies cimport get_freq_code


update setup.py for this

jreback · 2017-11-07T17:42:03Z

lgtm ping on green

jbrockmendel · 2017-11-07T18:43:11Z

TestClipboard.test_round_trip_valid_encodings fails, otherwise green. Will push a dummy commit anyway.

jbrockmendel · 2017-11-08T00:14:12Z

Ping

jreback · 2017-11-08T02:59:35Z

thanks!

Implement BaseOffset in tslibs.offsets

a728d99

jbrockmendel commented Oct 28, 2017

whitespace fixup

8eb131e

jbrockmendel added 2 commits October 28, 2017 23:11

Merge branch 'master' of https://github.com/pandas-dev/pandas into tslibs-offsets3

4a4ef1c

whitespace fixup

6c7db0a

jreback requested changes Oct 29, 2017

revert to format string

20f2d8b

jreback added the Frequency (DateOffsets) and Internals (Related to non-user accessible pandas implementation) labels Oct 29, 2017

jreback reviewed Oct 29, 2017

jbrockmendel added 3 commits November 1, 2017 08:34

Merge branch 'master' of https://github.com/pandas-dev/pandas into tslibs-offsets3

8da5a84

flake8 fixup

58ffc7c

dummy commit to forc CI

7945386

jbrockmendel mentioned this pull request Nov 1, 2017

Separate _TSObject into conversion #18060

Merged

Merge branch 'master' of https://github.com/pandas-dev/pandas into tslibs-offsets3

83d26fa

jbrockmendel added 3 commits November 2, 2017 17:10

Merge branch 'master' of https://github.com/pandas-dev/pandas into tslibs-offsets3

b281093

revert nonstandard use of get_data_path

db38ef2

Merge branch 'master' of https://github.com/pandas-dev/pandas into tslibs-offsets3

673aa29

jbrockmendel mentioned this pull request Nov 6, 2017

Remove out-of-date numpy.pxd; remove unused float16_t #18101

Closed

4 tasks

Merge branch 'master' of https://github.com/pandas-dev/pandas into tslibs-offsets3

74e1eed

jreback requested changes Nov 7, 2017

add dep

93f9a75

jreback added this to the 0.22.0 milestone Nov 7, 2017

jreback approved these changes Nov 7, 2017

Merge branch 'master' of https://github.com/pandas-dev/pandas into tslibs-offsets3

61bf134

jreback merged commit d3d60f8 into pandas-dev:master Nov 8, 2017

watercrossing pushed a commit to watercrossing/pandas that referenced this pull request Nov 10, 2017

Implement BaseOffset in tslibs.offsets (pandas-dev#18016)

f6d28c3

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Implement BaseOffset in tslibs.offsets (pandas-dev#18016)

fdc662f

jbrockmendel deleted the tslibs-offsets3 branch December 8, 2017 19:41
