Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DateOffset.offset vs DateOffset._offset #17176

Closed
jbrockmendel opened this issue Aug 5, 2017 · 5 comments
Closed

DateOffset.offset vs DateOffset._offset #17176

jbrockmendel opened this issue Aug 5, 2017 · 5 comments
Labels
Frequency DateOffsets

Comments

@jbrockmendel
Copy link
Member

tseries.offsets.DateOffset has a _offset attribute. Many subclasses do not have this attribute, but do have a offset attribute. The main difference in the two appears to be that _offset is ignored by __eq__ (see #17137).

The question: why two separate attributes? Is this intentional or coincidence?

@gfyoung gfyoung added the Internals Related to non-user accessible pandas implementation label Aug 5, 2017
@jbrockmendel
Copy link
Member Author

There are some inconsistencies in how offsets handles kwds that need to be cleaned up if we want offsets to be immutable. I have not found a way to clean them up that maintains backward compatibility in extremely weird corner cases. Take BusinessDay as an example:

class BusinessDay(SingleConstructorOffset):
    def __init__(self, n=1, normalize=False, **kwds):
        self.n = int(n)
        self.normalize = normalize
        self.kwds = kwds
        self.offset = kwds.get('offset', timedelta(0))

I'd like to get a green-light to change this behavior:

In [1]: from datetime import timedelta
In [2]: from pandas.tseries.offsets import *
In [3]: bd = BusinessDay()
In [4]: bd2 = BusinessDay(offset=timedelta(0))

In [5]: bd
Out[5]: <BusinessDay>

In [6]: bd == bd2
Out[6]: True

In [7]: bd.kwds
Out[7]: {}

In [8]: bd2.kwds
Out[8]: {'offset': datetime.timedelta(0)}

In [9]: bd - bd2
Out[9]: <0 * BusinessDays>

In [10]: bd2 - bd
Out[10]: <0 * BusinessDays>

In [11]: (bd - bd2).kwds
Out[11]: {}

In [12]: (bd2 - bd).kwds
Out[12]: {'offset': datetime.timedelta(0)}

Similar behavior shows up in the other BusinessFooOffsets, Week, and QuarterOffset. Many of the others will display unwanted behavior if users pass unsupported keyword args:

In [33]: sms = SemiMonthBegin(clark='kent')
In [34]: sms
Out[34]: <SemiMonthBegin: day_of_month=15, kwds={'clark': 'kent'}>

Best case I'd like to get rid of DateOffset.kwds altogether and explicitly specify the args/kwargs in the various __init__ methods (this will require patching a few other methods that check self.kwds, but that isn't a big deal). That will break backward compat for any users who-- for some reason-- explicitly use that attribute.

If that is too big a break, the next-best option would be to

  1. add validation in __init__ that only supported kwargs have been passed (this is done in a few existing code paths). This will take care of the SemiMonthBegin(clark='kent') cases.

  2. ensure that default values for kwds are handled consistently. e.g. in the BusinessDay.__init__ example above, kwds.get('offset', timedelta(0)) would be changed to kwds.setdefault('offset', timedelta(0)).

@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

I agree that the whole kwds bizness is fishy. So would be ok with having actual signatures.

@jreback jreback added Frequency DateOffsets and removed Internals Related to non-user accessible pandas implementation labels Sep 7, 2017
@jbrockmendel
Copy link
Member Author

I agree that the whole kwds bizness is fishy. So would be ok with having actual signatures.

Great. I'll start small implementing these one or two at a time to make sure we're on the same page.

@jbrockmendel
Copy link
Member Author

There are two places in the tests with seemingly-incorrect behavior. I'd like to confirm that these are incorrect and fix the tests before trying to fix the code.

https://github.com/pandas-dev/pandas/blob/master/pandas/tests/tseries/test_offsets.py#L2632

dt = datetime(2007, 1, 1)
offset = MonthEnd(day=20)
result = dt + offset
assert result == Timestamp(2007, 1, 31)
result = result + offset
assert result == Timestamp(2007, 2, 28)

day is not a valid argument for MonthEnd; this offset behaves the same as one with no day arg.

https://github.com/pandas-dev/pandas/blob/master/pandas/tests/tseries/test_offsets.py#L3680

offset_n = FY5253(weekday=WeekDay.TUE, startingMonth=12,
                          variation="nearest", qtr_with_extra_week=4)

qtr_with_extra_week is a kwd for FY5253Quarter, not FY5253

@jreback
Copy link
Contributor

jreback commented Sep 9, 2017

yep those look wrong

why don't do a PR just to fix those

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets
Projects
None yet
Development

No branches or pull requests

3 participants