Skip to content

Cleaning yearly, quarterly and monthly date offsets #39878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 36 commits into from

Conversation

Pawel-Kranzberg
Copy link
Contributor

@Pawel-Kranzberg Pawel-Kranzberg commented Feb 18, 2021

  • Ability to use YearBegin and QuarterBegin DateOffset classes as Period freq, along with offset aliases.

  • Ability to use three-letter month short names as month / startingMonth arguments of year- / quarter-based offset classes (as already used in their _from_name class method).

  • Expanded docstrings with, e.g., additional examples.

  • Partial refactorization, as a step toward combining QuarterOffset and YearOffset __init__ (mentioned as TO DO in the code).

  • xref ENH: Implement '_period_dtype_code' attribute to 'pandas._libs.tslibs.offsets.MonthBegin' #38859

  • tests added / passed

  • Ensure all linting tests pass, see here for how to run them

  • whatsnew entry

@jreback
Copy link
Contributor

jreback commented Feb 18, 2021

@Pawel-Kranzberg you don't need multiple PR
pls use the same one and just push new commits


cdef readonly:
int month
cdef readonly int month
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we generally use the patternw tih only a single cdef

cdef:
    public int _period_dtype_code
    readonly int month
    [...]

Copy link
Contributor Author

@Pawel-Kranzberg Pawel-Kranzberg Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel - Thanks, I've just fixed that.
The problem with the failing tests stems from the change of the default month for BQuarterBegin and QuarterBegin classes, from 3 (March) to 1 (January). I believe the old values to be wrong, leading to behaviours that users might not expect, and inconsistency vs. BQuarterEnd and QuarterEnd. I've for instance expected the first quarter to begin with January and end in March (that's already the case for BQuarterEnd and QuarterEnd), and not begin in March and end in May (the legacy BQuarterBegin and QuarterBegin do just that).
However, the current tests expect the legacy (IMO incorrect) quarter spans. How should I go about it?

Minor formatting changes.
Minor formatting changes.
Temporary unfixing of the default months for `QuarterBegin` and `BQuarterBegin`.
@jreback
Copy link
Contributor

jreback commented Feb 18, 2021

if you are just refactoring then all tests should pass

please do any breaking change after this PR - it's very hard to tell what you are doing if these are conflated

@Pawel-Kranzberg
Copy link
Contributor Author

@jreback - It's hard for me to do just refactoring when the final test is failing due to the class QuarterBegin having different default values depending on the creation method. See:

cdef class QuarterBegin(QuarterOffset):
    """
    DateOffset increments between Quarter start dates.
    startingMonth = 1 corresponds to dates like 1/01/2007, 4/01/2007, ...
    startingMonth = 2 corresponds to dates like 2/01/2007, 5/01/2007, ...
    startingMonth = 3 corresponds to dates like 3/01/2007, 6/01/2007, ...
    """
    _default_starting_month = 3
    _from_name_starting_month = 1
    _prefix = "QS"
    _day_opt = "start"

The test is as follows:

        sdate = datetime(1999, 12, 25)
        edate = datetime(2000, 1, 1)

        idx1 = date_range(start=sdate, end=edate, freq="QS")
        idx2 = date_range(
            start=sdate, end=edate, freq=offsets.QuarterBegin(startingMonth=1)
        )
>       assert len(idx1) == len(idx2)

Please advise me how to proceed. Should I, e.g., now create and process a new PR with only the default month changes? Or roll back refactoring of Quarter classes in the current PR?

@jreback
Copy link
Contributor

jreback commented Feb 18, 2021

pls don't do refactoring and a bug fix in the same PR
one or the other or this will be impossible to merge

@jreback
Copy link
Contributor

jreback commented Feb 18, 2021

be honest i don't know which bug this is supposedly fixing

@Pawel-Kranzberg
Copy link
Contributor Author

be honest i don't know which bug this is supposedly fixing

@jreback - OK, I'd say it is a valid bug that a QuarterBegin object has different starting months, depending on if it was created using:

  • the offset alias. e.g., in date_range(start=sdate, end=edate, freq="QS") , with the resulting quarter having January as the starting month (_from_name_starting_month = 1), or
  • directly, e.g. in date_range(start=sdate, end=edate, freq=offsets.QuarterBegin(), with March as the starting month (_default_starting_month = 3).

In addition I'd argue that default starting months for QuarterBegin and BQuarterBegin should be 1 instead of the current 3.

@Pawel-Kranzberg
Copy link
Contributor Author

Other breaking changes that I introduced allow using year, quarter and month starts, as well as business quarter and month starts, as freq in date_range.

@jreback
Copy link
Contributor

jreback commented Feb 18, 2021

pls search the tracker pretty sure there are existing issues for this

we like targeted PRs
PRa that do too many things won't be merged

@Pawel-Kranzberg
Copy link
Contributor Author

Pawel-Kranzberg commented Feb 18, 2021

@jreback - Thank you for all the guidance and patience. This PR is now narrowed down to the following:

  • Ability to use all year- and quarter-based offset classes as Period freq, along with offset aliases.
  • Ability to use three-letter month short names as month / startingMonth arguments of year- / quarter-based offset classes (as already used in their _from_name class method).
  • Expanded docstrings with, e.g., additional examples.
  • Partial refactorization, as a step toward combining QuarterOffset and YearOffset __init__ (mentioned as TO DO in the code).

Is that scope acceptable?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests for construction from the str & anything else that is an enhancement. I am not exactly sure where this is useful. maybe some tests would show. Please also craft a whatsnew that shows from a user POV what has changed.

Timestamp('2021-01-01 05:01:15')
>>> ts + YearBegin(normalize = True)
Timestamp('2021-01-01 00:00:00')
>>> ts + YearBegin()._from_name('APR')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wouldn't show internal methods

Timestamp('2020-11-30 05:01:15')
>>> ts + YearEnd(normalize = True)
Timestamp('2020-12-31 00:00:00')
>>> ts + YearEnd()._from_name('AUG')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback jreback added Frequency DateOffsets Refactor Internal refactoring of code labels Feb 19, 2021
@Pawel-Kranzberg Pawel-Kranzberg marked this pull request as draft February 20, 2021 09:37
@jreback
Copy link
Contributor

jreback commented Feb 21, 2021

@Pawel-Kranzberg I am still not sure what this PR is doing. seems you updated some doc-string & added some new fucntionaility which is not tested.

int month
cdef:
readonly int month, _period_dtype_code
# public int _period_dtype_code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls dont leave commented-out code


# _default_month: int # FIXME: python annotation here breaks things
_attributes = tuple(["n", "normalize", "month"])
_default_month = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this cant be annotated, then please leave the comment

self.month = month
if month is not None:
if isinstance(month, str):
self.month = MONTH_TO_CAL_NUM[month]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is adding support for passing a month string? thats not necessarily a bad idea, but would need to check perf impact

@@ -1903,30 +1913,44 @@ cdef class YearOffset(SingleConstructorOffset):
return shifted


cdef class BYearEnd(YearOffset):
cdef class YearBegin(YearOffset):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you re-ordering the classes in this file? or is git just mangling the diff?

@jbrockmendel
Copy link
Member

pls start by reverting the re-ordering of classes. that will make it much easier to see what is actually changed

@jbrockmendel
Copy link
Member

@Pawel-Kranzberg can you address comments and merge master

@Pawel-Kranzberg
Copy link
Contributor Author

@Pawel-Kranzberg can you address comments and merge master

Yes, I will. I'm sorry, I've been very busy lately.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 30, 2021
@Pawel-Kranzberg
Copy link
Contributor Author

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

I need it for now as a reference.

@simonjayhawkins
Copy link
Member

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

I need it for now as a reference.

you can still reference closed PRs. will close this one as you have several open that are stale. ping when ready to continue and will reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets Refactor Internal refactoring of code Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants