
BUG: Period and period_range behaviour is inconsistent. #47622


Open
3 tasks done
jaheba opened this issue Jul 7, 2022 · 0 comments
Labels: Bug, Period


jaheba commented Jul 7, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas._libs.tslibs import to_offset

# 1
# The freq argument is ignored when it differs from the Period's freq only in its multiple
hourly = to_offset("H")
p = pd.Period("2020-01-01", freq="24H")
assert pd.Period(p, hourly).freq == to_offset("24H")


# 2
# asfreq shifts the value, even when converting to the same frequency
p = pd.Period("2020-01-01", freq="24H")
assert p != p.asfreq(p.freq)

# also, consider this example

dr = pd.date_range("2020", freq="2d", periods=3)

s1 = dr.to_series().asfreq(dr.freq).to_period()
s2 = dr.to_series().to_period().asfreq(dr.freq)

# one would expect s1 and s2 to be the same, but of course not!
>>> s1
2020-01-01   2020-01-01
2020-01-03   2020-01-03
2020-01-05   2020-01-05
Freq: 2D, dtype: datetime64[ns]

>>> s2
2020-01-02   2020-01-01
2020-01-04   2020-01-03
2020-01-06   2020-01-05
Freq: 2D, dtype: datetime64[ns]


# 3
# When providing two periods in period_range, only start of end is taken into consideration
pr = pd.period_range(pd.Period("2020-01-01 00:00", "6H"), pd.Period("2020-01-01 18:00", "6H"), freq="H")
pr[0] == "2020-01-01 00:00"
pr[-1] == "2020-01-01 18:00" # why not 23:00?
len(pr) == 19

# which of course is inconsistent with
pr = pd.period_range(pd.Period("2020Q1", "Q"), pd.Period("2020Q2", "Q"), freq="M")
pr[0] == "2020-03" # why not 2020-01?
pr[-1] == "2020-06"

# which then again behaves differently from
dr = pd.date_range(pd.Timestamp("2020Q1"), pd.Timestamp("2020Q2"), freq="M")
dr[0] == "2020-01-31"
dr[-1] == "2020-03-31"
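Cases #1 and #2 can be reproduced with a small pure-Python model of how a Period is stored: an integer ordinal counted in base units (hours here) plus a multiple `n`. This is a sketch under assumptions, not pandas' implementation. The helpers `period_from_period` and `asfreq` are hypothetical, ordinal 0 simply stands for 2020-01-01 00:00, and the two rules they encode (keep the original multiple when the base unit matches; with `how="end"`, re-anchor on the period's last base unit) are inferred from the outputs shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPeriod:
    """Hypothetical model: `ordinal` counts base units, `n` is the multiple."""
    ordinal: int
    base: str  # base unit, e.g. "H" or "D"
    n: int     # multiple, e.g. 24 for "24H"


def period_from_period(p: ModelPeriod, base: str, n: int) -> ModelPeriod:
    # Assumed rule behind case #1: if only the multiple differs, the
    # requested frequency is dropped and the original period comes back.
    if base == p.base:
        return p
    raise NotImplementedError("cross-unit conversion is outside this sketch")


def asfreq(p: ModelPeriod, base: str, n: int, how: str = "end") -> ModelPeriod:
    # Assumed rule behind case #2: with how="end", the result is anchored on
    # the last base unit (ordinal + n - 1), even if the freq is unchanged.
    if base != p.base:
        raise NotImplementedError("cross-unit conversion is outside this sketch")
    ordinal = p.ordinal + p.n - 1 if how == "end" else p.ordinal
    return ModelPeriod(ordinal, base, n)


p = ModelPeriod(ordinal=0, base="H", n=24)   # stands in for Period("2020-01-01", freq="24H")
print(period_from_period(p, "H", 1).n)       # 24: the hourly freq was ignored
print(asfreq(p, "H", 24).ordinal)            # 23: shifted to 23:00
print(asfreq(p, "H", 24, how="start") == p)  # True
```

Applied to 2D periods, the same assumed rule moves each period forward by one day (ordinal + 1), which matches the index shown for s2.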

Issue Description

The behaviour of Period and period_range is just very surprising and inconsistent.

It is inconsistent in itself, and also when compared with date_range.

See also: #47465

Expected Behavior

I would naively expect a Period to represent a time range, with a start where the period begins and an end where it ends:

p = pd.Period("2020-01-01", "2d")

Here p represents everything on the first two days of 2020.
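To make this mental model concrete, here is a minimal sketch that treats a period as a half-open interval [start, stop) built from a start timestamp and a length. The `Span` class is hypothetical and only mirrors the expectation described here; it is not pandas' `Period`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Span:
    """Hypothetical model of a period as a half-open interval [start, stop)."""
    start: datetime
    length: timedelta

    @property
    def stop(self) -> datetime:
        # Exclusive end; pandas' Period.end_time instead reports the last
        # nanosecond before this instant.
        return self.start + self.length

    def __add__(self, k: int) -> "Span":
        # Shift by k whole periods, like `p + 1` on a Period.
        return Span(self.start + k * self.length, self.length)

    def __contains__(self, t: datetime) -> bool:
        return self.start <= t < self.stop


p = Span(datetime(2020, 1, 1), timedelta(days=2))  # like pd.Period("2020-01-01", "2d")
print(p.stop)                          # 2020-01-03 00:00:00
print(datetime(2020, 1, 2, 23) in p)   # True
print(datetime(2020, 1, 3) in p)       # False
```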

If I use period_range, I would expect it to take the entire range of start and end into account:

start = p
end = p + 1
pr = pd.period_range(start, end, freq="2D")
assert pr[-1].end_time == end.end_time

So far so good. Let's try a different frequency:

pr2 = pd.period_range(start, end, freq="D")
assert pr2[-1].end_time == end.end_time # Fails
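For reference, the expectation behind the failing assertion can be written as a hypothetical `expected_period_range` that tiles the whole interval from the start of `start` to the end of `end` in steps of `freq`. This sketches the behaviour the issue asks for, not what `pd.period_range` currently does; it uses a minimal half-open `Span` model and assumes the interval divides evenly into steps (otherwise the last span overshoots).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Span:
    """Hypothetical half-open interval [start, start + length)."""
    start: datetime
    length: timedelta

    @property
    def stop(self) -> datetime:
        return self.start + self.length


def expected_period_range(start: Span, end: Span, freq: timedelta) -> list[Span]:
    # Cover everything from start.start up to (but excluding) end.stop.
    spans = []
    t = start.start
    while t < end.stop:
        spans.append(Span(t, freq))
        t += freq
    return spans


start = Span(datetime(2020, 1, 1), timedelta(days=2))
end = Span(start.start + start.length, start.length)  # like `p + 1`

pr = expected_period_range(start, end, timedelta(days=2))
pr2 = expected_period_range(start, end, timedelta(days=1))
print(len(pr), pr[-1].stop == end.stop)    # 2 True
print(len(pr2), pr2[-1].stop == end.stop)  # 4 True
```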

How naive of me! Of course the second argument is neither inclusive nor exclusive when generating the range, but a happy mix of both:

pr2[-1] == end.asfreq("D", "S") # note how neither using Period nor .asfreq("D") would work

The new range includes everything from start.start_time until pd.Period(end.start_time, "D").end_time, just as one would expect.
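For contrast, the rule inferred here (run from start.start_time through the freq-sized step that contains end.start_time) can be sketched the same way. `observed_period_range` is a hypothetical name, the rule is only a reading of the daily and hourly examples above, and the sketch assumes `start_time` is aligned to `freq`; the quarterly example that follows does not obey it.

```python
from datetime import datetime, timedelta


def observed_period_range(start_time: datetime, end_start_time: datetime,
                          freq: timedelta) -> list[datetime]:
    # Start of each freq-sized step, from start_time through the step that
    # begins at or before end_start_time (inclusive).
    starts = []
    t = start_time
    while t <= end_start_time:
        starts.append(t)
        t += freq
    return starts


# start = Period("2020-01-01", "2d"); end = start + 1 begins on 2020-01-03
days = observed_period_range(datetime(2020, 1, 1), datetime(2020, 1, 3), timedelta(days=1))
print([d.day for d in days])  # [1, 2, 3]: one day short of end.end_time
```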

The rules are clear now.

So let's just try a different example.

start = pd.Period("2020Q1", "Q")
end = pd.Period("2020Q2", "Q")

pr = pd.period_range(start, end, freq="M")

We know: the start of pr should be start.start_time and its end should be pd.Period(end.start_time, "M").end_time:

pr[0] == "2020-03"
pr[-1] == "2020-06"

🤯
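One reading that is consistent with both quarterly results: when the Quarter periods are converted to monthly ones, the conversion appears to use the same default as `Period.asfreq`, i.e. `how="end"`, so 2020Q1 becomes its last month (2020-03) rather than its first. The sketch below models only that conversion; `quarter_to_month` is a hypothetical helper, not a pandas function.

```python
def quarter_to_month(year: int, quarter: int, how: str = "end") -> tuple[int, int]:
    # Return (year, month) for the first or last month of a calendar quarter.
    if not 1 <= quarter <= 4:
        raise ValueError("quarter must be in 1..4")
    first_month = 3 * (quarter - 1) + 1
    month = first_month + 2 if how == "end" else first_month
    return year, month


print(quarter_to_month(2020, 1))               # (2020, 3): the first month observed above
print(quarter_to_month(2020, 2))               # (2020, 6): the last month observed above
print(quarter_to_month(2020, 1, how="start"))  # (2020, 1): what the issue expects
```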

jaheba added the Bug and Needs Triage labels on Jul 7, 2022
jbrockmendel added the Period label on Nov 1, 2023
mroeschke removed the Needs Triage label on Jul 16, 2024