Skip to content

Pandas compatibility of Timezone / Timezone Info #131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
frensjan opened this issue Jul 6, 2017 · 6 comments
Open

Pandas compatibility of Timezone / Timezone Info #131

frensjan opened this issue Jul 6, 2017 · 6 comments

Comments

@frensjan
Copy link

frensjan commented Jul 6, 2017

I think there is an incompatibility between pandas and pendulum.tz.timezone.Timezone. I'm not sure who is to blame though, but as from pendulum.tz.timezone_info.TimezoneInfo it seemed reasonable to check here first.

Two examples of unexpected behaviour; the programs exit with a segmentation fault:

from datetime import datetime, timedelta

from pendulum.tz.timezone import UTCTimezone
from pendulum.tz.timezone_info import UTC
import pandas as pd
import pendulum

This program creates two date time ranges, but printing (creating a string representation) fails for the second:

tomorrow = pendulum.tomorrow()

print(pd.date_range(today.astimezone('UTC'), tomorrow.astimezone('UTC'), freq='1H'))
print(pd.date_range(today.astimezone('CET'), tomorrow.astimezone('CET'), freq='1H'))  # seg faults

In this example using UTCTimezone causes the segmentation fault:

tomorrow = datetime.now() + timedelta(hours=24)
now = datetime.now()

r = pd.date_range(now, tomorrow, freq='1H')
r.tz = UTC
print(r[0])
r.tz = UTCTimezone
print(r[0]) # seg faults

I recon things go wrong when boxing in pandas. This is in C / Cython territory, so lot harder to debug for me.

Anyone ideas on the matter? I'd say Timezone probably behaves a bit different from TimezoneInfo but I really don't know where to start. I haven't even found their respective use cases yet.

Versions used:

  • pendulum==1.2.4
  • pandas==0.20.2
  • numpy==1.13.0

I'm running on Fedora Linux.

Thanks in advance!

@sdispater
Copy link
Collaborator

I investigated a bit and it seems that the segfault occurs when a call to Timestamp.replace() is made here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/datetimes.py#L484

Behind the scene, pandas will call:

int(total_seconds(_get_utcoffset(tz, None)))

in the _get_dst_info() function.

However total_seconds() assumes a timedelta while the utcoffset() method of a tzinfo can also return None.

Here is what the offical Python documentation says (https://docs.python.org/3.6/library/datetime.html#datetime.tzinfo.utcoffset):

If the UTC offset isn’t known, return None.

And in this case None is being passed to utcoffset() so there is no way to determine the offset that's why pendulum returns None.

I am not sure how to approach this, but it seems to me that pandas should handle this special case since it's a documented one.

@frensjan
Copy link
Author

Agreed that pandas should cover the return of None. At least not segfault ;) I've created an issue there. Thanks! Feel free to chip in over there :)

It did find it surprising that pendulum.tz.timezone.UTCTimezone and pendulum.tz.timezone_info.UTC behave differently. But I'll close this issue for now.

@frensjan frensjan reopened this Jul 12, 2017
@frensjan
Copy link
Author

Apparently this is a known issue at pandas but they won't fix it. Any suggestions?

@pganssle
Copy link
Contributor

I think the main issue here is that pandas has some pretty heavy tight coupling with pytz and dateutil. They use all kinds of private methods in the dateutil time zone API to do what they are doing.

For fixed zones like UTC, I don't see that there's any harm in fast-tracking utcoffset() to return datetime.timedelta(0) (or equivalent), but for zones with DST, I think it's a pretty big overhaul for them to support generalized tzinfo objects. I imagine if you made a PR fixing it such that it didn't impact the performance of pytz and dateutil zones they'd merge it, but I doubt they'll prioritize this.

@matthiasLevy
Copy link

matthiasLevy commented Jan 18, 2018

Hello !
Does anyone have found a workaround ?
Because the problems seems wider than just a problem in UTC. I just tried

pend = pendulum.now()
ts = pandas.Timestamp(pend)
ts.ceil('T')

and it crashes, whatever the timezone of pend is...
Is there a direct way to get ceil(freq)and floor(freq)of a Pendulum?
Thanks for your job anyway.


EDIT
I tried
ts = pd.Timestamp(pend, tz = pytz.tzinfo.tzinfo(pend.tz))
but still face the same issue : NotImplementedError: a tzinfo subclass must implement utcoffset()
I had to go back to UTC
ts = pandas.Timestamp(pend.timezone_('UTC'), tz='UTC')

ddelange added a commit to ddelange/pendulum that referenced this issue Jul 16, 2020
Fixes pandas-dev/pandas#15986
Closes python-pendulum#131

As explained by @sdispater in the issues above, pandas tries to access the Timezone._utcoffset property: https://github.com/pandas-dev/pandas/blob/v1.0.5/pandas/_libs/tslibs/timezones.pyx#L153

This commit allows pandas to get _utcoffset from a Timezone object, taking the transision as seen from utcnow.
@rmaunder
Copy link

rmaunder commented Jan 4, 2021

Hi

I just came across this issue, and the related Pandas one: pandas-dev/pandas#15986
after several days of frustration / confusion.

It seems that Pendulum is not 100% compatible with Pandas and some of the assumptions it makes about datetimes. I'm likely to now drop using it and revert to native datetime + pytz - which is a shame but I don't see a practical way round it given I use time indexed dataframes extensively.

I appreciate it can argued that the fault lies with Pandas and assumptions made there, but given that Pandas DataFrame / DateTimeIndexes are one of the main uses of Python Datetimes, it seems a bit disingenuous to present Pendulum as a drop-in replacement when these issues still exist (regardless of 'blame').

At the very least this should be highlighted in the Limitations section of the documentation so people are aware of this before they make the investment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants