ENH: Add dateutil timezone support (GH4688) #4689
The 2.6, 3.2 and 3.3 builds are failing. I'll reproduce these locally and push a fix. |
Why would we want to make this change? Is there some additional functionality or flexibility we get from using dateutil's timezone support? (For example, if this removed a dependency on pytz, that could be a net benefit). A few other items that would be helpful:
Finally, if this is supposed to remove a pytz dependency, can you try removing the pytz imports and seeing if this will work without it (and therefore confirm that timezones aren't getting changed under the hood - maybe you're already doing this). |
also pandas would like to possibly remove the dateutil dep in the future (which will take some work); so pytz is the preferred backend |
One problem is that dateutil is used widely in production so eliminating its use may be hard / not worth it. But not completely clear to me. |
I added some rationale issue 4688 here but the intent was not to remove @jtratner I'll do the performance test and add some doc. on the comparison between |
can you show an example of the above? how a Timestamp cannot be converted to another time zone? |
Sure, in the case of
>>> import pandas as pd
>>> import dateutil
>>> ts = pd.Timestamp('2013-08-30 12:00', tz=dateutil.tz.gettz('US/Eastern'))
>>> ts.astimezone(tz=dateutil.tz.gettz('US/Pacific'))
Exception AttributeError: "'NoneType' object has no attribute 'toordinal'" in 'pandas.tslib._localize_tso' ignored
<Timestamp: 2013-08-30 16:00:00>
The converted Timestamp is naive and has the UTC value. |
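For comparison, here is a minimal sketch of what a correct conversion of the same instant should look like, using only the standard library. The fixed-offset zones are assumptions standing in for the US/Eastern and US/Pacific summer offsets; this is not the dateutil or pandas code path.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed summer offsets stand in for US/Eastern and US/Pacific.
EASTERN = timezone(timedelta(hours=-4), "EDT")
PACIFIC = timezone(timedelta(hours=-7), "PDT")

ts = datetime(2013, 8, 30, 12, 0, tzinfo=EASTERN)
converted = ts.astimezone(PACIFIC)

# A correct conversion stays timezone-aware and shifts the wall-clock time,
# while still representing the same instant as the original value.
print(converted.isoformat())  # 2013-08-30T09:00:00-07:00
```

The result reported above (16:00 with no tzinfo) is the UTC wall-clock time of that instant, which is why it looks like the conversion was only half applied.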
it seems that this could easily be solved by always using a stringified tz (instead of the object directly), which would then use pytz internally |
Agreed. Indeed that was option (b) I identified here: #4688 (comment) However there is some performance impact in that there is a (small) cost associated with creating a new However the intent of my change was to allow users to use Pandas with either library as some people prefer |
can u quantify the perf / mem cost of these options? (don't go crazy just some simple benchmarks) |
def _assert_two_datetime_values_same(self, a, b):
    err_code = self._two_datetime_values_same(a, b)
    self.assertEquals(err_code, 0, '%s != %s with err_code: %d' % (repr(a), repr(b), err_code))
I get what you're trying to do, but it would probably be cleaner to use the assert_attr_equal function from pandas/util/testing
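from pandas.util.testing import assert_attr_equal

def _two_values_same_attributes(self, a, b, attrs):
    # Fails on the first attribute that differs and names it in the message.
    for attr in attrs:
        assert_attr_equal(attr, a, b)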
That way you'll get an error message that specifies exactly which attribute is off. If you want to collect them, you could run through the attrs first and then raise an error at the end:
# Collect every attribute that differs, then report them all at once.
non_equal = [attr for attr in attrs if getattr(a, attr) != getattr(b, attr)]
if non_equal:
    raise AssertionError("Attributes not equal: %r for objects %s and %s" % (non_equal, a, b))
The error code part is just hard to debug.
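To make the second suggestion concrete, here is a small self-contained sketch that runs without pandas. The helper name `assert_attrs_equal` and the sample datetimes are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def assert_attrs_equal(a, b, attrs):
    """Hypothetical helper: compare attrs and report every mismatch at once."""
    non_equal = [attr for attr in attrs if getattr(a, attr) != getattr(b, attr)]
    if non_equal:
        raise AssertionError(
            "Attributes not equal: %r for objects %r and %r" % (non_equal, a, b)
        )


a = datetime(2013, 8, 30, 12, 0)
b = datetime(2013, 8, 30, 16, 30)

try:
    assert_attrs_equal(a, b, ["year", "month", "day", "hour", "minute"])
except AssertionError as exc:
    # The message lists exactly which attributes differ.
    message = str(exc)
    print(message)
```

Here the message names `hour` and `minute`, which is the information an integer error code hides.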
Should there be a test case for doing arithmetic with a pytz timezone-aware series and a dateutil timezone-aware series/single object? Don't need to run through every operation, but would be nice to test subtraction (which should result in timedeltas). I looked through your test cases and I didn't see something like this, but I'm not sure if it's in scope (especially because I don't use datetime things frequently). |
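A rough sketch of the property such a test would check, using plain `datetime` objects rather than pandas Series. The two tzinfo implementations below are stand-ins (assumptions) for pytz and dateutil zones, with fixed offsets so the example has no third-party dependencies.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class FixedOffset(tzinfo):
    """Hypothetical stand-in for a tzinfo class from a second library."""

    def __init__(self, hours, name):
        self._offset = timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._name


eastern = timezone(timedelta(hours=-4), "EDT")  # first implementation
pacific = FixedOffset(-7, "PDT")                # second implementation

a = datetime(2013, 8, 30, 12, 0, tzinfo=eastern)
b = datetime(2013, 8, 30, 12, 0, tzinfo=pacific)

# Subtracting aware datetimes with different tzinfo objects compares the
# underlying instants and yields a timedelta.
delta = b - a
print(delta)  # 3:00:00
```

A pandas version would apply the same check element-wise to two timezone-aware Series.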
I have added the Series subtraction test and finished the refactor of the |
can you run a perf test and post relevant results (e.g. timeseries_*)....
|
Hi @jreback ,
What do you think? |
Can you run that last one again and see what you get? Could just be noise. |
Not noise methinks, results from 7 runs:
Average 1.235 |
@jtratner What do you think? I have had a close look at my changes and can't see any added bottlenecks. |
I have a number of suspicions/comments (and you should try running cython I'm assuming that one or more of these functions is being called many many I think if you define a few globals that cover some of the underlying I believe In addition to that, you've added a number of getattr/hasattr calls
Perhaps the best option is to test for pytz vs. dateutil once then use Not sure whether isinstance checks would be different (I doubt it) |
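One possible reading of the "test once" suggestion, sketched in plain Python rather than Cython. Every name here (`OtherTz`, `fixed_offset_seconds`, `to_local_seconds`) is hypothetical and not taken from the PR; the point is only that the type check happens outside the loop.

```python
from datetime import timedelta, timezone, tzinfo


class OtherTz(tzinfo):
    """Hypothetical stand-in for a tzinfo class from a second library."""

    def __init__(self, offset_seconds):
        self.fixed_offset_seconds = offset_seconds

    def utcoffset(self, dt):
        return timedelta(seconds=self.fixed_offset_seconds)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "OTHER"


def to_local_seconds(utc_seconds, tz):
    # Decide which kind of tz object this is once, outside the loop...
    if isinstance(tz, OtherTz):
        offset = tz.fixed_offset_seconds
    else:
        # datetime.timezone ignores the argument, so None is acceptable here.
        offset = int(tz.utcoffset(None).total_seconds())
    # ...so the hot loop does no isinstance/getattr/hasattr work per element.
    return [s + offset for s in utc_seconds]


print(to_local_seconds([0, 60], timezone(timedelta(hours=-4))))
print(to_local_seconds([0], OtherTz(3600)))
```

Whether this helps in the real code depends on where the per-element checks actually sit, which profiling would have to confirm.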
@jtratner Thanks so much for taking the time to do such a detailed analysis. I haven't seen any significant speed improvement with these optimisations yet. I'm on a different project next week and I'll return to this after completing that. Thanks again. |
Bummer - I'm mostly spitballing here anyway - maybe I'll see if any of them |
closing as stale @prossahl if u would like to come back to this let us know |
I've picked this up in https://github.com/orgs/ahlmss where @prossahl left off. We're still interested in pursuing the work in GH4688 but not through this PR. We've found some problems with the code in this PR so we're happy to close it and open a new request related to the original issue. The problem we found is that conversion between pytz and dateutil can't be round-tripped - and this PR implicitly converts from dateutil to pytz. I'm working on a patch which adds full support for dateutil (e.g. create TimeStamp, DatetimeIndex, DataFrame with dateutil timezones, avoid conversions where possible) but fails some tests. I'll be looking at doing some work on this over the next few weeks to try and get it working fully. This would only really be worthwhile for us if we can get it merged into pandas (otherwise we'll end up in merging-hell at every pandas release). Are you and the rest of the pandas team still interested in dateutil support? |
I think this would be a worthwhile addition, but you are right: it needs to be integrated in a more seamless manner. Go for a PR and we will close this then (just so this is still a reminder) |
This adds dateutil time zone support as a complement to pytz. The tests are designed to demonstrate that the same result is obtained using either library except where the libraries themselves differ (at the onset of DST).
There is a fair amount of tricky coding here in tslib.pyx as it is accessing private members of the time zone object, caching them and performing calculations on those cached values.
This should close #4688.