-
Notifications
You must be signed in to change notification settings - Fork 587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Datetime improvements: generate fold=1
and consistent treatment of imaginary times
#2392
Conversation
d909265
to
fab3197
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a lot of comments here, sorry if it's excessive since most of them need no action.
Overall this looks good to me and if this were merged today I wouldn't have a problem with it. I have dropped a few FYIs and design considerations here and there, the biggest of which is the question of how to set fold
when allow_imaginary=False
.
Thanks for doing this! Sorry I didn't get around to it myself.
|
||
# TODO: with some probability, systematically search for one of | ||
# - an imaginary time (if allowed), | ||
# - a time within 24hrs of a leap second (if there any are within bounds), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be tough and I'm not sure how valuable it would be because datetime
does not provide any concept of leap seconds, so any bug this finds would need to involve end users' leap second-handling code, in which case they probably know roughly where to test, no?
If you do want to bias towards leap seconds, though, PEP 615 will ship (but not use) leap second information as part of the public interface of the tzdata
module (it is also likely to be available as part of the time zone data on the system).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously this is a placeholder for #69, since all the recent datetimes work has been leading up to that. In brief,
- leap second issues can appear when comparing durations across a leap second calculated between (e.g. unix) timestamps to those calculated between dates
- integrating a bias towards leap-second smear periods into the primary
datetimes()
strategy can support substantially better mutation and shrinking logic than having it downstream, and of course makes it available to users who don't know they may have latent leap-second bugs - I will definitely be calling you in to review leap-second information related code... we'll probably ship our own literal list as a last resort, but prefer to load from the OS or
tzdata
package (and maybe evenpytz
) if available.
if is_pytz_timezone(timezone): | ||
# Can't just construct; see http://pytz.sourceforge.net | ||
# After dropping Python 3.5 support, just use `result.fold` | ||
return timezone.localize(value, is_dst=not getattr(value, "fold", 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think anything needs to change here, but is_dst
is not exactly the inverse of fold
. fold
basically says whether in ambiguous or gap times you should use the offset that applies before the gap (fold=0
) or the offset that applies after the gap (fold=1
). is_dst
says whether or not you should choose the side that is "DST" or "STD" (not sure what they do in STD->STD or DST->DST transitions).
I have some diagrams that may or may not be comprehensible. This is an example of what offsets apply when using PEP 495's conception of fold
:
And here is an example of what offsets apply when you use "fold" to mean the inverse of is_dst
:
The difference is basically that it inverts what offsets are returned during imaginary times (note that the right half of the diagram is the same in both schemes). This doesn't really matter, though, particularly when you are generating the value randomly (I don't even think fold
will be set on the result of this localize
call). I think the only effect is that result shrinking will tend to favor the DST offset rather than the offset that applied before the transition, but that's a pretty arbitrary choice anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no.
is_dst=not val.fold
works except during DST-to-STD transitions involving an imaginary time, which happens due to the negative offset in e.g. Europe/Dublin
.
...I've just documented that this can happen and that we recommend a PEP-495-compliant library like dateutil
, but other suggestions are welcome.
Thanks @pganssle, these are fantastic comments and much appreciated! I'll make a few small changes, but also note many of these points in comments for posterity and future maintainers 😄 |
9411cff
to
f783097
Compare
Implementing times() in terms of datetimes() as we add heuristics for date-dependent problems like imaginary times is just asking for trouble, so we'll split it out.
Closes #2273, so a review from @pganssle would be welcome.
Particular decisions:
We unconditionally generate a
fold
attribute of either0
or1
, regardless of other constraints. This matches the equality and comparison semantics of naive datetimes, and therefore our bounds.Previously, imaginary datetimes (i.e. the hours skipped over when DST 'springs forward') were generated unless the timezone was supplied by
pytz
. This has been made consistent by allowingpytz
timezones to generate imaginary times too.A new
allow_imaginary=True
argument todatetimes()
which whenFalse
is equivalent todatetimes(...).filter(dateutil.tz.datetime_exists)
. This may be inefficient if the bounds mostly span an imaginary period and has some boundary issues neardatetime.min
anddatetime.max
.However, I think this is still the best available as it works well for realistic use cases, and the alternative (draw in UTC and convert) can violate important properties like "the timezone could have been generated by the
timezones
strategy" and "the datetime is between the (naive) bounds" due to DST issues and datetime-dependent UTC offsets.