Datetime improvements: generate `fold=1` and consistent treatment of imaginary times #2392

Zac-HD · 2020-04-12T07:09:25Z

Closes #2273, so a review from @pganssle would be welcome.

Particular decisions:

We unconditionally generate a fold attribute of either 0 or 1, regardless of other constraints. This matches the equality and comparison semantics of naive datetimes, and therefore our bounds.
Previously, imaginary datetimes (i.e. the hours skipped over when DST 'springs forward') were generated unless the timezone was supplied by pytz. This has been made consistent by allowing pytz timezones to generate imaginary times too.
We add a new allow_imaginary=True argument to datetimes(). Passing allow_imaginary=False is equivalent to datetimes(...).filter(dateutil.tz.datetime_exists). This may be inefficient if the bounds mostly span an imaginary period, and it has some boundary issues near datetime.min and datetime.max.

However, I think this is still the best available option, as it works well for realistic use cases. The alternative (draw in UTC and convert) can violate important properties like "the timezone could have been generated by the timezones strategy" and "the datetime is between the (naive) bounds", due to DST issues and datetime-dependent UTC offsets.
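To make the filtering decision above concrete, here is a minimal standard-library sketch of the kind of UTC round-trip check that a predicate such as dateutil.tz.datetime_exists performs. `ToyZone`, its offsets, its 2020-only transition instants, and the helper `exists` are all hypothetical names and values invented for this illustration; none of this is Hypothesis or dateutil code, and the real implementations may differ.

```python
from datetime import datetime, timedelta, timezone, tzinfo

STD = timedelta(hours=1)  # hypothetical standard-time offset
DST = timedelta(hours=2)  # hypothetical summer-time offset
# Assumed transition instants for 2020 only, as naive UTC datetimes.
DST_START_UTC = datetime(2020, 3, 29, 1)
DST_END_UTC = datetime(2020, 10, 25, 1)


class ToyZone(tzinfo):
    """Hypothetical PEP 495-aware zone, valid for 2020 only."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < DST_START_UTC + STD:
            return STD
        if wall < DST_START_UTC + DST:  # skipped hour, 02:00-03:00 local
            return DST if dt.fold else STD
        if wall < DST_END_UTC + STD:
            return DST
        if wall < DST_END_UTC + DST:  # repeated hour, 02:00-03:00 local
            return STD if dt.fold else DST
        return STD

    def dst(self, dt):
        return self.utcoffset(dt) - STD

    def tzname(self, dt):
        return "TOY"

    def fromutc(self, dt):
        # dt holds UTC wall-clock values but has tzinfo=self.
        utc = dt.replace(tzinfo=None)
        if DST_START_UTC <= utc < DST_END_UTC:
            return dt + DST
        local = dt + STD
        # The first hour after the autumn transition is the second pass
        # through 02:00-03:00 local, so mark it with fold=1.
        if DST_END_UTC <= utc < DST_END_UTC + (DST - STD):
            local = local.replace(fold=1)
        return local


def exists(naive, tz):
    """True if the naive wall-clock time survives a round trip through UTC."""
    round_trip = naive.replace(tzinfo=tz).astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive


# Wall-clock times from 01:30 to 03:30 on the spring-forward day.
candidates = [
    datetime(2020, 3, 29, 1, 30) + timedelta(minutes=30 * i) for i in range(5)
]
real = [dt for dt in candidates if exists(dt, ToyZone())]

# fold is ignored when comparing naive datetimes, so generating fold=1
# cannot move a value outside naive bounds.
assert datetime(2020, 10, 25, 2, 30, fold=1) == datetime(2020, 10, 25, 2, 30, fold=0)
```

With this toy zone, `real` keeps 01:30, 03:00 and 03:30 and drops 02:00 and 02:30, the two candidates inside the skipped hour. The repeated autumn hour passes the check with either fold value, since ambiguous times are real times.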

pganssle

I've made a lot of comments here, sorry if it's excessive since most of them need no action.

Overall this looks good to me and if this were merged today I wouldn't have a problem with it. I have dropped a few FYIs and design considerations here and there, the biggest of which is the question of how to set fold when allow_imaginary=False.

Thanks for doing this! Sorry I didn't get around to it myself.

hypothesis-python/src/hypothesis/strategies/_internal/datetime.py

pganssle · 2020-04-13T12:52:29Z

+
+ # TODO: with some probability, systematically search for one of
+ # - an imaginary time (if allowed),
+ # - a time within 24hrs of a leap second (if there any are within bounds),


This may be tough and I'm not sure how valuable it would be because datetime does not provide any concept of leap seconds, so any bug this finds would need to involve end users' leap second-handling code, in which case they probably know roughly where to test, no?

If you do want to bias towards leap seconds, though, PEP 615 will ship (but not use) leap second information as part of the public interface of the tzdata module (it is also likely to be available as part of the time zone data on the system).

Zac-HD · Apr 13, 2020 (edited)

Obviously this is a placeholder for #69, since all the recent datetimes work has been leading up to that. In brief:

Leap second issues can appear when durations spanning a leap second are calculated between (e.g. Unix) timestamps and compared to durations calculated between dates.

Integrating a bias towards leap-second smear periods into the primary datetimes() strategy can support substantially better mutation and shrinking logic than having it downstream, and of course makes it available to users who don't know they may have latent leap-second bugs.

I will definitely be calling you in to review leap-second information related code... we'll probably ship our own literal list as a last resort, but prefer to load from the OS or tzdata package (and maybe even pytz) if available.
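As a rough illustration of the duration mismatch described above, the sketch below compares plain datetime arithmetic with a count that adds inserted leap seconds from a small literal list. The helper names (`calendar_seconds`, `elapsed_seconds`, `near_leap_second`) and the shortened list are assumptions made for this example, not code from this PR, and all datetimes are treated as naive UTC.

```python
from datetime import datetime, timedelta

# A few recent leap seconds, hard-coded for the example (not a complete list).
# Each entry is the UTC midnight immediately after an inserted 23:59:60.
LEAP_SECOND_ENDS = [
    datetime(2012, 7, 1),
    datetime(2015, 7, 1),
    datetime(2017, 1, 1),
]


def calendar_seconds(start, end):
    """Duration as datetime arithmetic sees it: every day has 86400 seconds."""
    return (end - start).total_seconds()


def elapsed_seconds(start, end):
    """Duration including any leap seconds inserted in (start, end]."""
    leaps = sum(1 for ls in LEAP_SECOND_ENDS if start < ls <= end)
    return calendar_seconds(start, end) + leaps


def near_leap_second(dt, window=timedelta(hours=24)):
    """True if dt lies within `window` of a listed leap second."""
    return any(abs(dt - ls) <= window for ls in LEAP_SECOND_ENDS)


start = datetime(2016, 12, 31, 23, 59)
end = datetime(2017, 1, 1, 0, 1)
print(calendar_seconds(start, end), elapsed_seconds(start, end))
```

For the two minutes straddling the end of 2016, the calendar duration is 120 seconds while the leap-aware count is 121. A predicate like `near_leap_second` is one simple way a strategy could decide which values count as "within 24hrs of a leap second".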

hypothesis-python/src/hypothesis/strategies/_internal/datetime.py

pganssle · 2020-04-13T13:54:44Z

+ if is_pytz_timezone(timezone):
+ # Can't just construct; see http://pytz.sourceforge.net
+ # After dropping Python 3.5 support, just use `result.fold`
+ return timezone.localize(value, is_dst=not getattr(value, "fold", 0))


I don't think anything needs to change here, but is_dst is not exactly the inverse of fold. fold basically says whether, in ambiguous or gap times, you should use the offset that applies before the transition (fold=0) or the offset that applies after the transition (fold=1). is_dst says whether you should choose the side that is "DST" or the side that is "STD" (not sure what they do in STD->STD or DST->DST transitions).

I have some diagrams that may or may not be comprehensible. This is an example of what offsets apply when using PEP 495's conception of fold:

[Diagram: offsets under PEP 495 fold semantics]

And here is an example of what offsets apply when you use "fold" to mean the inverse of is_dst:

[Diagram: offsets when fold is treated as the inverse of is_dst]

The difference is basically that it inverts what offsets are returned during imaginary times (note that the right half of the diagram is the same in both schemes). This doesn't really matter, though, particularly when you are generating the value randomly (I don't even think fold will be set on the result of this localize call). I think the only effect is that result shrinking will tend to favor the DST offset rather than the offset that applied before the transition, but that's a pretty arbitrary choice anyway.
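The difference between the two schemes can also be written down as a small toy model. This is only a sketch of the behaviour described in the comment above, assuming an ordinary transition pair (STD to DST in spring, DST to STD in autumn); the function names and offsets are hypothetical, and neither function is pytz's or dateutil's actual code.

```python
from datetime import timedelta

STD = timedelta(hours=0)  # assumed standard offset for the illustration
DST = timedelta(hours=1)  # assumed daylight-saving offset


def pep495_offset(period, fold):
    """PEP 495: fold=0 picks the offset before the transition, fold=1 after."""
    # "gap" is the STD -> DST spring-forward hour (imaginary times);
    # "ambiguous" is the DST -> STD fall-back hour (repeated times).
    before, after = (STD, DST) if period == "gap" else (DST, STD)
    return after if fold else before


def inverted_is_dst_offset(period, fold):
    """Offset chosen when fold is translated as is_dst = not fold."""
    is_dst = not fold
    return DST if is_dst else STD


for period in ("gap", "ambiguous"):
    for fold in (0, 1):
        same = pep495_offset(period, fold) == inverted_is_dst_offset(period, fold)
        print(f"{period:9} fold={fold} agree={same}")
```

Under these assumptions the two functions agree for both fold values in the ambiguous hour and disagree for both in the gap, which is the inversion during imaginary times noted above.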

Zac-HD · Apr 14, 2020

Oh no.

is_dst=not val.fold works except during DST-to-STD transitions involving an imaginary time, which happens due to the negative DST offset in e.g. Europe/Dublin.

...I've just documented that this can happen and that we recommend a PEP-495-compliant library like dateutil, but other suggestions are welcome.

Zac-HD · 2020-04-13T14:15:14Z

Thanks @pganssle, these are fantastic comments and much appreciated! I'll make a few small changes, but also note many of these points in comments for posterity and future maintainers 😄

Implementing times() in terms of datetimes() as we add heuristics for date-dependent problems like imaginary times is just asking for trouble, so we'll split it out.

Zac-HD · 2020-04-18T01:49:04Z

@pganssle, @HypothesisWorks/hypothesis-python-contributors: I think this is ready to merge, after which I'll finally be ready to tackle #69 (our second-oldest open issue!).

Zac-HD force-pushed the datetime-fold branch 5 times, most recently from d909265 to fab3197, on April 13, 2020 04:34

pganssle approved these changes Apr 13, 2020

Zac-HD force-pushed the datetime-fold branch 7 times, most recently from 9411cff to f783097, on April 15, 2020 03:55

Zac-HD added 4 commits April 18, 2020 11:13

Independent time strategy

bb382c7

Implementing times() in terms of datetimes() as we add heuristics for date-dependent problems like imaginary times is just asking for trouble, so we'll split it out.

Better links in extra.dateutil docs

ce31cfe

Generate datetime.fold attribute

60eeeff

New datetimes() arg: allow_imaginary

12b701d

Zac-HD force-pushed the datetime-fold branch from f783097 to 12b701d on April 18, 2020 01:13

moreati reviewed Apr 18, 2020

hypothesis-python/RELEASE.rst (outdated)

Define imaginary time in release notes

66edd93

moreati approved these changes Apr 18, 2020

Zac-HD merged commit a9aba13 into HypothesisWorks:master Apr 18, 2020

Zac-HD deleted the datetime-fold branch April 18, 2020 08:32

Zac-HD mentioned this pull request Apr 27, 2020

Improve structure of timezones() strategies #2414

Open
