Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datetime improvements: generate fold=1 and consistent treatment of imaginary times #2392

Merged
merged 5 commits into from
Apr 18, 2020

Conversation

Zac-HD
Copy link
Member

@Zac-HD Zac-HD commented Apr 12, 2020

Closes #2273, so a review from @pganssle would be welcome.

Particular decisions:

  • We unconditionally generate a fold attribute of either 0 or 1, regardless of other constraints. This matches the equality and comparison semantics of naive datetimes, and therefore our bounds.

  • Previously, imaginary datetimes (i.e. the hours skipped over when DST 'springs forward') were generated unless the timezone was supplied by pytz. This has been made consistent by allowing pytz timezones to generate imaginary times too.

  • A new allow_imaginary=True argument to datetimes() which when False is equivalent to datetimes(...).filter(dateutil.tz.datetime_exists). This may be inefficient if the bounds mostly span an imaginary period and has some boundary issues near datetime.min and datetime.max.

    However, I think this is still the best available as it works well for realistic use cases, and the alternative (draw in UTC and convert) can violate important properties like "the timezone could have been generated by the timezones strategy" and "the datetime is between the (naive) bounds" due to DST issues and datetime-dependent UTC offsets.

@Zac-HD Zac-HD force-pushed the datetime-fold branch 5 times, most recently from d909265 to fab3197 Compare April 13, 2020 04:34
Copy link
Contributor

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a lot of comments here, sorry if it's excessive since most of them need no action.

Overall this looks good to me and if this were merged today I wouldn't have a problem with it. I have dropped a few FYIs and design considerations here and there, the biggest of which is the question of how to set fold when allow_imaginary=False.

Thanks for doing this! Sorry I didn't get around to it myself.


# TODO: with some probability, systematically search for one of
# - an imaginary time (if allowed),
# - a time within 24hrs of a leap second (if there any are within bounds),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be tough and I'm not sure how valuable it would be because datetime does not provide any concept of leap seconds, so any bug this finds would need to involve end users' leap second-handling code, in which case they probably know roughly where to test, no?

If you do want to bias towards leap seconds, though, PEP 615 will ship (but not use) leap second information as part of the public interface of the tzdata module (it is also likely to be available as part of the time zone data on the system).

Copy link
Member Author

@Zac-HD Zac-HD Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously this is a placeholder for #69, since all the recent datetimes work has been leading up to that. In brief,

  • leap second issues can appear when comparing durations across a leap second calculated between (e.g. unix) timestamps to those calculated between dates
  • integrating a bias towards leap-second smear periods into the primary datetimes() strategy can support substantially better mutation and shrinking logic than having it downstream, and of course makes it available to users who don't know they may have latent leap-second bugs
  • I will definitely be calling you in to review leap-second information related code... we'll probably ship our own literal list as a last resort, but prefer to load from the OS or tzdata package (and maybe even pytz) if available.

hypothesis-python/tests/cover/test_datetimes.py Outdated Show resolved Hide resolved
if is_pytz_timezone(timezone):
# Can't just construct; see http://pytz.sourceforge.net
# After dropping Python 3.5 support, just use `result.fold`
return timezone.localize(value, is_dst=not getattr(value, "fold", 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anything needs to change here, but is_dst is not exactly the inverse of fold. fold basically says whether in ambiguous or gap times you should use the offset that applies before the gap (fold=0) or the offset that applies after the gap (fold=1). is_dst says whether or not you should choose the side that is "DST" or "STD" (not sure what they do in STD->STD or DST->DST transitions).

I have some diagrams that may or may not be comprehensible. This is an example of what offsets apply when using PEP 495's conception of fold:

fold_gap_scheme_2

And here is an example of what offsets apply when you use "fold" to mean the inverse of is_dst:

fold_gap_scheme_1

The difference is basically that it inverts what offsets are returned during imaginary times (note that the right half of the diagram is the same in both schemes). This doesn't really matter, though, particularly when you are generating the value randomly (I don't even think fold will be set on the result of this localize call). I think the only effect is that result shrinking will tend to favor the DST offset rather than the offset that applied before the transition, but that's a pretty arbitrary choice anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no.

is_dst=not val.fold works except during DST-to-STD transitions involving an imaginary time, which happens due to the negative offset in e.g. Europe/Dublin.

...I've just documented that this can happen and that we recommend a PEP-495-compliant library like dateutil, but other suggestions are welcome.

@Zac-HD
Copy link
Member Author

Zac-HD commented Apr 13, 2020

Thanks @pganssle, these are fantastic comments and much appreciated! I'll make a few small changes, but also note many of these points in comments for posterity and future maintainers 😄

@Zac-HD Zac-HD force-pushed the datetime-fold branch 7 times, most recently from 9411cff to f783097 Compare April 15, 2020 03:55
Implementing times() in terms of datetimes() as we add heuristics for date-dependent problems like imaginary times is just asking for trouble, so we'll split it out.
@Zac-HD
Copy link
Member Author

Zac-HD commented Apr 18, 2020

@pganssle, @HypothesisWorks/hypothesis-python-contributors: I think this is ready to merge, after which I'll finally be ready to tackle #69 (our second-oldest open issue!).

@Zac-HD Zac-HD merged commit a9aba13 into HypothesisWorks:master Apr 18, 2020
@Zac-HD Zac-HD deleted the datetime-fold branch April 18, 2020 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Folds and imaginary datetimes in the datetime strategy
3 participants