Skip to content

bpo-15443: Nanoseconds support for datetime objects (work in progress) #21987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

shramov
Copy link

@shramov shramov commented Aug 28, 2020

This patch adds nanoseconds support to pure python datetime.py module as suggested in some old comment.

Also it's far from being complete since there are some open questions:

  1. str and repr are not changed. Should str be backward compatible or not? Currently they are left as they were to simplify testing.
  2. What is best way to add nanoseconds argument to time/timedelta/datetime constructors?

However I think it's worth a try to revive this issue.

https://bugs.python.org/issue15443

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@shramov

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@shramov shramov marked this pull request as draft August 28, 2020 10:15
@shramov shramov force-pushed the wip-datetime branch 5 times, most recently from 77ae062 to 6f73c7c Compare August 28, 2020 20:04
@shramov shramov marked this pull request as ready for review August 29, 2020 04:12
@shramov
Copy link
Author

shramov commented Aug 31, 2020

I've added backward compatible str that works as it was with objects where nanoseconds % 1000 == 0.
Otherwise it's formatted with full 9 digit precision.

@shramov
Copy link
Author

shramov commented Sep 8, 2020

I've also created standalone package with patched module
https://github.com/shramov/datetime_ns

@pganssle
Copy link
Member

@shramov Thanks for this. I haven't had time to review this in depth yet, but just letting you know it's on my list and from what I did look at it looks generally like it's a step in the right direction.

If I'm able to participate in the core dev sprint this year, I'm hoping to use the time to work on this problem. The sprint is scheduled for October 19-23.

@abalkin
Copy link
Member

abalkin commented Sep 28, 2020

On str: I think it will be sufficient for backward compatibility if timestamps representing whole number of microseconds (1000*n nanoseconds) print as they do now.

For repr, it is important to preserve eval(repr(t)) == t invariant. I wonder if we should change repr to something like

datetime(2020, 9, 28, 11, 22, 33, microsecond=123456)

for whole microseconds and

datetime(2020, 9, 28, 11, 22, 33, nanosecond=123456789)

for fractional nanosecond values.

@shramov
Copy link
Author

shramov commented Sep 29, 2020

I've implemented str and repr this way already.

>>> from datetime_ns import datetime
>>> datetime(2020, 1, 2, nanosecond=123456)
datetime_ns.datetime.datetime(2020, 1, 2, 0, 0, 0, nanosecond=123456)
>>> datetime(2020, 1, 2, nanosecond=123456000)
datetime_ns.datetime.datetime(2020, 1, 2, 0, 0, 0, 123456)

However this pull request is not yet updated to latest datetime_ns code

@pganssle
Copy link
Member

It occurs to me that the pure Python version of this may be easier than the C implementation. The trickiest part of this is that because of the very leaky nature of the C API, there's not really a great way to get the individual components of a PyDateTime_DateTime. We expose macros, but those don't help for bindings like Rust's where you have to re-implement all the macros yourself.

In order to add nanoseconds to the PyDateTime_DateTime implementation, I think our most efficient option is to change usecond from 3 bytes to 4 bytes add a PyDateTime_DATE_GET_NANOSECONDS function, and change PyDateTime_DATE_GET_MICROSECONDS to basically PyDateTime_DATE_GET_NANOSECONDS(o) / 1000.

For anyone not using the C macros, this will be a breaking change, though I'm not sure how significant the impact will be. I am also not sure whether this will change the size of the datetime object in memory. I think that if we keep it to 1 byte, we may just be using an unused alignment byte anyway.

@abalkin
Copy link
Member

abalkin commented Sep 29, 2020

I would not worry about the C API at this point. There are plenty of issues to be resolved at the pure python level. As for developers not using macros in C code - I have little sympathy for them. Python has a long tradition of exposing internals and relying on responsible users instead of making internals inaccessible in Java/C++-style.

@pganssle
Copy link
Member

I would not worry about the C API at this point. There are plenty of issues to be resolved at the pure python level.

I was mainly bringing it up because I suspect some of the biggest design trade-offs will need to be made in the C implementation, including the possibility that there might be significant breakage downstream, or the possibility that it might significantly change the memory usage of existing Python programs (I don't think either of these are likely, of course).

The pure Python stuff is a lot more forgiving in its ability to make code backwards compatible, and in performance (since for performance everyone uses the C version). I'm fine with continuing to hammer out the issues in the pure Python version, though.

As for developers not using macros in C code - I have little sympathy for them. Python has a long tradition of exposing internals and relying on responsible users instead of making internals inaccessible in Java/C++-style.

The biggest problem is that macros are only available in C code, so for example Rust's bindings need to re-implement them based on the data structure. I don't know how many other languages bind to Python, but given that we expose a bunch of stuff in the capsule as part of the public API, but not enough to necessarily be useful, it's hard to blame people for crossing the fuzzy line between public and private in the C API.

Even I had to use non-macro struct access in the zoneinfo backport because (until 3.10) we didn't expose any way in the C API to get the tzinfo from a PyDateTime_DateTime or PyDateTime_Time. Regardless of whether we have sympathy for these people, if we make too drastic a breaking change, it could make adopting new versions of Python difficult for the ecosystem, so we have to figure out how to strike the right balance.

@shramov
Copy link
Author

shramov commented Sep 30, 2020

By the way, why unsigned char data[X] is used instead of packed structure in C API?
If it was struct it could be possible to replace usec field with nsec and compiler will generate errors in old code that access usec directly.

upd. However this would not help Rust as it don't use header files (at least directly)

Changed microseconds to nanoseconds in timedelta, time and datetime objects.
All __repr__ and __str__ methods are unchanged to simplify testing.
Pickling is (hopefully) backward compatible.
Structures provides same layout but allows compiler to check for
invalid access.

Downside is in structure all fields are in native byte order and
needs to be converted to big-endian in pickle/unpickle.
@shramov
Copy link
Author

shramov commented Oct 1, 2020

I've updated pure code to latest datetime_ns version with correct repr, str and updated tests.
Also I've pushed patch to Include/datetime.h illustrating my idea about packed structures just as proof-of-concept.

Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, I realize there's a bit of a problem with how the interface looks to an end user.

If I want to call datetime passing tzinfo by position and include nanoseconds, I'd need to do something like this:

dt = datetime(2020, 12, 31, 12, 59, 59, 0, UTC, nanosecond=999_999_999)

That's just weird, and it's unclear to me (without having looked at the constructor code), what happens if microseconds is a value other than 0.

I could imagine a lot of people thinking that the allowed range for nanoseconds is [0, 1000], and constructing their datetimes like this:

dt = datetime(2020, 12, 31, 59, 59, 999_999, UTC, nanosecond=999)

I think the best ways to handle this (and neither is amazingly good) would be to either have nanosecond in the range of [0, 999] in all the public-facing semantics (internally we'd still want to use nanosecond as the canonical third struct element) or to change datetime to accept an explicit None for microsecond and ensure that exactly one of nanosecond and microsecond are specified.

@@ -1502,32 +1554,36 @@ def dst(self):
return offset

def replace(self, hour=None, minute=None, second=None, microsecond=None,
tzinfo=True, *, fold=None):
tzinfo=True, *, nanosecond=None, fold=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. It's really annoying that we have to do it this way for backwards compatibility reasons.

I also wonder if people will think that this should work like:

>>> dt = dt.replace(microsecond=1, nanosecond=5)
>>> dt.nanosecond
1005

I guess there's nothing to be done about it, though. ☹

"""Return a new time with new values for the specified fields."""
if hour is None:
hour = self.hour
if minute is None:
minute = self.minute
if second is None:
second = self.second
if microsecond is None:
microsecond = self.microsecond
if nanosecond is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to address the case where the user specifies both microsecond and nanosecond here.

if microsecond is not None and nanosecond is not None:
    raise ValueError("Only one of microsecond and nanosecond may be specified.")

I guess you did that for datetime but not for time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as I don't like gratuitous introduction of floating point numbers, I think in this case it is best to allow fractional number of microseconds in the datetime constructor. (Accepting numbers.Rational). We can still add nanoseconds keyword to timedelta where all keywords already support passing floats.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did discuss this at the CPython core dev sprint, in the #stdlib room. A relevant excerpt from the conversation (@gpshead and @gvanrossum, please let me know if you'd like me to remove the transcript and replace it with a summary):

[2020-10-23 15:40:18] pganssle: Yeah. The API is always going to be a little awkard, particularly because of the positional arguments (which we can't change), but I think "can't specify both at once" is probably the safest.
[2020-10-23 15:41:36] pganssle: It might also make sense to accept a float for microsecond as an alternate way to specify nanoseconds, but that could always be added later if we decide it's desirable.
[2020-10-23 21:51:19] gpshead (Greg): accepting a float for microsecond on input is an option. a double has enough digits to be precise to nanoseconds within a 0.0 - 999_999_999.999 range. i'd just consider it a convenience though rather than a good primary api.
[2020-10-23 21:54:41] gvanrossum: Hm. To me, "float" time implies "seconds". Would hate to lose that for such a minor convenience.
[2020-10-23 21:55:49] gvanrossum: (It's not an important principle, but a handy debugging heuristic -- "hey, these times are floats; they probably meant seconds")
[2020-10-23 21:57:25] gpshead (Greg): yeah, it'd be awkward to specify a fractional component of something as a range restricted float.
[2020-10-23 22:02:05] pganssle: I just meant that you could pass microsecond=100.145 to be the same as microsecond=None, nanosecond=100_145.
[2020-10-23 22:02:29] pganssle: The return value of .microsecond would still be 100 and .nanosecond would be 100145.
[2020-10-23 22:03:52] pganssle: The biggest problem I have with that as a convenience API would be that it's mostly unnecessary and also it's got a non-obvious precision truncation going on. datetime(2020, 1, 1, microsecond=100.145) would be the same as datetime(2020, 1, 1, microsecond=100.14577893940), even if those are two distinct floats.
[2020-10-23 22:18:17] gvanrossum: Still -1.0 on that.
[2020-10-23 22:21:04] gpshead (Greg): yeah agreed now that i see it. (-1)

It seems that the rough consensus on this is that there's nowhere to go from here that isn't awkward or problematic in some way, but people prefer microsecond and nanosecond to be mutually exclusive ints.

Copy link
Member

@abalkin abalkin Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GvR is right (no surprise!) - we should allow float (rational) seconds instead of microseconds and disallow microseconds when second is passed as noninteger. time(12, 0, 59.123456789) looks much better than time(12, 0, 59, 123456.789).

I would even argue that positional microsecond argument can be deprecated or at least discouraged in the future in favor of rational seconds).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more annoying than it's worth to deprecate microsecond here (and in the C API!), because there's probably a lot of code out there specifying microseconds that would need to be changed for something so trivial. Otherwise I agree with the path forward: accept floats for second.

The remaining questions:

  1. Do we still add a nanosecond kwarg? Clearly it would be similarly discouraged but people don't usually pay attention to us, but they will notice that there's a microsecond but no nanosecond, and they won't understand why. I definitely foresee adding nanosecond support without a nanosecond kwarg to be greeted with complaints and confusion.
  2. Do we still allow floats/rationals on microsecond, or are we going to say "fractional seconds or specify microsecond or specify nanosecond, but only one of the three"? I'm OK with either but leaning towards the second.
  3. If we do this, will it / should it affect the exostomg C API? I'm thinking no, we should simply add an additional / additional constructors (API to be specified later, but probably along the same lines — one that takes a double for second and/or one that takes a Nanosecond positional argument).

@shramov
Copy link
Author

shramov commented Oct 22, 2020

I think the best ways to handle this (and neither is amazingly good) would be to either have nanosecond in the range of [0, 999] in all the public-facing semantics (internally we'd still want to use nanosecond as the canonical third struct element) or to change datetime to accept an explicit None for microsecond and ensure that exactly one of nanosecond and microsecond are specified.

Another way is to replace 'microsecond' positional parameter with 'subsecond' one and have optional resolution, defaulting to 10e6 (or 'us').
'microsecond', 'nanosecond' and other keyword parameters may be mutual exclusive aliases. Something like is used for pickling time in this patch.
Or use something like numbers.Rational derivative (which is just same).

@pganssle
Copy link
Member

pganssle commented Apr 6, 2021

@shramov @abalkin Do either of you have thoughts about what is still blocking / needs to be done on this? Beta freeze for Python 3.10 is coming up at the beginning of May and I think we may have enough time to get this in before then. Probably would have been better to get it into an alpha release, but if we miss beta freeze it'll get pushed to 3.11, and I do think that nanosecond support is a desirable feature for a lot of people.

It might be good for us to get an explicit "to-do" list of concerns to be addressed before this can be merged.

@robsdedude
Copy link

Seems like this PR has gone stale. I've been following it for a good while now. I'm very interested in seeing this feature in a future Python version and I'm happy to help out on this.

From reading the whole thread it looks like the next step is to decide what the python only API for this should look like. The most promising candidate (to me anyway) seems to be

still add a nanosecond kwarg

to avoid confused users, even if this results in a awkward looking parameter order and

fractional seconds or specify microsecond or specify nanosecond, but only one of the three

plus encourage the usage of fractional seconds over the other two.

Again, I'm happy to help with the implementation, I'm just not sure how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants