bpo-15443: Nanoseconds support for datetime objects (work in progress) #21987

shramov · 2020-08-28T10:09:50Z

This patch adds nanoseconds support to pure python datetime.py module as suggested in some old comment.

Also it's far from being complete since there are some open questions:

str and repr are not changed. Should str be backward compatible or not? Currently they are left as they were to simplify testing.
What is best way to add nanoseconds argument to time/timedelta/datetime constructors?

However I think it's worth a try to revive this issue.

https://bugs.python.org/issue15443

the-knights-who-say-ni · 2020-08-28T10:09:54Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@shramov

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

shramov · 2020-08-31T20:18:28Z

I've added backward compatible str that works as it was with objects where nanoseconds % 1000 == 0.
Otherwise it's formatted with full 9 digit precision.

shramov · 2020-09-08T06:35:21Z

I've also created standalone package with patched module
https://github.com/shramov/datetime_ns

pganssle · 2020-09-28T13:18:49Z

@shramov Thanks for this. I haven't had time to review this in depth yet, but just letting you know it's on my list and from what I did look at it looks generally like it's a step in the right direction.

If I'm able to participate in the core dev sprint this year, I'm hoping to use the time to work on this problem. The sprint is scheduled for October 19-23.

abalkin · 2020-09-28T15:47:32Z

On str: I think it will be sufficient for backward compatibility if timestamps representing whole number of microseconds (1000*n nanoseconds) print as they do now.

For repr, it is important to preserve eval(repr(t)) == t invariant. I wonder if we should change repr to something like

datetime(2020, 9, 28, 11, 22, 33, microsecond=123456)

for whole microseconds and

datetime(2020, 9, 28, 11, 22, 33, nanosecond=123456789)

for fractional nanosecond values.

shramov · 2020-09-29T15:08:23Z

I've implemented str and repr this way already.

>>> from datetime_ns import datetime
>>> datetime(2020, 1, 2, nanosecond=123456)
datetime_ns.datetime.datetime(2020, 1, 2, 0, 0, 0, nanosecond=123456)
>>> datetime(2020, 1, 2, nanosecond=123456000)
datetime_ns.datetime.datetime(2020, 1, 2, 0, 0, 0, 123456)

However this pull request is not yet updated to latest datetime_ns code

pganssle · 2020-09-29T16:23:49Z

It occurs to me that the pure Python version of this may be easier than the C implementation. The trickiest part of this is that because of the very leaky nature of the C API, there's not really a great way to get the individual components of a PyDateTime_DateTime. We expose macros, but those don't help for bindings like Rust's where you have to re-implement all the macros yourself.

In order to add nanoseconds to the PyDateTime_DateTime implementation, I think our most efficient option is to change usecond from 3 bytes to 4 bytes add a PyDateTime_DATE_GET_NANOSECONDS function, and change PyDateTime_DATE_GET_MICROSECONDS to basically PyDateTime_DATE_GET_NANOSECONDS(o) / 1000.

For anyone not using the C macros, this will be a breaking change, though I'm not sure how significant the impact will be. I am also not sure whether this will change the size of the datetime object in memory. I think that if we keep it to 1 byte, we may just be using an unused alignment byte anyway.

abalkin · 2020-09-29T17:31:59Z

I would not worry about the C API at this point. There are plenty of issues to be resolved at the pure python level. As for developers not using macros in C code - I have little sympathy for them. Python has a long tradition of exposing internals and relying on responsible users instead of making internals inaccessible in Java/C++-style.

pganssle · 2020-09-29T18:00:33Z

I would not worry about the C API at this point. There are plenty of issues to be resolved at the pure python level.

I was mainly bringing it up because I suspect some of the biggest design trade-offs will need to be made in the C implementation, including the possibility that there might be significant breakage downstream, or the possibility that it might significantly change the memory usage of existing Python programs (I don't think either of these are likely, of course).

The pure Python stuff is a lot more forgiving in its ability to make code backwards compatible, and in performance (since for performance everyone uses the C version). I'm fine with continuing to hammer out the issues in the pure Python version, though.

As for developers not using macros in C code - I have little sympathy for them. Python has a long tradition of exposing internals and relying on responsible users instead of making internals inaccessible in Java/C++-style.

The biggest problem is that macros are only available in C code, so for example Rust's bindings need to re-implement them based on the data structure. I don't know how many other languages bind to Python, but given that we expose a bunch of stuff in the capsule as part of the public API, but not enough to necessarily be useful, it's hard to blame people for crossing the fuzzy line between public and private in the C API.

Even I had to use non-macro struct access in the zoneinfo backport because (until 3.10) we didn't expose any way in the C API to get the tzinfo from a PyDateTime_DateTime or PyDateTime_Time. Regardless of whether we have sympathy for these people, if we make too drastic a breaking change, it could make adopting new versions of Python difficult for the ecosystem, so we have to figure out how to strike the right balance.

shramov · 2020-09-30T10:49:37Z

By the way, why unsigned char data[X] is used instead of packed structure in C API?
If it was struct it could be possible to replace usec field with nsec and compiler will generate errors in old code that access usec directly.

upd. However this would not help Rust as it don't use header files (at least directly)

Changed microseconds to nanoseconds in timedelta, time and datetime objects. All __repr__ and __str__ methods are unchanged to simplify testing. Pickling is (hopefully) backward compatible.

Structures provides same layout but allows compiler to check for invalid access. Downside is in structure all fields are in native byte order and needs to be converted to big-endian in pickle/unpickle.

shramov · 2020-10-01T05:25:19Z

I've updated pure code to latest datetime_ns version with correct repr, str and updated tests.
Also I've pushed patch to Include/datetime.h illustrating my idea about packed structures just as proof-of-concept.

pganssle

Looking at this, I realize there's a bit of a problem with how the interface looks to an end user.

If I want to call datetime passing tzinfo by position and include nanoseconds, I'd need to do something like this:

dt = datetime(2020, 12, 31, 12, 59, 59, 0, UTC, nanosecond=999_999_999)

That's just weird, and it's unclear to me (without having looked at the constructor code), what happens if microseconds is a value other than 0.

I could imagine a lot of people thinking that the allowed range for nanoseconds is [0, 1000], and constructing their datetimes like this:

dt = datetime(2020, 12, 31, 59, 59, 999_999, UTC, nanosecond=999)

I think the best ways to handle this (and neither is amazingly good) would be to either have nanosecond in the range of [0, 999] in all the public-facing semantics (internally we'd still want to use nanosecond as the canonical third struct element) or to change datetime to accept an explicit None for microsecond and ensure that exactly one of nanosecond and microsecond are specified.

pganssle · 2020-10-21T20:39:36Z

Lib/datetime.py

@@ -1502,32 +1554,36 @@ def dst(self):
        return offset

    def replace(self, hour=None, minute=None, second=None, microsecond=None,
-                tzinfo=True, *, fold=None):
+                tzinfo=True, *, nanosecond=None, fold=None):


Hm. It's really annoying that we have to do it this way for backwards compatibility reasons.

I also wonder if people will think that this should work like:

>>> dt = dt.replace(microsecond=1, nanosecond=5) >>> dt.nanosecond 1005

I guess there's nothing to be done about it, though. ☹

pganssle · 2020-10-21T20:41:19Z

Lib/datetime.py

        """Return a new time with new values for the specified fields."""
        if hour is None:
            hour = self.hour
        if minute is None:
            minute = self.minute
        if second is None:
            second = self.second
-        if microsecond is None:
-            microsecond = self.microsecond
+        if nanosecond is None:


I believe we need to address the case where the user specifies both microsecond and nanosecond here.

if microsecond is not None and nanosecond is not None: raise ValueError("Only one of microsecond and nanosecond may be specified.")

I guess you did that for datetime but not for time?

As much as I don't like gratuitous introduction of floating point numbers, I think in this case it is best to allow fractional number of microseconds in the datetime constructor. (Accepting numbers.Rational). We can still add nanoseconds keyword to timedelta where all keywords already support passing floats.

We did discuss this at the CPython core dev sprint, in the #stdlib room. A relevant excerpt from the conversation (@gpshead and @gvanrossum, please let me know if you'd like me to remove the transcript and replace it with a summary):

[2020-10-23 15:40:18] pganssle: Yeah. The API is always going to be a little awkard, particularly because of the positional arguments (which we can't change), but I think "can't specify both at once" is probably the safest.
[2020-10-23 15:41:36] pganssle: It might also make sense to accept a float for microsecond as an alternate way to specify nanoseconds, but that could always be added later if we decide it's desirable.
[2020-10-23 21:51:19] gpshead (Greg): accepting a float for microsecond on input is an option. a double has enough digits to be precise to nanoseconds within a 0.0 - 999_999_999.999 range. i'd just consider it a convenience though rather than a good primary api.
[2020-10-23 21:54:41] gvanrossum: Hm. To me, "float" time implies "seconds". Would hate to lose that for such a minor convenience.
[2020-10-23 21:55:49] gvanrossum: (It's not an important principle, but a handy debugging heuristic -- "hey, these times are floats; they probably meant seconds")
[2020-10-23 21:57:25] gpshead (Greg): yeah, it'd be awkward to specify a fractional component of something as a range restricted float.
[2020-10-23 22:02:05] pganssle: I just meant that you could pass microsecond=100.145 to be the same as microsecond=None, nanosecond=100_145.
[2020-10-23 22:02:29] pganssle: The return value of .microsecond would still be 100 and .nanosecond would be 100145.
[2020-10-23 22:03:52] pganssle: The biggest problem I have with that as a convenience API would be that it's mostly unnecessary and also it's got a non-obvious precision truncation going on. datetime(2020, 1, 1, microsecond=100.145) would be the same as datetime(2020, 1, 1, microsecond=100.14577893940), even if those are two distinct floats.
[2020-10-23 22:18:17] gvanrossum: Still -1.0 on that.
[2020-10-23 22:21:04] gpshead (Greg): yeah agreed now that i see it. (-1)

It seems that the rough consensus on this is that there's nowhere to go from here that isn't awkward or problematic in some way, but people prefer microsecond and nanosecond to be mutually exclusive ints.

GvR is right (no surprise!) - we should allow float (rational) seconds instead of microseconds and disallow microseconds when second is passed as noninteger. time(12, 0, 59.123456789) looks much better than time(12, 0, 59, 123456.789).

I would even argue that positional microsecond argument can be deprecated or at least discouraged in the future in favor of rational seconds).

I think it might be more annoying than it's worth to deprecate microsecond here (and in the C API!), because there's probably a lot of code out there specifying microseconds that would need to be changed for something so trivial. Otherwise I agree with the path forward: accept floats for second.

The remaining questions:

Do we still add a nanosecond kwarg? Clearly it would be similarly discouraged but people don't usually pay attention to us, but they will notice that there's a microsecond but no nanosecond, and they won't understand why. I definitely foresee adding nanosecond support without a nanosecond kwarg to be greeted with complaints and confusion.

Do we still allow floats/rationals on microsecond, or are we going to say "fractional seconds or specify microsecond or specify nanosecond, but only one of the three"? I'm OK with either but leaning towards the second.

If we do this, will it / should it affect the exostomg C API? I'm thinking no, we should simply add an additional / additional constructors (API to be specified later, but probably along the same lines — one that takes a double for second and/or one that takes a Nanosecond positional argument).

shramov · 2020-10-22T12:45:25Z

I think the best ways to handle this (and neither is amazingly good) would be to either have nanosecond in the range of [0, 999] in all the public-facing semantics (internally we'd still want to use nanosecond as the canonical third struct element) or to change datetime to accept an explicit None for microsecond and ensure that exactly one of nanosecond and microsecond are specified.

Another way is to replace 'microsecond' positional parameter with 'subsecond' one and have optional resolution, defaulting to 10e6 (or 'us').
'microsecond', 'nanosecond' and other keyword parameters may be mutual exclusive aliases. Something like is used for pickling time in this patch.
Or use something like numbers.Rational derivative (which is just same).

pganssle · 2021-04-06T14:31:21Z

@shramov @abalkin Do either of you have thoughts about what is still blocking / needs to be done on this? Beta freeze for Python 3.10 is coming up at the beginning of May and I think we may have enough time to get this in before then. Probably would have been better to get it into an alpha release, but if we miss beta freeze it'll get pushed to 3.11, and I do think that nanosecond support is a desirable feature for a lot of people.

It might be good for us to get an explicit "to-do" list of concerns to be addressed before this can be merged.

robsdedude · 2022-11-02T15:10:32Z

Seems like this PR has gone stale. I've been following it for a good while now. I'm very interested in seeing this feature in a future Python version and I'm happy to help out on this.

From reading the whole thread it looks like the next step is to decide what the python only API for this should look like. The most promising candidate (to me anyway) seems to be

still add a nanosecond kwarg

to avoid confused users, even if this results in a awkward looking parameter order and

fractional seconds or specify microsecond or specify nanosecond, but only one of the three

plus encourage the usage of fractional seconds over the other two.

Again, I'm happy to help with the implementation, I'm just not sure how to get started.

shramov requested review from abalkin and pganssle as code owners August 28, 2020 10:09

the-knights-who-say-ni added the CLA not signed label Aug 28, 2020

bedevere-bot added the awaiting review label Aug 28, 2020

shramov marked this pull request as draft August 28, 2020 10:15

shramov force-pushed the wip-datetime branch 5 times, most recently from 77ae062 to 6f73c7c Compare August 28, 2020 20:04

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Aug 28, 2020

shramov marked this pull request as ready for review August 29, 2020 04:12

shramov added 2 commits October 1, 2020 08:22

wip: bpo-15443 Nanoseconds support for datetime objects

7f9dbe5

Changed microseconds to nanoseconds in timedelta, time and datetime objects. All __repr__ and __str__ methods are unchanged to simplify testing. Pickling is (hopefully) backward compatible.

wip: datetime: Replace char arrays with packed structures

dbbaf7a

Structures provides same layout but allows compiler to check for invalid access. Downside is in structure all fields are in native byte order and needs to be converted to big-endian in pickle/unpickle.

shramov force-pushed the wip-datetime branch from 82760f7 to dbbaf7a Compare October 1, 2020 05:22

pganssle reviewed Oct 22, 2020

View reviewed changes

goshawk mannequin mentioned this pull request Apr 10, 2022

datetime module has no support for nanoseconds #59648

Open

ezio-melotti removed the CLA signed label Jul 13, 2022

FichteFoll mentioned this pull request Jan 29, 2025

gh-59648: Nanosecond support for datetime #92078

Open

Uh oh!

bpo-15443: Nanoseconds support for datetime objects (work in progress) #21987

Are you sure you want to change the base?

bpo-15443: Nanoseconds support for datetime objects (work in progress) #21987

Uh oh!

Conversation

shramov commented Aug 28, 2020 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

the-knights-who-say-ni commented Aug 28, 2020

Recognized GitHub username

Uh oh!

shramov commented Aug 31, 2020

Uh oh!

shramov commented Sep 8, 2020

Uh oh!

pganssle commented Sep 28, 2020

Uh oh!

abalkin commented Sep 28, 2020

Uh oh!

shramov commented Sep 29, 2020

Uh oh!

pganssle commented Sep 29, 2020

Uh oh!

abalkin commented Sep 29, 2020

Uh oh!

pganssle commented Sep 29, 2020

Uh oh!

shramov commented Sep 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shramov commented Oct 1, 2020

Uh oh!

pganssle left a comment

Choose a reason for hiding this comment

Uh oh!

pganssle Oct 21, 2020

Choose a reason for hiding this comment

Uh oh!

pganssle Oct 21, 2020

Choose a reason for hiding this comment

Uh oh!

abalkin Apr 6, 2021

Choose a reason for hiding this comment

Uh oh!

pganssle Apr 6, 2021

Choose a reason for hiding this comment

Uh oh!

abalkin Apr 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pganssle Apr 7, 2021

Choose a reason for hiding this comment

Uh oh!

shramov commented Oct 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pganssle commented Apr 6, 2021

Uh oh!

robsdedude commented Nov 2, 2022

Uh oh!

Uh oh!

shramov commented Aug 28, 2020 •

edited by bedevere-bot

Loading

shramov commented Sep 30, 2020 •

edited

Loading

abalkin Apr 7, 2021 •

edited

Loading

shramov commented Oct 22, 2020 •

edited

Loading