-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-15443: Nanoseconds support for datetime objects (work in progress) #21987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). Recognized GitHub usernameWe couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames: This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
77ae062
to
6f73c7c
Compare
I've added backward compatible str that works as it was with objects where nanoseconds % 1000 == 0. |
I've also created standalone package with patched module |
@shramov Thanks for this. I haven't had time to review this in depth yet, but just letting you know it's on my list and from what I did look at it looks generally like it's a step in the right direction. If I'm able to participate in the core dev sprint this year, I'm hoping to use the time to work on this problem. The sprint is scheduled for October 19-23. |
On str: I think it will be sufficient for backward compatibility if timestamps representing whole number of microseconds (1000*n nanoseconds) print as they do now. For repr, it is important to preserve datetime(2020, 9, 28, 11, 22, 33, microsecond=123456) for whole microseconds and datetime(2020, 9, 28, 11, 22, 33, nanosecond=123456789) for fractional nanosecond values. |
I've implemented str and repr this way already. >>> from datetime_ns import datetime
>>> datetime(2020, 1, 2, nanosecond=123456)
datetime_ns.datetime.datetime(2020, 1, 2, 0, 0, 0, nanosecond=123456)
>>> datetime(2020, 1, 2, nanosecond=123456000)
datetime_ns.datetime.datetime(2020, 1, 2, 0, 0, 0, 123456) However this pull request is not yet updated to latest datetime_ns code |
It occurs to me that the pure Python version of this may be easier than the C implementation. The trickiest part of this is that because of the very leaky nature of the C API, there's not really a great way to get the individual components of a In order to add nanoseconds to the For anyone not using the C macros, this will be a breaking change, though I'm not sure how significant the impact will be. I am also not sure whether this will change the size of the |
I would not worry about the C API at this point. There are plenty of issues to be resolved at the pure python level. As for developers not using macros in C code - I have little sympathy for them. Python has a long tradition of exposing internals and relying on responsible users instead of making internals inaccessible in Java/C++-style. |
I was mainly bringing it up because I suspect some of the biggest design trade-offs will need to be made in the C implementation, including the possibility that there might be significant breakage downstream, or the possibility that it might significantly change the memory usage of existing Python programs (I don't think either of these are likely, of course). The pure Python stuff is a lot more forgiving in its ability to make code backwards compatible, and in performance (since for performance everyone uses the C version). I'm fine with continuing to hammer out the issues in the pure Python version, though.
The biggest problem is that macros are only available in C code, so for example Rust's bindings need to re-implement them based on the data structure. I don't know how many other languages bind to Python, but given that we expose a bunch of stuff in the capsule as part of the public API, but not enough to necessarily be useful, it's hard to blame people for crossing the fuzzy line between public and private in the C API. Even I had to use non-macro struct access in the |
By the way, why unsigned char data[X] is used instead of packed structure in C API? upd. However this would not help Rust as it don't use header files (at least directly) |
Changed microseconds to nanoseconds in timedelta, time and datetime objects. All __repr__ and __str__ methods are unchanged to simplify testing. Pickling is (hopefully) backward compatible.
Structures provides same layout but allows compiler to check for invalid access. Downside is in structure all fields are in native byte order and needs to be converted to big-endian in pickle/unpickle.
I've updated pure code to latest datetime_ns version with correct repr, str and updated tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, I realize there's a bit of a problem with how the interface looks to an end user.
If I want to call datetime
passing tzinfo
by position and include nanoseconds, I'd need to do something like this:
dt = datetime(2020, 12, 31, 12, 59, 59, 0, UTC, nanosecond=999_999_999)
That's just weird, and it's unclear to me (without having looked at the constructor code), what happens if microseconds
is a value other than 0
.
I could imagine a lot of people thinking that the allowed range for nanoseconds
is [0, 1000]
, and constructing their datetime
s like this:
dt = datetime(2020, 12, 31, 59, 59, 999_999, UTC, nanosecond=999)
I think the best ways to handle this (and neither is amazingly good) would be to either have nanosecond
in the range of [0, 999]
in all the public-facing semantics (internally we'd still want to use nanosecond
as the canonical third struct element) or to change datetime
to accept an explicit None
for microsecond
and ensure that exactly one of nanosecond
and microsecond
are specified.
@@ -1502,32 +1554,36 @@ def dst(self): | |||
return offset | |||
|
|||
def replace(self, hour=None, minute=None, second=None, microsecond=None, | |||
tzinfo=True, *, fold=None): | |||
tzinfo=True, *, nanosecond=None, fold=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. It's really annoying that we have to do it this way for backwards compatibility reasons.
I also wonder if people will think that this should work like:
>>> dt = dt.replace(microsecond=1, nanosecond=5)
>>> dt.nanosecond
1005
I guess there's nothing to be done about it, though. ☹
"""Return a new time with new values for the specified fields.""" | ||
if hour is None: | ||
hour = self.hour | ||
if minute is None: | ||
minute = self.minute | ||
if second is None: | ||
second = self.second | ||
if microsecond is None: | ||
microsecond = self.microsecond | ||
if nanosecond is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we need to address the case where the user specifies both microsecond
and nanosecond
here.
if microsecond is not None and nanosecond is not None:
raise ValueError("Only one of microsecond and nanosecond may be specified.")
I guess you did that for datetime
but not for time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As much as I don't like gratuitous introduction of floating point numbers, I think in this case it is best to allow fractional number of microseconds in the datetime constructor. (Accepting numbers.Rational). We can still add nanoseconds keyword to timedelta
where all keywords already support passing floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did discuss this at the CPython core dev sprint, in the #stdlib
room. A relevant excerpt from the conversation (@gpshead and @gvanrossum, please let me know if you'd like me to remove the transcript and replace it with a summary):
[2020-10-23 15:40:18] pganssle: Yeah. The API is always going to be a little awkard, particularly because of the positional arguments (which we can't change), but I think "can't specify both at once" is probably the safest.
[2020-10-23 15:41:36] pganssle: It might also make sense to accept a float formicrosecond
as an alternate way to specify nanoseconds, but that could always be added later if we decide it's desirable.
[2020-10-23 21:51:19] gpshead (Greg): accepting a float for microsecond on input is an option. a double has enough digits to be precise to nanoseconds within a 0.0 - 999_999_999.999 range. i'd just consider it a convenience though rather than a good primary api.
[2020-10-23 21:54:41] gvanrossum: Hm. To me, "float" time implies "seconds". Would hate to lose that for such a minor convenience.
[2020-10-23 21:55:49] gvanrossum: (It's not an important principle, but a handy debugging heuristic -- "hey, these times are floats; they probably meant seconds")
[2020-10-23 21:57:25] gpshead (Greg): yeah, it'd be awkward to specify a fractional component of something as a range restricted float.
[2020-10-23 22:02:05] pganssle: I just meant that you could passmicrosecond=100.145
to be the same asmicrosecond=None, nanosecond=100_145
.
[2020-10-23 22:02:29] pganssle: The return value of.microsecond
would still be100
and.nanosecond
would be100145
.
[2020-10-23 22:03:52] pganssle: The biggest problem I have with that as a convenience API would be that it's mostly unnecessary and also it's got a non-obvious precision truncation going on.datetime(2020, 1, 1, microsecond=100.145)
would be the same asdatetime(2020, 1, 1, microsecond=100.14577893940)
, even if those are two distinct floats.
[2020-10-23 22:18:17] gvanrossum: Still -1.0 on that.
[2020-10-23 22:21:04] gpshead (Greg): yeah agreed now that i see it. (-1)
It seems that the rough consensus on this is that there's nowhere to go from here that isn't awkward or problematic in some way, but people prefer microsecond
and nanosecond
to be mutually exclusive ints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GvR is right (no surprise!) - we should allow float (rational) seconds instead of microseconds and disallow microseconds when second is passed as noninteger. time(12, 0, 59.123456789)
looks much better than time(12, 0, 59, 123456.789)
.
I would even argue that positional microsecond argument can be deprecated or at least discouraged in the future in favor of rational seconds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be more annoying than it's worth to deprecate microsecond
here (and in the C API!), because there's probably a lot of code out there specifying microseconds that would need to be changed for something so trivial. Otherwise I agree with the path forward: accept floats for second
.
The remaining questions:
- Do we still add a
nanosecond
kwarg? Clearly it would be similarly discouraged but people don't usually pay attention to us, but they will notice that there's amicrosecond
but nonanosecond
, and they won't understand why. I definitely foresee adding nanosecond support without ananosecond
kwarg to be greeted with complaints and confusion. - Do we still allow floats/rationals on
microsecond
, or are we going to say "fractional seconds or specifymicrosecond
or specifynanosecond
, but only one of the three"? I'm OK with either but leaning towards the second. - If we do this, will it / should it affect the exostomg C API? I'm thinking no, we should simply add an additional / additional constructors (API to be specified later, but probably along the same lines — one that takes a
double
forsecond
and/or one that takes aNanosecond
positional argument).
Another way is to replace 'microsecond' positional parameter with 'subsecond' one and have optional resolution, defaulting to 10e6 (or 'us'). |
@shramov @abalkin Do either of you have thoughts about what is still blocking / needs to be done on this? Beta freeze for Python 3.10 is coming up at the beginning of May and I think we may have enough time to get this in before then. Probably would have been better to get it into an alpha release, but if we miss beta freeze it'll get pushed to 3.11, and I do think that nanosecond support is a desirable feature for a lot of people. It might be good for us to get an explicit "to-do" list of concerns to be addressed before this can be merged. |
Seems like this PR has gone stale. I've been following it for a good while now. I'm very interested in seeing this feature in a future Python version and I'm happy to help out on this. From reading the whole thread it looks like the next step is to decide what the python only API for this should look like. The most promising candidate (to me anyway) seems to be
to avoid confused users, even if this results in a awkward looking parameter order and
plus encourage the usage of fractional seconds over the other two. Again, I'm happy to help with the implementation, I'm just not sure how to get started. |
This patch adds nanoseconds support to pure python datetime.py module as suggested in some old comment.
Also it's far from being complete since there are some open questions:
However I think it's worth a try to revive this issue.
https://bugs.python.org/issue15443