-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Serialize datetime #5274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialize datetime #5274
Conversation
| if o.tzinfo is None or o.tzinfo.utcoffset(o) is None: | ||
| # Otherwise, the deserialized object may change depending on local timezone. | ||
| raise TypeError( | ||
| "Can only serialize 'aware' datetime objects with `tzinfo`. " | ||
| "Consider using e.g. `datetime.datetime.now(tz=datetime.timezone.utc)`" | ||
| ) | ||
|
|
||
| if o.tzinfo != datetime.timezone.utc: | ||
| # Saves us the trouble of having to serialize the timezone. | ||
| raise TypeError("Can only serialize UTC timestamps.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be necessary, because the timestamp() method takes into account the timezone (or the locale timezone for naive datatimes) to convert to a unix epoch timestamp (which itself has no timezone).
In [12]: now, now_utc = datetime.datetime.now(), datetime.datetime.now(tz=datetime.timezone.utc)
In [13]: now
Out[13]: datetime.datetime(2022, 4, 20, 11, 52, 22, 976460)
In [14]: now_utc
Out[14]: datetime.datetime(2022, 4, 20, 18, 52, 22, 976471, tzinfo=datetime.timezone.utc)
In [15]: now.timestamp()
Out[15]: 1650480742.97646
In [16]: now_utc.timestamp()
Out[16]: 1650480742.976471There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if you assert now == now_utc it will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if you had
ts = now().in_my_local_timezone()
assert ts == read_json(to_json(ts))the assertion would fail, even though you're still referring to the correct point in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While round trip equality is nice, I think datetime is generally broken with how it handles naive vs tz-aware instances and so we shouldn't expect roundtrip equality (even two tz-aware datetime instances with the same timestamp in different timezones do not compare equal). So I suggest we support serializing any datetime by getting its timestamp, and then we just have to pick whether to deserialize as naive or tz-aware datetimes. I think tz-aware with utc is ok, though I would probably prefer naive for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that the idea of specifying the tz in deserialization was to force comparison by timestamp(). In a quick test, it seemed like I could reasonable compare two tz-aware datetimes. Riffing on mpharrigan@'s example above:
>>> ts = datetime.datetime.now(tz=timezone('US/Pacific'))
>>> ts == datetime.datetime.fromtimestamp(ts.timestamp(), tz=datetime.timezone.utc)
True
Also...
>>> ts = datetime.datetime.now()
>>> ts.timestamp() == datetime.datetime.fromtimestamp(ts.timestamp(), tz=datetime.timezone.utc).timestamp()
True
That all seems reasonable to me, so I'd +1 to allowing any input datetime here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah. Thanks for investigating @wcourtney . It looks like equality does work for "aware" datetimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( I removed the utc check during serialization)
| # unix timestamps should mean that the deserialized datetime should refer to the | ||
| # same point in time but may not satisfy o = read_json(to_json(o)) because the actual | ||
| # timezones, and hour fields will not be identical. | ||
| return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm conflicted about whether it's better to use naive datetime or explicit utc timezone here. I generally find that using naive datetimes is simpler, as long as you use timestamp() for any kind of comparison, but I could be convinced otherwise.
In [17]: ts = 1650480742
In [18]: dt = datetime.datetime.fromtimestamp(ts)
In [19]: dt_utc = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
In [20]: dt
Out[20]: datetime.datetime(2022, 4, 20, 11, 52, 22)
In [21]: dt_utc
Out[21]: datetime.datetime(2022, 4, 20, 18, 52, 22, tzinfo=datetime.timezone.utc)
In [22]: dt.timestamp()
Out[22]: 1650480742.0
In [23]: dt_utc.timestamp()
Out[23]: 1650480742.0
One big problem with the datetime library IMO is that it has a function datetime.datetime.utcnow() which ideally would be the same as datetime.datetime.now(tz=datetime.timezone.utc) but which in fact returns a "naive" datetime with no tz but whose calendar fields (year, month, day, hour, min, second) agree with the current utc time. I find this incredibly confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the docs for utcnow has a red box telling you not to use it.
naive datetimes is simpler, as long as you use timestamp() for any kind of comparison
I figured it might be worth forcing folks to use the explicit thing so they can't accidentally not use timestamp(). I mean: another alternative would be to use timestamp floats instead of datetime.datetime everywhere.
the other problem is that it's really hard to use datetime.__eq__ in a meaningful way with naive datetimes. I guess that falls under "comparison".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More explicit than timestamp floats would be timestamp protos, maybe made easier by proto-plus utilities? I personally prefer that, but my hunch is that it's better for the library to support datetime natively.
For usability of datetime, +1 to forcing users to use .timestamp() by setting the explicit tz.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naive datetimes compare fine to other naive datetimes, and I think I'd lean toward using them when deserializing for simplicity. But this is definitely an area where footguns abound. We have switched over to using a Timestamp internally that is just a wrapper around unix epoch timestamps to avoid all these issues with datetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More explicit than timestamp floats would be timestamp protos, maybe made easier by
proto-plusutilities? I personally prefer that, but my hunch is that it's better for the library to supportdatetimenatively.For usability of
datetime, +1 to forcing users to use.timestamp()by setting the explicit tz.
You can call .timestamp() on any datetime instance, whether it has a tz or not, so I don't think that setting it forces users to do the right thing.
I think timestamp protos are going to feel out of place in most python code and we don't currently pull in protobuf as a dependency of cirq-core, so probably shouldn't rely on them unless we want to support time serialization only in cirq-google.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the basis that:
- the proto fields use an analog of utc "aware" datetimes
- things seem generally less broken if you live entirely in "aware" datetime land
I'd like to go ahead with forcing 'aware' during serialization and always deserializing as "aware" utc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proto uses a timestamp (which can be gotten from naive or aware datetime), so I don't think that carries much weight here. But I'm fine with deserializing as aware and and accepting only aware datetimes (could decide to accept naive datetimes in the future if we want). Thanks for working through all the comments here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working through this! I'm a fan of being more explicit, especially WRT time since it's often hard enough to reason about even w/o Python's fun tz handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment about variable names in tests, otherwise LGTM.
|
|
||
|
|
||
| def test_basic_time_assertions(): | ||
| naive_ts = datetime.datetime.now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use the suffix _dt instead for datetime instances in these tests? _ts sounds like "timestamp", and I think we should be careful about the distinction between a datetime and a timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
Automerge cancelled: A required status check is not present. Missing statuses: ['Coverage check', 'Pytest MacOS (3.7)', 'Pytest MacOS (3.8)', 'Pytest MacOS (3.9)', 'Pytest Ubuntu (3.7)', 'Pytest Ubuntu (3.8)', 'Pytest Ubuntu (3.9)', 'Pytest Windows (3.7)', 'Pytest Windows (3.8)', 'Pytest Windows (3.9)', 'Typescript lint check', 'Typescript tests', 'Typescript tests coverage'] |
spun off from quantumlib#5152 It's super important when running experiments to keep track of when things happen. cc @maffoo
spun off from quantumlib#5152 It's super important when running experiments to keep track of when things happen. cc @maffoo
spun off from quantumlib#5152 It's super important when running experiments to keep track of when things happen. cc @maffoo
spun off from #5152
It's super important when running experiments to keep track of when things happen.
cc @maffoo