-
Notifications
You must be signed in to change notification settings - Fork 533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exploration with the timezone trait #830
Conversation
(To be explicit: in order to get through my GitHub notifications, I'll ignore draft PRs unless you specifically ping me on something you'd like feedback on.) |
src/timezone_alternative_b.rs
Outdated
// this is nice because it avoids the type paramter. | ||
// alternatively the `TimeZoneManager` could have an assocaited `TimeZone` ZST or equivalent that this is parametized by | ||
#[derive(Clone, Copy)] | ||
pub struct DateTime { | ||
datetime: NaiveDateTime, | ||
offset: FixedOffset, | ||
// could potentially include some information on the timezone name here to allow `%Z` style formatting | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume we have a DateTime
instance with this implementation, and want to add 24hs to it. Is it possible to keep DST in mind without a reference to the original TimeZone
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one option would be having a function on the TimeZoneManager
that 'normalises' or 'localises' the offset, but the main way the API is intended to be used is something like, my_timezone_manager.add(my_datetime, Duration::from_secs(24 * 60 * 60))
Potentially another option that could be more ergonomic is:
my_datetime.add(&my_timezone_manager, Duration::from_secs(24 * 60 * 60))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm convinced by either approach; the Datetime is essentially losing its TimeZone implementation and instance, and application code needs to keep both around for any kind of computation.
I think alternative a is a lot cleaner in that sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair call. We could do the following in the TimeZoneManager
style:
have the offset field be a type parameter, with
pub struct DateTime<Offset=FixedOffset> {
datetime: NaiveDateTime,
offset: Offset,
}
then when using a given TimeZoneManager
it would operate on DateTime
s with the relevant Offset
type. This ties them together and either discourages or prevents adding time without the associated timezone information to update the offset. (Even in the prevent option, users could simply extract the NaiveDateTime
or change the Offset
if desired)
The extra benefit of making Offset
a type parameter is that in the case of a named timezone, it can store the name internally, similar to chrono_tz
's offset type.
Good call @djc - I'll make sure to tag you in future on potential PRs like this. This is very much at the investigation stage, and I think it will still take a while from here to get to a really great solution like the use of Having had a few more days to think about it, I'm leaning toward the option of pulling the timezone totally out of the
|
If the goal is to get allow the user to make caching decisions we could still have the timezone be part of the Right now it's not very clear to me what problem this PR is trying to solve. |
4497b6b
to
60cbdcb
Compare
Fair call - this is very exploratory at the moment, and the bar is quite high for any change to be made at all to My main goal here is to explore some options for the following:
One option here (my preferred) is that we don't provide a caching strategy at all - for most use cases this won't matter anyway, and for use cases where it does, it is likely that we can't predict the best caching strategy, so potentially it's better we give those users an API that enables them to do their own caching strategy |
I'm sceptical of the assertion that "for most use cases this won't matter anyway", and I doubt that there is a way to verify that, although measuring the performance impact of the different pieces would be a decent start. I think it could make sense to make things more composable but I think we should make sure that (a) the default ergonomics don't regress too much and (b) the default performance doesn't regress too much. Alternatively, the yard stick by which we're measured here is libc's |
My biggest gripe with alternative B, is that timezone-aware datetimes would no longer have a timezone, only the offset. So most APIs that deal with them need to pass around a tuple of I think the other approach also sounds more intuitively correct. |
#[derive(Clone, PartialEq, Eq)] | ||
pub struct Transition { | ||
// a given transition time, similar format to tzinfo, | ||
// including the Utc timestamp of the transition, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value is stored in local time (not UTC time) for VTIMEZONE components in the icalendar spec. For a Time Zone Component...
[...] the mandatory "DTSTART" property gives the effective onset date
and local time for the time zone sub-component definition.
"DTSTART" in this usage MUST be specified as a date with a local
time value.
(context: https://datatracker.ietf.org/doc/html/rfc5545#section-3.6.5 )
Does the TZDB use a local time or UTC? I'm mostly trying to figure out which of both approaches (local vs UTC) would make operating with this easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linked tables look similar to the ones available here. AFAIK the tzinfo files are generated from these text files, and then when stored in tzinfo, UTC timestamps are used.
However, an implementer of TimeZoneManager
could store two versions, one set in UTC timestamps and one in local timestamps, which could improve the lookup performance for the *_local
methods
I've no strong opinion either way between A and D. |
Thanks @djc and @WhyNotHugo for the extra commentary, its also made me realize that I probably haven't been precise enough when talking about caching.
What I meant here is that the That being said, the design using In option
This is a good call, hopefully this can be solved by caching the
This makes sense - then once I added an |
I've included some examples below of using the // this is fallible as it may encounter an error when asking the filesystem,
// or when parsing the tzinfo file or TZ variable
// we store this in an Arc so it can be shared across threads, all methods require only `&self`
let my_zone = Arc::new(TimeZoneManager::local()?); // reads from /etc/localtime or the TZ env variable
let now = my_zone.now();
let later = my_zone.add(now, Duration::from_secs(600));
let really_later = my_zone.add_days(now, Days::new(45)); but also let my_custom_zone = TimeZoneManager::from_iana("Australia/Brisbane")?;
let now = my_zone.now();
let later = my_zone.add(now, Duration::from_secs(600));
let really_later = my_zone.add_days(now, Days::new(45)); |
@esheppa I have made a couple of attempts to understand your PR, but am still not there yet. One of your goals is to explore whether the What you arrived at is:
Just writing something to get the conversation started. |
A second issue is that currently an object implementing impl<Tz: TimeZone> DateTime<Tz> {
fn checked_add_signed();
fn checked_add_months();
fn checked_add_days();
} Suppose someone had thousands of In such a loop or function it would be nice if you could hold on to the That is one of the things you explore with Adding methods to the |
Is controlling caching and handling OS errors something that can't be done today? (General thought, not your work in this PR.) In that case standalone functions can also access that static or thread-local, be used for initialization with error handling, or to set a caching stategy? |
Chrono seems like a fundamental enough crate to require the semver trick to avoid ecosystem-wide breakage with the release of 0.5: there should be a final release of the 0.4.x series re-implementing its types by depending on the types of 0.5.x. And traits should be implemented for the updated traits in 0.5.x. So, for every functionality the current |
To ensure the upgrade can be made, maybe it is best to do experiments to make an object-safe trait with a Implement |
Looking at another part of your proposals: A related query would require changes to the It is currently completely unavailable. I can imagine use cases and optimizations where this information would help. |
@pitdicker I think an issue with your approach is that the |
@WhyNotHugo No worries, I am just trying to understand the various aspects of @esheppa's work here, and don't have any proposal (yet). |
Thanks @pitdicker for all the feedback. I'm keen to continue this conversation, but I've tried to address many of the current questions below -
This is an issue with the current API - due to some methods being reasonably infallible (eg the offset and
Ideally we would embed as little as possible within a
Yes the
I'm uncomfortable with our current strategy of caching with the
This is worth considering, and could well be something that we use, but I'm keen to seek out the best possible design first, then assess what changes might be made to that to minimize ecosystem breakage
Yes, this is implemented (eg There are essentially three main things I'm trying to get at with this redesign:
|
This is unrelated and half-forgotten memories, but maybe useful. When a number of years ago I helped with the rand crate we had all kinds of thoughts around seeding and handling OS errors, making/keeping the
And there was the even more convenient At some point we wanted to steer users to have error handling for obscure OS scenarios, decide whether they want to reseed it (comparable to caching a Thet learned me But a barely interesting cleanup, abstracting over the various OS API's to get a random number, turned out to be the one of the more useful part for users (getrandom). No PRNG at all, no thread-local and stuff, no great performance... Just get a couple of random bytes from the OS.
This is a long-winded way of saying: having a way to handle errors, handling caching, object-safety, and performance are worthy goals. But wherever this goes, the majority of users don't want to think about where the timezone data comes from at all. Using a crate besides chrono is already a huge ask. And maybe what many users want for timezones could just be: whether it is Unix, Windows, iOS, WASM, whether we can use system-provided files or ship them in the binary... Just get the timezone data, and preferably let it be actual. |
I am trying to make a list of situations where caching matters, to make the discussion more concrete. But I am new to most of this so please correct me where wrong. Change of timezoneWith Thinking as a user who is unaware how computers work, I would expect that at the moment the clock changes, all running applications follow suit. What should happen to a This would mean that:
Change of timezone dataThis becomes relevant when some country decides, maybe even just a few days before the event, to change something about its timezone. Maybe change the day of a DST transition or something. Next the IANA database is updated. Microsoft releases a Windows update with the timezone data. Linux package managers push an update, And chrono just may be part of a long running application, and should refresh its timezone information with the new OS-provided sources (if any). Now the trouble starts. Suppose the Netherlands stop with DST in 2024 (to keep it close to my home). But I already created a future But at what point would it be expected for a library user the offset is updated? I don't think it is realistic to expect a library user to build something that gets notified when the timezone data is updated, that then goes through all I would say that any application dealing with future dates in a local timezone should be aware of the uncertainty, and should call Keeping timezone data actualChrono does the interaction with the OS to get the information for The main concern around caching in this PR is the caching of tzinfo files. |
What I definitely like about your explorations are, with some modifications:
|
Just leaned that an extension trait is not necessary, it is enough to add pub trait TimeZone {
// Essential to construct a `DateTime`
fn offset_at_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<FixedOffset>;
fn offset_at_utc_datetime(&self, utc: &NaiveDateTime) -> FixedOffset;
// Get info on transitions.
fn closest_transitions(&self, local: &NaiveDateTime)
-> (Option<NaiveDateTime>, Option<NaiveDateTime>);
// Support %Z formatting.
fn abbrevation(&self) -> Option<&str> {
None
}
// Constructors
fn from_utc_datetime(&self, utc: &NaiveDateTime) -> DateTime<Self> where Self: Sized {
/* ... */
}
/* all other methods */
} That would lower the friction of updating to chrono 0.5. I am sold 😄 |
Today I finally 'got' what is probably a core idea of your proposals: adding an optional You are then forced the create an instance of that As I am starting to understand it, this is planning for functionality very few users are going to use, if any. Would that small number of users not be able to avoid the relevant methods on
This is the issue that a number of methods that currently return a Sorry to sound negative. |
|
@esheppa You seem to have already solved the size issue. Looks like I am slowly retracing the steps to your design 😆: It is possible to include the offset in the type implementing Again with a bit of my own twist: pub struct DateTime<Tz: TimeZone> {
datetime: NaiveDateTime,
zone: Tz,
}
pub trait TimeZone {
// Primary methods, but not intended for library users
fn update_offset_at_utc(&mut self, utc: &NaiveDateTime);
fn update_offset_at_local(&mut self, local: &NaiveDateTime, alternative: &mut Self) -> LocalResult<NaiveDateTime>;
// Needed to construct a `DateTime`
fn from_utc_datetime(&self, utc: &NaiveDateTime) -> DateTime<Self> where Clone + Self: Sized {
let mut zone = self.clone();
zone.update_offset_at_utc(utc);
DateTime { datetime: utc, zone }
}
fn from_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<DateTime<Self>> where Self: Clone + Sized
let mut zone = self.clone();
let mut zone2 = zone.clone();
match zone.update_offset_at_local(local, &mut zone2) {
LocalResult::Single(_) => LocalResult::Single(DateTime { datetime: utc, zone }),
LocalResult::Ambiguous(dt1, dt2) => {
LocalResult::Ambiguous(DateTime { datetime: dt1, zone }, DateTime { datetime: dt2, zone2 })
}
LocalResult::InGap(transition, _) => {
LocalResult::InGap(DateTime { datetime: transition, zone }, DateTime { datetime: transition, zone2 })
}
}
}
// Needed to extract the offset from `DateTime.zone`
fn offset(&self) -> FixedOffset;
// Get info on transitions.
fn closest_transitions(&self, local: NaiveDateTime)
-> (Option<NaiveDateTime>, Option<NaiveDateTime>);
// Support %Z formatting.
fn abbrevation(&self) -> Option<&str> {
None
}
/* all other methods */
}
pub enum LocalResult<T> {
Single(T),
Ambiguous(T, T),
InGap(T, T),
}
As nice as using a ZST inside |
As a comment on other
|
This PR seems to have served its purpose for exploring changes to the |
No description provided.