exploration with the timezone trait #830

Closed
wants to merge 3 commits into from

Conversation

esheppa
Collaborator

@esheppa esheppa commented Oct 1, 2022

No description provided.

@djc
Member

djc commented Oct 3, 2022

(To be explicit: in order to get through my GitHub notifications, I'll ignore draft PRs unless you specifically ping me on something you'd like feedback on.)

Comment on lines 11 to 18
// this is nice because it avoids the type parameter.
// alternatively the `TimeZoneManager` could have an associated `TimeZone` ZST or equivalent that this is parameterized by
#[derive(Clone, Copy)]
pub struct DateTime {
    datetime: NaiveDateTime,
    offset: FixedOffset,
    // could potentially include some information on the timezone name here to allow `%Z` style formatting
}


Assume we have a DateTime instance with this implementation, and want to add 24 hours to it. Is it possible to keep DST in mind without a reference to the original TimeZone?

Collaborator Author

one option would be having a function on the TimeZoneManager that 'normalises' or 'localises' the offset, but the main way the API is intended to be used is something like, my_timezone_manager.add(my_datetime, Duration::from_secs(24 * 60 * 60))

Potentially another option that could be more ergonomic is:
my_datetime.add(&my_timezone_manager, Duration::from_secs(24 * 60 * 60))
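
A minimal sketch of the manager-style call described above, using simplified stand-in types rather than chrono's real ones. `TimeZoneManager`, its transition table and the second-based `DateTime` are assumptions for illustration only. The datetime keeps just a resolved offset, and the manager re-resolves that offset after every addition.

```rust
use std::time::Duration;

// Hypothetical stand-ins for illustration; these are not chrono's types.

/// Seconds since the Unix epoch, in UTC.
pub type UtcSeconds = i64;

/// A datetime that carries only its UTC instant and a resolved fixed offset.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DateTime {
    pub utc: UtcSeconds,
    pub offset_secs: i32,
}

impl DateTime {
    /// Wall-clock seconds: the UTC instant shifted by the stored offset.
    pub fn local_seconds(&self) -> i64 {
        self.utc + i64::from(self.offset_secs)
    }
}

/// Owns the transition data; operations that may cross a transition go through it.
pub struct TimeZoneManager {
    /// (UTC instant from which the offset applies, offset in seconds), sorted by instant.
    transitions: Vec<(UtcSeconds, i32)>,
    /// Offset in effect before the first transition.
    initial_offset_secs: i32,
}

impl TimeZoneManager {
    pub fn new(initial_offset_secs: i32, mut transitions: Vec<(UtcSeconds, i32)>) -> Self {
        transitions.sort_by_key(|&(at, _)| at);
        Self { transitions, initial_offset_secs }
    }

    /// Offset in effect at the given UTC instant.
    pub fn offset_at_utc(&self, utc: UtcSeconds) -> i32 {
        // Number of transitions that have already taken effect at `utc`.
        let idx = self.transitions.partition_point(|&(at, _)| at <= utc);
        match idx {
            0 => self.initial_offset_secs,
            n => self.transitions[n - 1].1,
        }
    }

    pub fn at_utc(&self, utc: UtcSeconds) -> DateTime {
        DateTime { utc, offset_secs: self.offset_at_utc(utc) }
    }

    /// Adds elapsed time, then re-resolves the offset for the new instant.
    pub fn add(&self, dt: DateTime, duration: Duration) -> DateTime {
        self.at_utc(dt.utc + duration.as_secs() as i64)
    }
}
```

With a single transition from +01:00 to +02:00, adding 24 hours across it gives a wall-clock time 25 hours later, which a `DateTime` holding only a fixed offset could not work out on its own.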


I'm not sure I'm convinced by either approach; the DateTime is essentially losing its TimeZone implementation and instance, and application code needs to keep both around for any kind of computation.

I think alternative a is a lot cleaner in that sense.

Collaborator Author

Fair call. We could do the following in the TimeZoneManager style:

have the offset field be a type parameter, with

pub struct DateTime<Offset=FixedOffset> {
    datetime: NaiveDateTime,
    offset: Offset,
}

then when using a given TimeZoneManager it would operate on DateTimes with the relevant Offset type. This ties them together and either discourages or prevents adding time without the associated timezone information to update the offset. (Even in the prevent option, users could simply extract the NaiveDateTime or change the Offset if desired)

The extra benefit of making Offset a type parameter is that in the case of a named timezone, it can store the name internally, similar to chrono_tz's offset type.

@esheppa
Collaborator Author

esheppa commented Oct 4, 2022

Good call @djc - I'll make sure to tag you in future on potential PRs like this. This is very much at the investigation stage, and I think it will still take a while from here to get to a really great solution like the use of TimeDelta as an output type only.

Having had a few more days to think about it, I'm leaning toward the option of pulling the timezone totally out of the DateTime type as this allows the user to make a caching decision which is most suitable for their use cases:

  • DateTime in this option is equivalent to DateTime<FixedOffset> and can provide the same use as DateTime<Utc> as well
  • Where we use Local currently, we cache the file in thread_local and potentially hit the file system for something as simple as my_datetime + my_duration, whereas this API makes the costs more explicit - you have to have the TimeZoneManager before you can do calculations with the local time
  • This also opens the way for using the full set of local tzinfo files to offer similar functionality as chrono_tz while getting updates from the OS package manager

@djc
Member

djc commented Oct 5, 2022

If the goal is to allow the user to make caching decisions, we could still have the timezone be part of the DateTime type, but provide different types with different caching strategies?

Right now it's not very clear to me what problem this PR is trying to solve.

@esheppa
Collaborator Author

esheppa commented Oct 5, 2022

Right now it's not very clear to me what problem this PR is trying to solve.

Fair call - this is very exploratory at the moment, and the bar is quite high for any change to be made at all to TimeZone as it is currently working well.

My main goal here is to explore some options for the following:

  • Clarify the distinction between the Offset and TimeZone traits - this can be confusing at the moment because the TimeZone can be created from the Offset, but the Offset types also implement TimeZone. Option A solves this by only having FixedOffset as an offset, whereas option C more clearly delineates the Offset (a small type stored in the DateTime) from the TimeZone(Manager), which represents the tzinfo file or equivalent data. This is especially relevant with the recent questions surrounding Utc and FixedOffset.
  • Allow users to do caching and handle things like filesystem errors, eg when parsing a tzinfo file. This also enables users to get their tzinfo files from a remote server if desired. (Only options B and C allow this)
  • Have a consistent place to store the timezone name for use in %Z (related to add LocalOffset type that stores the timezone name #750)
  • Return more data from _local methods (related to Consider new helper methods for LocalResult #716)

but provide different types with different caching strategies?

One option here (my preferred) is that we don't provide a caching strategy at all - for most use cases this won't matter anyway, and for use cases where it does, it is likely that we can't predict the best caching strategy, so potentially it's better we give those users an API that enables them to do their own caching strategy

@djc
Member

djc commented Oct 6, 2022

One option here (my preferred) is that we don't provide a caching strategy at all - for most use cases this won't matter anyway, and for use cases where it does, it is likely that we can't predict the best caching strategy, so potentially it's better we give those users an API that enables them to do their own caching strategy

I'm sceptical of the assertion that "for most use cases this won't matter anyway", and I doubt that there is a way to verify that, although measuring the performance impact of the different pieces would be a decent start. I think it could make sense to make things more composable but I think we should make sure that (a) the default ergonomics don't regress too much and (b) the default performance doesn't regress too much.

Alternatively, the yard stick by which we're measured here is libc's localtime_r(), which does have some form of caching, right? So from that perspective too, I don't think we can get away with just not providing a caching strategy by default.

@WhyNotHugo

My biggest gripe with alternative B, is that timezone-aware datetimes would no longer have a timezone, only the offset.

So most APIs that deal with them need to pass around a tuple of (DateTime, TimeZone). And if any API doesn't return the timezone, then that datetime has lost some of its data, and any computations (e.g. subtract 24 hours) can no longer be done accurately (they would fail on DST and other transitions).

I think the other approach also sounds more intuitively correct.

#[derive(Clone, PartialEq, Eq)]
pub struct Transition {
// a given transition time, similar format to tzinfo,
// including the Utc timestamp of the transition,


This value is stored in local time (not UTC time) for VTIMEZONE components in the icalendar spec. For a Time Zone Component...

[...] the mandatory "DTSTART" property gives the effective onset date
and local time for the time zone sub-component definition.
"DTSTART" in this usage MUST be specified as a date with a local
time value.

(context: https://datatracker.ietf.org/doc/html/rfc5545#section-3.6.5 )

Does the TZDB use local time or UTC? I'm mostly trying to figure out which of the two approaches (local vs UTC) would make operating with this easier.

Collaborator Author

The linked tables look similar to the ones available here. AFAIK the tzinfo files are generated from these text files, and then when stored in tzinfo, UTC timestamps are used.

However, an implementer of TimeZoneManager could store two versions, one set in UTC timestamps and one in local timestamps, which could improve the lookup performance for the *_local methods

@WhyNotHugo

I've no strong opinion either way between A and D.

@esheppa
Collaborator Author

esheppa commented Oct 10, 2022

Thanks @djc and @WhyNotHugo for the extra commentary; it's also made me realize that I probably haven't been precise enough when talking about caching.

One option here (my preferred) is that we don't provide a caching strategy at all - for most use cases this won't matter anyway ...

I'm sceptical of the assertion that "for most use cases this won't matter anyway" ...

What I meant here is that the TimeZoneManager type, if used, wouldn't need to have any caching internally as it shouldn't be re-created frequently, but instead created either once at the start of the application (for example in a OnceCell) or periodically as the local tzdb is updated by the system package manager (eg via a tokio::sync::watch). Then all of the &self methods can be used to do operations on different datetimes which means the TimeZoneManager can be shared across threads etc.
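
A minimal sketch of the "create once, share by reference" pattern described above. It uses `std::sync::OnceLock` as a standard-library counterpart of the `OnceCell` mentioned. The `TimeZoneManager` here is a hypothetical stand-in whose fallible `load` parses an offset from a string instead of reading tzinfo data, and `init_zone` and `local_times_on_threads` are illustrative names, not part of the PR.

```rust
use std::sync::OnceLock;
use std::thread;

// Hypothetical stand-in; a real manager would hold parsed tzinfo data.
#[derive(Debug)]
pub struct TimeZoneManager {
    pub offset_secs: i32,
}

#[derive(Debug)]
pub struct LoadError(pub String);

impl TimeZoneManager {
    /// Fallible constructor. Here it only parses an offset in seconds;
    /// the real one would read /etc/localtime or the TZ variable.
    pub fn load(raw_offset: &str) -> Result<Self, LoadError> {
        raw_offset
            .trim()
            .parse::<i32>()
            .map(|offset_secs| Self { offset_secs })
            .map_err(|e| LoadError(e.to_string()))
    }

    /// Needs only `&self`, so a shared reference is enough on every thread.
    pub fn local_from_utc(&self, utc_secs: i64) -> i64 {
        utc_secs + i64::from(self.offset_secs)
    }
}

static ZONE: OnceLock<TimeZoneManager> = OnceLock::new();

/// Loads the manager once, surfacing any error to the caller at startup.
pub fn init_zone(raw_offset: &str) -> Result<&'static TimeZoneManager, LoadError> {
    if let Some(zone) = ZONE.get() {
        return Ok(zone);
    }
    let loaded = TimeZoneManager::load(raw_offset)?;
    Ok(ZONE.get_or_init(|| loaded))
}

/// Uses the same shared manager from several threads without further caching.
pub fn local_times_on_threads(zone: &'static TimeZoneManager, utc: Vec<i64>) -> Vec<i64> {
    let handles: Vec<_> = utc
        .into_iter()
        .map(|t| thread::spawn(move || zone.local_from_utc(t)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Periodic refresh (the `tokio::sync::watch` variant) would need a replaceable slot rather than a write-once cell; that part is not sketched here.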

That being said, the design using TimeZoneManager only is probably too much of a departure from the existing design and so could create other unforeseen problems - however I'm keen to try to find an API that is compatible with that design, allowing it to be potentially present in addition to a more standard API.

In options A and D the primary API provided is quite similar to the current TimeZone API, so migration shouldn't pose too many problems. The extra benefit of option D is its marker trait, which enables the add/sub operations etc.: potential TimeZone implementors can choose not to implement that marker trait and hence require use of TimeZoneManager. My envisaged use case for this is using files from /usr/share/zoneinfo/ or equivalent.

Alternatively, the yard stick by which we're measured here is libc's localtime_r(), which does have some form of caching, right? So from that perspective too, I don't think we can get away with just not providing a caching strategy by default.

This is a good call; hopefully this can be solved by caching the TZ variable in the existing Local implementation.

My biggest gripe with alternative B, is that timezone-aware datetimes would no longer have a timezone, only the offset.

This makes sense - then once I added an Offset style trait for option C I realized that it wasn't sufficiently different from the existing option A so just made a combination option in D instead.

@esheppa
Collaborator Author

esheppa commented Oct 22, 2022

I've included some examples below of using the TimeZoneManager style. The main advantage here is to allow for things like filesystem errors to be handled upfront, rather than potentially on every usage of DateTime<Tz> + Duration etc. As long as the Zone associated type doesn't implement the EnableDirectOpsImpls marker trait, this forces users to use the methods on TimeZoneManager and avoids the hidden fallibility. It also allows users to cache the TimeZoneManager in an Arc, or elsewhere depending on their requirements.

// this is fallible as it may encounter an error when asking the filesystem, 
// or when parsing the tzinfo file or TZ variable
// we store this in an Arc so it can be shared across threads, all methods require only `&self` 
let my_zone = Arc::new(TimeZoneManager::local()?); // reads from /etc/localtime or the TZ env variable
let now = my_zone.now();
let later = my_zone.add(now, Duration::from_secs(600));
let really_later = my_zone.add_days(now, Days::new(45));

but also

let my_custom_zone = TimeZoneManager::from_iana("Australia/Brisbane")?;
let now = my_custom_zone.now();
let later = my_custom_zone.add(now, Duration::from_secs(600));
let really_later = my_custom_zone.add_days(now, Days::new(45));

@pitdicker
Collaborator

pitdicker commented Apr 27, 2023

@esheppa I have made a couple of attempts to understand your PR, but am still not there yet. One of your goals is to explore whether the TimeZone trait can be made object-safe (related to #432). That could be nice, it would increase the possibilities to write generic code with DateTimes. To understand your work better I also gave that a try, but couldn't manage it 😄.

What you arrived at is:

  • All constructors must not be in the TimeZone trait, but on other types such as NaiveDateTime, or DateTime<Tz>::from_*. That feels natural to me.
  • A timezone object must be embedded inside DateTime. If the type implementing TimeZone does non-trivial stuff this will probably be something pointer-sized like an Arc.
  • You didn't embed an offset in the DateTime, but wouldn't that be necessary to keep reasonable performance?

Just writing something to get the conversation started.

@pitdicker
Collaborator

pitdicker commented Apr 28, 2023

A second issue is that currently an object implementing TimeZone is recreated whenever the timestamp of a DateTime changes, to recalculate the correct offset. I.e. for methods such as:

impl<Tz: TimeZone> DateTime<Tz> {
	fn checked_add_signed();
	fn checked_add_months();
	fn checked_add_days();
}

Suppose someone has thousands of DateTimes in a Vec, and wants to change them all. For every item the same TimeZone object has to be recreated. And if this is not some simple type but has to do system calls or read a file, that can get expensive. And in any case recreating the TimeZone object each time seems wasteful.

In such a loop or function it would be nice if you could hold on to the TimeZone object, and use that every time to do calculations on the DateTimes.

That is one of the things you explore with TimeZoneManager, right?

Adding methods to the TimeZone trait to do the same can be done in a backwards compatible way I think?

@pitdicker
Collaborator

pitdicker commented Apr 28, 2023

Is controlling caching and handling OS errors something that can't be done today?

(General thought, not your work in this PR.)
Having the type implementing Offset contain an Arc/Box/trait object does not seem very niche to me. That would blow up the size of every single DateTime that uses it. So I will assume Offset types to be zero-sized or an enum. Then the full data must always be stored in something like a static or thread-local.

In that case standalone functions can also access that static or thread-local, be used for initialization with error handling, or to set a caching strategy?

@pitdicker
Collaborator

pitdicker commented Apr 28, 2023

Chrono seems like a fundamental enough crate to require the semver trick to avoid ecosystem-wide breakage with the release of 0.5: there should be a final release of the 0.4.x series re-implementing its types by depending on the types of 0.5.x. And traits should be implemented for the updated traits in 0.5.x.

So, for every functionality the current TimeZone trait has, there must be some way to achieve it with the new one (or using available methods on DateTime etc.). I have not thought deep about this though.

@pitdicker
Collaborator

To ensure the upgrade can be made, maybe it is best to do experiments to make an object-safe trait with a TimeZoneNext trait in the 0.4.x branch?

Implement Local, FixedOffset and Utc for TimeZoneNext, and use TimeZoneNext to implement TimeZone.

@pitdicker
Collaborator

Looking at another part of your proposals:
Information about a 'gap' during a timezone shift / DST transition can be implemented with just an extra variant in LocalOffset. It would be a breaking change, but not necessarily for the TimeZone trait.

A related query would require changes to the TimeZone trait: what is the range around a DateTime where the offset stays the same? In other words, when was the previous transition, and when is the next transition? Did you implement something like that?

It is currently completely unavailable. I can imagine use cases and optimizations where this information would help.
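
To make the query concrete, here is a minimal sketch of a "previous and next transition" lookup. It assumes the transitions are available as a sorted slice of UTC instants in seconds; the free function below is hypothetical and simplified, not the signature used in the PR.

```rust
/// Hypothetical helper: `transitions` holds transition instants (seconds, UTC),
/// sorted ascending. Returns the latest transition at or before `at` and the
/// earliest transition after it. The offset is constant between the two.
pub fn closest_transitions(transitions: &[i64], at: i64) -> (Option<i64>, Option<i64>) {
    // Index of the first transition strictly after `at` (binary search).
    let idx = transitions.partition_point(|&t| t <= at);
    let previous = idx.checked_sub(1).map(|i| transitions[i]);
    let next = transitions.get(idx).copied();
    (previous, next)
}
```

Either side is `None` when `at` lies before the first or after the last known transition, which is also the case for zones with a permanently fixed offset.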

@WhyNotHugo

@pitdicker I think an issue with your approach is that the TimeZone type would still not be object safe. I think it is inevitable to break backwards compatibility in some way if we want to achieve that.

@pitdicker
Collaborator

@WhyNotHugo No worries, I am just trying to understand the various aspects of @esheppa's work here, and don't have any proposal (yet).

@esheppa
Collaborator Author

esheppa commented Apr 30, 2023

Thanks @pitdicker for all the feedback. I'm keen to continue this conversation, but I've tried to address many of the current questions below -

Is controlling caching and handling OS errors something that can't be done today?

This is an issue with the current API - due to some methods being reasonably infallible (eg the offset and DateTime constructors) but also requiring syscalls or file reading.

A timezone object must be embedded inside DateTime. If the type implementing TimeZone does non-trivial stuff this will probably be something pointer-sized like an Arc.

Ideally we would embed as little as possible within a DateTime. It may be sufficient to just hold a UTC offset and a ZST that defines the timezone, ensuring it is always Copy.

That is one of the things you explore with TimeZoneManager, right?

Yes the TimeZoneManager is to allow fallibility of syscalls/file reading to be exposed, provide a way for the library consumer to handle caching of the parsed tzinfo data and also allow the TimeZone type to be as small as possible.

So I will assume Offset types to be zero-sized or an Enum. Then the full data must always be stored in something like a static or thread-local.

I'm uncomfortable with our current strategy of caching with thread_local!. It was a good solution to facilitate the move to internal tzinfo parsing without too much of a performance penalty, but I think it is better to give library consumers options about how to cache, especially as they will have more knowledge on how to set cache expiry.

Chrono seems like a fundamental enough crate to require the semver trick to avoid ecosystem-wide breakage

This is worth considering, and could well be something that we use, but I'm keen to seek out the best possible design first, then assess what changes might be made to that to minimize ecosystem breakage

A related query would require changes to the TimeZone trait: what is the range around a DateTime where the offset stays the same? In other words, when was the previous transition, and when is the next transition? Did you implement something like that?

Yes, this is implemented (eg closest_transitions/closest_transitions_from_local) and is required for helper functions on the LocalResult

There are essentially three main things I'm trying to get at with this redesign:

  1. Make TimeZone object safe
  2. Allow better control of caching for tzinfo data
  3. Allow better handling of syscall errors
  4. Find a good way to handle cases where functions should be infallible due to the timezone having a permanent fixed offset, while still handling fallibility for timezones that change offset. (Currently this is somewhat the role TimeZone and Offset play)

@pitdicker
Collaborator

pitdicker commented May 1, 2023

This is unrelated and half-forgotten memories, but maybe useful.
(I'll write a more constructive reply later.)

When a number of years ago I helped with the rand crate we had all kinds of thoughts around seeding and handling OS errors, making/keeping the RNG trait object-safe, and reseeding a PRNG (which is a questionable thing to do, but might benefit security in some cases).

ThreadRng packed that all up in a convenient package in a thread-local, with the one problem that it was seeded automatically from the OS. I don't remember if it would just block until successful, or panic.

And there was the even more convenient random() standalone function, so you wouldn't need to get a reference to the ThreadRng but just get a random number out of seemingly thin air.

At some point we wanted to steer users to have error handling for obscure OS scenarios, decide whether they want to reseed it (comparable to caching a TimeZone), which PRNG to use (a secure or regular one, and then which algorithm), and to hold on to it once initialized for better performance. So random() was deprecated. But no such luck, it might just have been the most popular part of the API and there is no chance ever to remove it 🤣.

That taught me random() was just what many users needed. They didn't want to think about the details of the generation of random numbers, and it wasn't super-performance-critical.

But a barely interesting cleanup, abstracting over the various OS APIs to get a random number, turned out to be one of the more useful parts for users (getrandom). No PRNG at all, no thread-local and stuff, no great performance... Just get a couple of random bytes from the OS.


TimeZone seems in a pretty similar position to me. Users don't want to think deeply about dates. Stuff like Unicode and date/time is already inconvenient with way too many things that you have to do in just a certain way or otherwise there is some obscure case that will fail. And an API that steers you into the right direction can already feel annoying 😄.

This is a long-winded way of saying: having a way to handle errors, handling caching, object-safety, and performance are worthy goals. But wherever this goes, the majority of users don't want to think about where the timezone data comes from at all. Using a crate besides chrono is already a huge ask.

And maybe what many users want for timezones could just be: whether it is Unix, Windows, iOS, WASM, whether we can use system-provided files or ship them in the binary... Just get the timezone data, and preferably keep it up to date.

@pitdicker
Collaborator

pitdicker commented May 1, 2023

I am trying to make a list of situations where caching matters, to make the discussion more concrete. But I am new to most of this so please correct me where wrong.

Change of timezone

With Local the timezone can change while an application is running. This may be because a user/application sets it to a different value, or because the device physically moves to a different timezone. Like a phone or laptop on a plane.

Thinking as a user who is unaware how computers work, I would expect that at the moment the clock changes, all running applications follow suit.

What should happen to a DateTime<Local> that was made before the change? Its timestamp, offset and the name of the timezone (which is currently not yet part of the type) should not change. Just like a DateTime from any other source should not just change because of an outside influence. If an application wants to present a DateTime<Local> to a user with the then-current timezone, it should call with_timezone(Local) just before formatting.

This would mean that:

  • DateTime<Local> needs to store the value of the offset and some identification of the timezone.
  • Local should somehow automatically follow the OS, or within a reasonably short time.

Change of timezone data

This becomes relevant when some country decides, maybe even just a few days before the event, to change something about its timezone. Maybe change the day of a DST transition or something. Next the IANA database is updated. Microsoft releases a Windows update with the timezone data. Linux package managers push an update, /etc/localtime is updated. And hopefully the user's system gets the update.

And chrono just may be part of a long running application, and should refresh its timezone information with the new OS-provided sources (if any).

Now the trouble starts. Suppose the Netherlands stops observing DST in 2024 (to keep it close to my home). But I already created a future DateTime at 2024-05-01T11:00:00+02:00 in the Europe/Amsterdam timezone. This date and offset have become inconsistent with the associated timezone. I suppose the correct thing to do is update the offset, otherwise the time should have been stored as Utc or FixedOffset, and not as a local time.

But at what point would a library user expect the offset to be updated?

I don't think it is realistic to expect a library user to build something that gets notified when the timezone data is updated, that then goes through all DateTimes that may be stored in various data structures, and updates them. How would it even know of a database change between two runs?

I would say that any application dealing with future dates in a local timezone should be aware of the uncertainty, and should call datetime.with_timezone(datetime.timezone()) to actualize them before use. And methods such as DateTime::checked_add_signed, DateTime::checked_add_days, and DateTime::checked_add_months should internally actualize them, so at least the result is consistent with the timezone again.

Keeping timezone data actual

Chrono does the interaction with the OS to get the information for Local. In my opinion it is also responsible for keeping it up to date.

The main concern around caching in this PR is the caching of tzinfo files.

@pitdicker
Collaborator

pitdicker commented May 1, 2023

What I definitely like about your explorations is, with some modifications:

DateTime type

pub struct DateTime<Tz: TimeZone> {
    datetime: NaiveDateTime,
    offset: FixedOffset,
    zone: Tz,
}

Advantages:

  • no need to recreate the timezone from the offset.
  • there is no longer an Offset trait and type necessary, making things clearer.

TimeZone trait

pub trait TimeZone {
    // Essential to construct a `DateTime`
    fn offset_at_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<FixedOffset>;
    fn offset_at_utc_datetime(&self, utc: &NaiveDateTime) -> FixedOffset;

    // Get info on transitions.
    fn closest_transitions(&self, local: &NaiveDateTime)
        -> (Option<NaiveDateTime>, Option<NaiveDateTime>);

    // Support %Z formatting.
    fn abbreviation(&self) -> Option<&str> {
        None
    }
}

Advantages:

  • object-safe
  • extra functionality chrono was missing before

Companion trait that doesn't have to be object-safe

These can all be convenient, but on its own this trait is not worth it. There are already other ways to accomplish each of these.

pub trait TimeZoneManager {
    type Zone: TimeZone + Clone;

    #[cfg(feature = "clock")]
    fn now() -> DateTime<Self::Zone>;

    // Current constructors that are not deprecated, but I don't care much about them:
    fn with_ymd_and_hms(
        &self,
        year: i32,
        month: u32,
        day: u32,
        hour: u32,
        min: u32,
        sec: u32
    ) -> LocalResult<DateTime<Self::Zone>>;
    fn timestamp(&self, secs: i64, nsecs: u32) -> LocalResult<DateTime<Self::Zone>>;
    fn timestamp_millis(&self, millis: i64) -> LocalResult<DateTime<Self::Zone>>;
    fn timestamp_nanos(&self, nanos: i64) -> LocalResult<DateTime<Self::Zone>>;

    // Maybe useful as an optimization, but I don't really think so
    fn add(dt: DateTime<Self::Zone>, rhs: TimeDelta) -> Result<DateTime<Self::Zone>, ChronoError>;
    fn sub(dt: DateTime<Self::Zone>, rhs: TimeDelta) -> Result<DateTime<Self::Zone>, ChronoError>;
    fn add_days(dt: DateTime<Self::Zone>, rhs: Days) -> Result<DateTime<Self::Zone>, ChronoError>;
    fn sub_days(dt: DateTime<Self::Zone>, rhs: Days) -> Result<DateTime<Self::Zone>, ChronoError>;
    fn add_months(dt: DateTime<Self::Zone>, rhs: Months) -> Result<DateTime<Self::Zone>, ChronoError>;
    fn sub_months(dt: DateTime<Self::Zone>, rhs: Months) -> Result<DateTime<Self::Zone>, ChronoError>;
}

@pitdicker
Collaborator

pitdicker commented May 1, 2023

Just learned that an extension trait is not necessary; it is enough to add where Self: Sized to the methods that should be unavailable when used as a trait object. I.e. all constructors can just work.

pub trait TimeZone {
    // Essential to construct a `DateTime`
    fn offset_at_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<FixedOffset>;
    fn offset_at_utc_datetime(&self, utc: &NaiveDateTime) -> FixedOffset;

    // Get info on transitions.
    fn closest_transitions(&self, local: &NaiveDateTime)
        -> (Option<NaiveDateTime>, Option<NaiveDateTime>);

    // Support %Z formatting.
    fn abbreviation(&self) -> Option<&str> {
        None
    }

    // Constructors
    fn from_utc_datetime(&self, utc: &NaiveDateTime) -> DateTime<Self> where Self: Sized {
        /* ... */
    }
    /* all other methods */
}

That would lower the friction of updating to chrono 0.5. I am sold 😄
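
A small self-contained sketch of the `where Self: Sized` technique, with simplified hypothetical types (offsets as plain seconds, a `Fixed` zone) rather than chrono's. It shows that the constructor can stay on the trait while `Box<dyn TimeZone>` still compiles.

```rust
// Hypothetical, simplified trait; not the trait proposed in the PR.
pub trait TimeZone {
    /// Core method, usable through `dyn TimeZone`.
    fn offset_at_utc(&self, utc_secs: i64) -> i32;

    /// Constructor whose return type mentions `Self`. The `Self: Sized` bound
    /// excludes it from trait objects instead of making the whole trait
    /// unusable as `dyn TimeZone`.
    fn from_utc(&self, utc_secs: i64) -> DateTime<Self>
    where
        Self: Sized + Clone,
    {
        DateTime {
            utc_secs,
            offset_secs: self.offset_at_utc(utc_secs),
            zone: self.clone(),
        }
    }
}

pub struct DateTime<Tz> {
    pub utc_secs: i64,
    pub offset_secs: i32,
    pub zone: Tz,
}

#[derive(Clone)]
pub struct Utc;

impl TimeZone for Utc {
    fn offset_at_utc(&self, _utc_secs: i64) -> i32 {
        0
    }
}

/// A zone with a constant offset in seconds.
#[derive(Clone)]
pub struct Fixed(pub i32);

impl TimeZone for Fixed {
    fn offset_at_utc(&self, _utc_secs: i64) -> i32 {
        self.0
    }
}

/// Works on a heterogeneous list of zones through trait objects.
pub fn offsets(zones: &[Box<dyn TimeZone>], utc_secs: i64) -> Vec<i32> {
    zones.iter().map(|z| z.offset_at_utc(utc_secs)).collect()
}
```

Calling `from_utc` on a `Box<dyn TimeZone>` would be rejected at compile time, while concrete types such as `Fixed` keep the constructor.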

@pitdicker
Collaborator

  2. Allow better control of caching for tzinfo data
  3. Allow better handling of syscall errors

Today I finally 'got' what is probably a core idea of your proposals: adding an optional EnableDirectOpsImpls marker trait to types implementing TimeZone. The idea is that methods such as DateTime::add_days become unavailable without the marker trait. If a TimeZone type does things that are fallible, it would not implement the trait.

You are then forced to create an instance of that TimeZone type first, dealing with the potential errors, and use that to do the operations. And you can hold on to the instance, i.e. cache it.
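
A minimal sketch of how such a marker trait could gate the operators, under simplified assumptions: times are plain seconds, and `SystemZone` and `SystemZoneManager` are hypothetical names for a zone whose data must be loaded fallibly. Only the marker name `EnableDirectOpsImpls` comes from the PR.

```rust
use std::ops::Add;
use std::time::Duration;

// Hypothetical simplified shapes; not the definitions from the PR.
pub trait TimeZone {
    fn offset_at_utc(&self, utc_secs: i64) -> i32;
}

/// Marker implemented only by zones whose operations cannot fail.
pub trait EnableDirectOpsImpls: TimeZone {}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DateTime<Tz> {
    pub utc_secs: i64,
    pub offset_secs: i32,
    pub zone: Tz,
}

/// `datetime + duration` exists only when the zone carries the marker.
impl<Tz: EnableDirectOpsImpls> Add<Duration> for DateTime<Tz> {
    type Output = DateTime<Tz>;

    fn add(self, rhs: Duration) -> Self::Output {
        let utc_secs = self.utc_secs + rhs.as_secs() as i64;
        let offset_secs = self.zone.offset_at_utc(utc_secs);
        DateTime { utc_secs, offset_secs, zone: self.zone }
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Utc;

impl TimeZone for Utc {
    fn offset_at_utc(&self, _utc_secs: i64) -> i32 {
        0
    }
}

impl EnableDirectOpsImpls for Utc {}

/// Zone backed by system data: no marker, so `+` is not available on it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SystemZone;

/// Holds the loaded data; creating it is where errors surface.
pub struct SystemZoneManager {
    offset_secs: i32,
}

impl SystemZoneManager {
    /// Stand-in for reading and parsing a tzinfo file.
    pub fn load(raw_offset: &str) -> Result<Self, String> {
        raw_offset
            .trim()
            .parse::<i32>()
            .map(|offset_secs| Self { offset_secs })
            .map_err(|e| e.to_string())
    }

    pub fn add(&self, dt: DateTime<SystemZone>, rhs: Duration) -> DateTime<SystemZone> {
        DateTime {
            utc_secs: dt.utc_secs + rhs.as_secs() as i64,
            offset_secs: self.offset_secs,
            zone: dt.zone,
        }
    }
}
```

Writing `some_system_datetime + Duration::from_secs(60)` fails to compile in this sketch, so the only route is through a successfully loaded `SystemZoneManager`.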

As I am starting to understand it, this is planning for functionality very few users are going to use, if any. Would that small number of users not be able to avoid the relevant methods on DateTime themselves?

  4. Find a good way to handle cases where functions should be infallible due to the timezone having a permanent fixed offset, while still handling fallibility for timezones that change offset. (Currently this is somewhat the role TimeZone and Offset play)

This is the issue that a number of methods that currently return a LocalResult would not need to do so if the type implementing TimeZone is Utc or FixedOffset. I don't feel like this is really a thing worth solving. But maybe that is why you put it as nr. 4 on a list of 3 goals.

Sorry to sound negative.

@pitdicker
Collaborator

  • One disadvantage of changing DateTime to include an offset is that DateTime<Utc> is no longer the same size as NaiveDateTime. It would increase from 12 to 16 bytes, both with an alignment of 4. A workaround for the cases with many dates where this is critical would be to just store NaiveDateTime.

  • One more thing to maybe encode in the DateTime is whether it is in daylight saving time, standard time, or unknown. See How to know for a date/time if daylight saving time is in effect? #235. If FixedOffset can hold some flags, like I hope to add in Preserve -00:00 offset #1042, that might be a good place to store that?
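
The size figures in the first point can be checked with stand-in structs. The field layouts below are assumptions that approximate the types discussed (three 4-byte fields for the naive datetime, one 4-byte field for the offset, a zero-sized zone); they are not chrono's actual definitions.

```rust
use std::mem::{align_of, size_of};

// Stand-ins with assumed layouts, for size comparison only.
#[allow(dead_code)]
pub struct NaiveDateTimeLike {
    date: i32,
    secs: u32,
    frac: u32,
}

#[allow(dead_code)]
pub struct FixedOffsetLike {
    local_minus_utc: i32,
}

/// Zero-sized zone marker, like a unit struct for UTC.
pub struct UtcLike;

/// Datetime that stores only the zone marker next to the naive datetime.
#[allow(dead_code)]
pub struct WithZoneOnly {
    datetime: NaiveDateTimeLike,
    zone: UtcLike,
}

/// Datetime that additionally stores a resolved offset.
#[allow(dead_code)]
pub struct WithOffset {
    datetime: NaiveDateTimeLike,
    offset: FixedOffsetLike,
    zone: UtcLike,
}

/// Sizes in bytes of the naive datetime and of the two datetime variants.
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<NaiveDateTimeLike>(),
        size_of::<WithZoneOnly>(),
        size_of::<WithOffset>(),
    )
}
```

Under these assumptions the zero-sized zone adds nothing, and the stored offset accounts for the whole growth from 12 to 16 bytes.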

@pitdicker
Copy link
Collaborator

pitdicker commented May 3, 2023

@esheppa You seem to have already solved the size issue. Looks like I am slowly retracing the steps to your design 😆:

It is possible to include the offset in the type implementing TimeZone. The trait would need methods that take &mut Self to update the internally stored offset. I find it both cool and confusing. But maybe it is just a matter of naming and getting the right perspective?

Again with a bit of my own twist:

pub struct DateTime<Tz: TimeZone> {
    datetime: NaiveDateTime,
    zone: Tz,
}

pub trait TimeZone {
    // Primary methods, but not intended for library users
    fn update_offset_at_utc(&mut self, utc: &NaiveDateTime);
    fn update_offset_at_local(&mut self, local: &NaiveDateTime, alternative: &mut Self) -> LocalResult<NaiveDateTime>;

    // Needed to construct a `DateTime`
    fn from_utc_datetime(&self, utc: &NaiveDateTime) -> DateTime<Self> where Self: Clone + Sized {
        let mut zone = self.clone();
        zone.update_offset_at_utc(utc);
        DateTime { datetime: *utc, zone }
    }
    fn from_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<DateTime<Self>> where Self: Clone + Sized {
        let mut zone = self.clone();
        let mut zone2 = zone.clone();
        // `zone` ends up with the offset of the first result, `zone2` with that of the second.
        match zone.update_offset_at_local(local, &mut zone2) {
            LocalResult::Single(dt) => LocalResult::Single(DateTime { datetime: dt, zone }),
            LocalResult::Ambiguous(dt1, dt2) => {
                LocalResult::Ambiguous(DateTime { datetime: dt1, zone }, DateTime { datetime: dt2, zone: zone2 })
            }
            LocalResult::InGap(transition, _) => {
                LocalResult::InGap(DateTime { datetime: transition, zone }, DateTime { datetime: transition, zone: zone2 })
            }
        }
    }

    // Needed to extract the offset from `DateTime.zone`
    fn offset(&self) -> FixedOffset;
    
    // Get info on transitions.
    fn closest_transitions(&self, local: NaiveDateTime)
        -> (Option<NaiveDateTime>, Option<NaiveDateTime>);

    // Support %Z formatting.
    fn abbrevation(&self) -> Option<&str> {
        None
    }

    /* all other methods */
}

pub enum LocalResult<T> {
    Single(T),
    Ambiguous(T, T),
    InGap(T, T),
}
  • An issue is that update_offset_at_local has no nice way to return more than one result of the type implementing TimeZone. We want to have that to provide helper functions on LocalResult to return the preferred solution as DateTime. Above I solved this by adding an extra alternative: &mut Self parameter, which is to be combined with the second result of LocalResult by the calling function (see from_local_datetime). Pretty much the ugly C way of returning values through out-parameters.
  • The two update_offset_at_* methods make no sense on a standalone TimeZone type.
  • The TimeZone::offset method also does not always make sense on a standalone TimeZone type, which seems confusing and easy for users to misuse.
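To make the out-parameter mechanics concrete, here is a self-contained toy version. It is a sketch under loud assumptions: `NaiveSecs` (plain seconds) stands in for NaiveDateTime, offsets are plain `i32` seconds instead of FixedOffset, and `ToyZone` and the free function `from_local` are hypothetical names invented for this example. The toy zone is at +1h before a single transition and at +0h from then on, so it only produces Single and Ambiguous results.

```rust
/// Hypothetical stand-in for `NaiveDateTime`: seconds on some timeline.
pub type NaiveSecs = i64;

#[derive(Debug, PartialEq)]
pub enum LocalResult<T> {
    Single(T),
    Ambiguous(T, T),
    InGap(T, T),
}

pub trait TimeZone: Clone {
    fn update_offset_at_utc(&mut self, utc: NaiveSecs);
    /// Stores the offset of the first result in `self` and, if there is a
    /// second result, its offset in `alternative`. Returns UTC values.
    fn update_offset_at_local(
        &mut self,
        local: NaiveSecs,
        alternative: &mut Self,
    ) -> LocalResult<NaiveSecs>;
    fn offset(&self) -> i32;
}

/// Toy zone: offset is +3600 s before `transition_utc`, 0 s afterwards.
#[derive(Clone, Debug, PartialEq)]
pub struct ToyZone {
    pub transition_utc: NaiveSecs,
    pub current_offset: i32,
}

impl TimeZone for ToyZone {
    fn update_offset_at_utc(&mut self, utc: NaiveSecs) {
        self.current_offset = if utc < self.transition_utc { 3600 } else { 0 };
    }

    fn update_offset_at_local(
        &mut self,
        local: NaiveSecs,
        alternative: &mut Self,
    ) -> LocalResult<NaiveSecs> {
        let early = local - 3600; // UTC candidate under the +1h offset
        let late = local; // UTC candidate under the +0h offset
        let early_ok = early < self.transition_utc;
        let late_ok = late >= self.transition_utc;
        match (early_ok, late_ok) {
            (true, true) => {
                self.current_offset = 3600;
                alternative.current_offset = 0;
                LocalResult::Ambiguous(early, late)
            }
            (true, false) => {
                self.current_offset = 3600;
                LocalResult::Single(early)
            }
            _ => {
                self.current_offset = 0;
                LocalResult::Single(late)
            }
        }
    }

    fn offset(&self) -> i32 {
        self.current_offset
    }
}

#[derive(Debug, PartialEq)]
pub struct DateTime<Tz: TimeZone> {
    pub utc: NaiveSecs,
    pub zone: Tz,
}

/// Caller side: pairs each returned UTC value with the matching zone copy.
pub fn from_local<Tz: TimeZone>(tz: &Tz, local: NaiveSecs) -> LocalResult<DateTime<Tz>> {
    let mut zone = tz.clone();
    let mut zone2 = tz.clone();
    match zone.update_offset_at_local(local, &mut zone2) {
        LocalResult::Single(utc) => LocalResult::Single(DateTime { utc, zone }),
        LocalResult::Ambiguous(a, b) => LocalResult::Ambiguous(
            DateTime { utc: a, zone },
            DateTime { utc: b, zone: zone2 },
        ),
        LocalResult::InGap(a, b) => LocalResult::InGap(
            DateTime { utc: a, zone },
            DateTime { utc: b, zone: zone2 },
        ),
    }
}
```

The caller has to know that `zone2` belongs to the second value of the result, which is exactly the implicit contract that makes this pattern awkward.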

As nice as using a ZST inside DateTime<Utc> is, the complexity doesn't seem worth it in my opinion.

@pitdicker
Copy link
Collaborator

pitdicker commented May 3, 2023

Some comments on the other TimeZone methods:

  • I don't think it is a good idea to make a trait method conditional on a crate feature. This may be related to the fact that adding a method to a trait, even a provided method, can be considered a breaking change.
    Suppose a type outside chrono that implements TimeZone has a now() method, like Utc and Local currently do. It would fail to compile if the clock feature was enabled.

  • Maybe TimeZone::name is a good addition to TimeZone::abbrevation. An abbreviation like EST doesn't always exist, and doesn't map 1:1 to IANA timezone names or Windows timezone names. Both are worth knowing.

@pitdicker
Copy link
Collaborator

This PR seems to have served its purpose for exploring changes to the TimeZone trait.

@pitdicker pitdicker closed this Jan 31, 2024
@pitdicker pitdicker deleted the timezone-trait-exploration branch January 31, 2024 17:30
4 participants