Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converting the API to return Results #1049

Closed
pitdicker opened this issue May 6, 2023 · 19 comments
Closed

Converting the API to return Results #1049

pitdicker opened this issue May 6, 2023 · 19 comments
Labels
API-incompatible Tracking changes that need incompatible API revisions

Comments

@pitdicker
Copy link
Collaborator

There are now multiple issues or comments to make chrono 0.5 return Results instead of Options, and panic less. There is some design work that needs to be done first. I don't see any concrete, high-level proposals. Maybe this issue can serve for exploration.

See #815

cc @Zomtir, @udoprog

@pitdicker
Copy link
Collaborator Author

pitdicker commented May 6, 2023

Error types

As a start I'll propose an Error enum.
The current causes for errors and panics in the parts of the API I have looked add fall in the following categories:

pub enum ChronoError {
    InvalidParameter, // example: hour == 25
    Overflow, // example: year = 500_000
    DoesNotExist, // example: 31 February
    InconsistentDate(NaiveDate), // example: parsing Sunday 2023-04-21 (should be Friday)
    ParsingError(ParsingError), // collect all other reasons for failing to parse something in a nested enum.
    // Maybe:
    TzLoadingError(/* todo */), // Catchall for all errors caused by the OS in `offset::local`
}

And a specialized type for errors caused by timezone conversion:

pub enum LocalResult<T> {
    Single(T),
    Ambiguous(T, T),
    InGap(T, T),
    Error(ChronoError)
}

@pitdicker
Copy link
Collaborator Author

pitdicker commented May 6, 2023

Goals

  1. Chrono should allow writing software that handles dates and times correctly and doesn't panic.
  2. Chrono should have an API that is pleasant to use.
  3. Errors should be informative and/or actionable.

Pleasant to use

The deprecations of many potentially panicking methods in 0.4.23 have made chrono very unpleasant to use, with unwrap() all over the place.

Many edge conditions may simply never arise in real-world use. Making every method that might error in some obscure situation return Result makes the code that uses chrono hard to read. Why is this unwrap? Is it because of some condition that will never arise, or is it taking sketchy short-cuts?

Example: NaiveDate::succ may fail if the following date falls outside of the range of dates representable by a NaiveDate. But how much real-world code has to deal with dates around the year 262143? In this case making the default method panic on overflow is best.

See #815 (comment)

Allow use without panics

For every API that may panic, there should be an equivalent that is guaranteed to never panic (and otherwise that is a bug).

Example: NaiveDate::try_succ.

Errors should be actionable.

The prime example is LocalResult for handle errors caused by working with local times. A time may be ambiguous, it make not exist, or there may be some other error. Methods like single, earliest, latest and more are helpfulto deal correctly with these common issues.

Similarly I would want to know the best guess of a date considered inconsistend while parsing (the InconsistentDate error variant).

And maybe Overflow should be split into Underflow and Overflow, so library users could implement some saturating_* methods, but I am not sure that is worth it.

@pitdicker
Copy link
Collaborator Author

pitdicker commented May 6, 2023

Failure conditions in current/proposed API

Okay, this doesn't match the current API completely. And it certainly doesn't guarantee all error conditions I would expect are actually handled correctly.

NaiveDateTime

method error causes notes
new
from_timestamp OutOfRange, InvalidParameter
from_timestamp_millis OutOfRange
from_timestamp_micros OutOfRange
parse_from_string OutOfRange, DoesNotExist, InvalidParameter, InconsistentDate, ParsingError(_)
date
time
timestamp
timestamp_millis
timestamp_micros
timestamp_nanos OutOfRange panics
timestamp_subsec_millis
timestamp_subsec_micros
timestamp_subsec_nanos
checked_add_signed OutOfRange
checked_sub_signed OutOfRange
checked_add_months OutOfRange note: changes the day to fit
checked_sub_months OutOfRange note: changes the day to fit
checked_add_days OutOfRange
checked_sub_days OutOfRange
signed_duration_since
format_with_items always succeeds, can fail during formatting
format always succeeds, can fail during formatting
and_local_timezone → Localresult

NaiveDate

method error causes notes
from_ymd OutOfRange, DoesNotExist, InvalidParameter panics
try_from_ymd OutOfRange, DoesNotExist, InvalidParameter
from_yo OutOfRange, DoesNotExist, InvalidParameter panics
try_from_yo OutOfRange, DoesNotExist, InvalidParameter
from_isoywd OutOfRange, DoesNotExist, InvalidParameter panics
try_from_isoywd OutOfRange, DoesNotExist, InvalidParameter
from_num_days_from_ce OutOfRange panics
try_from_num_days_from_ce OutOfRange
from_weekday_of_month OutOfRange, DoesNotExist, InvalidParameter panics
try_from_weekday_of_month OutOfRange, DoesNotExist, InvalidParameter
parse_from_str OutOfRange, DoesNotExist, InvalidParameter, InconsistentDate, ParsingError(_)
checked_add_signed OutOfRange
checked_sub_signed OutOfRange
checked_add_months OutOfRange note: changes the day to fit
checked_sub_months OutOfRange note: changes the day to fit
checked_add_days OutOfRange
checked_sub_days OutOfRange
signed_duration_since
years_since InvalidParameter
and_time
and_hms InvalidParameter panics
try_and_hms InvalidParameter
and_hms_milli InvalidParameter panics
try_and_hms_milli InvalidParameter
and_hms_micro InvalidParameter panics
try_and_hms_micro InvalidParameter
and_hms_nano InvalidParameter panics
try_and_hms_nano InvalidParameter
succ OutOfRange panics
try_succ OutOfRange panics
pred OutOfRange panics
try_pred OutOfRange panics
format_with_items always succeeds, can fail during formatting
format always succeeds, can fail during formatting
iter_days
iter_weeks
week

NaiveTime

method error causes notes
from_hms InvalidParameter panics
try_from_hms InvalidParameter
from_hms_milli InvalidParameter panics
try_from_hms_milli InvalidParameter
from_hms_micro InvalidParameter panics
try_from_hms_micro InvalidParameter
from_hms_nano InvalidParameter panics
try_from_hms_nano InvalidParameter
from_num_seconds_from_midnight InvalidParameter panics
try_from_num_seconds_from_midnight InvalidParameter
parse_from_string InvalidParameter, ParsingError(_)
overflowing_add_signed
overflowing_sub_signed
signed_duration_since
format_with_items always succeeds, can fail during formatting
format always succeeds, can fail during formatting

DateTime

method error causes notes
from_utc TzLoadingError(_) panics
from_local OutOfRange, TzLoadingError(_) panics
date_naive OutOfRange panics
time
timestamp
timestamp_millis
timestamp_micros
timestamp_nanos OutOfRange panics
timestamp_subsec_millis
timestamp_subsec_micros
timestamp_subsec_nanos
offset
timezone TzLoadingError(_) panics
with_timezone TzLoadingError(_) panics
checked_add_signed OutOfRange
checked_sub_signed OutOfRange
checked_add_months OutOfRange, DoesNotExist note: changes the day to fit
checked_sub_months OutOfRange, DoesNotExist note: changes the day to fit
checked_add_days OutOfRange, DoesNotExist
checked_sub_days OutOfRange, DoesNotExist
signed_duration_since
naive_utc
naive_local OutOfRange panics
years_since InvalidParameter
parse_from_rfc2822 OutOfRange, DoesNotExist, InconsistentDate, ParsingError(_)
parse_from_rfc3339 OutOfRange, DoesNotExist, ParsingError(_)
parse_from_str OutOfRange, DoesNotExist, InvalidParameter, InconsistentDate, ParsingError(_)
to_rfc2822 OutOfRange year outside of supported range
to_rfc3339
to_rfc3339_opts
format_with_items always succeeds, can fail during formatting
format always succeeds, can fail during formatting

Datelike

method error causes notes
year
month
month0
day
day0
ordinal
ordinal0
weekday
iso_week
with_year OutOfRange, DoesNotExist
with_month InvalidParameter, DoesNotExist
with_month0 InvalidParameter, DoesNotExist
with_day InvalidParameter, DoesNotExist
with_day0 InvalidParameter, DoesNotExist
with_ordinal InvalidParameter, DoesNotExist
with_ordinal0 InvalidParameter, DoesNotExist
year_ce
num_days_from_ce

Timelike

method error causes notes
hour
minute
second
nanosecond
with_hour OutOfRange, InvalidParameter, DoesNotExist
with_minute OutOfRange, InvalidParameter, DoesNotExist
with_second OutOfRange, InvalidParameter, DoesNotExist
with_nanosecond OutOfRange, InvalidParameter, DoesNotExist
hour12
num_seconds_from_midnight

LocalResult

method error causes notes
unwrap panic
single forward ChronoError, map Ambiguous to InconsistentDate, InGap to DoesNotExist
earliest forward ChronoError, map InGap to DoesNotExist
latest forward ChronoError, map InGap to DoesNotExist
earliest_or_nearest forward ChronoError
latest_or_nearest forward ChronoError

@pitdicker
Copy link
Collaborator Author

pitdicker commented May 6, 2023

Based on the API above, I would say that methods that can only fail because of InvalidParameter and OutOfRange would be best to make panicking by default. For others returning Result should be the default.

Example 1

NaiveDate::from_ymd has the error causes OutOfRange, DoesNotExist, InvalidParameter.

It should return a Result by default, because knowing whether a particular date exists is non-trivial (example: is the year a leap-year?).

A seperate NaiveDate::from_ymd_panicking (bikeshed! this is way too ugly and not convenient) should be provided that panics on invalid input.

Example 2

DateTime::naive_local can only fail if the datetime is near the end of the range of NaiveDateTime and the offset pushes the timestamp beyond that range. This is not something for regular code to worry about, so it should panic by default.

DateTime::checked_naive_local can be added as a non-panicking variant.

Example 3

DateTime::checked_add_days should return LocalResult, because it is not uncommon for a local time to be ambiguous or to not exist. Methods such as LocalResult::latest can be used to handle the error.

This is better than returning Result<LocalResult<DateTime<Tz>>, ChronoError>. If that where choosen, DateTime<Utc>::checked_add_days would need two unwraps (or something better). Returning LocalResult<DateTime<Tz>>, with an error such as OoutOfRange included in the enum variant Error, allows using a single unwrap.

Example 4

Timelike::with_minute is part of a trait.

  • On NaiveTime and NaiveDateTime it can only fail because of OutOfRange and InvalidParameter.
  • On DateTime it can also fail because of DoesNotExist.

For DateTime it would help if the method would return LocalResult, because then it becomes actionable. But this would make the trait much less convenient to use for NaiveTime and NaiveDateTime. I would say it is best to return Result<T, ChronoError>.

It should probably document there is a workaround to handle the error for DateTime:

datetime.timezone().from_local(datetime.naive_local().with_minute(val).unwrap()).latest_or_nearest();

Maybe splitting the Timelike trait into two would be better, one to query values only (hour, minute, second, etc.), and one to set values. It is a bit of a footgun on the DateTime type. But the same can be said about the Datelike trait 😞

@Zomtir
Copy link
Contributor

Zomtir commented May 7, 2023

Pleasant to use

The deprecations of many potentially panicking methods in 0.4.23 have made chrono very unpleasant to use, with unwrap() all over the place.

Many edge conditions may simply never arise in real-world use. Making every method that might error in some obscure situation return Result makes the code that uses chrono hard to read. Why is this unwrap? Is it because of some condition that will never arise, or is it taking sketchy short-cuts?

Example: NaiveDate::succ may fail if the following date falls outside of the range of dates representable by a NaiveDate. But how much real-world code has to deal with dates around the year 262143? In this case making the default method panic on overflow is best.

I think that point strongly depends on how exactly you use(d) chrono in your own package. Maybe from_ymd is a bad example, but I hope it gets the point across:

Case 1) Always unwrap and risk panics.

fn unwrapAndPanic() -> () {
    let datetime = chrono::NaiveDate::from_ymd(...).unwrap().with_hms(...).unwrap()
}

Case 2) Manually map the error or handle it.

fn handleError() -> () {
    let date = match chrono::NaiveDate::from_ymd(...) {
        Err(OutOfRange) => panic!("Date is out of range"),
        Err(DoesNotExist) => { println!("Date does not exist"); return () },
        Err(InvalidParameter) => NaiveDate::now().unwrap(),
        Ok(date) => date,
    }
    let datetime = match date.with_hms(...) {
        Err(e) => panic!("{:?}", e);
        Ok(datetime) => datetime,
    }
}

Case 3: Implement From<chrono::ChronoError> and always use ? magic.

impl From<chrono::ChronoError> for crate::Error {
    fn from(_: chrono::ChronoError) -> Self {
        crate::Error::ChronoError
    }
}

fn autoMapError() -> Result<(),crate::Error> {
    let datetime = chrono::NaiveDate::from_ymd(...)?.with_hms(...)?;
    Ok(())
}

I'm in favor of the third approach as it is very pleasant to use (once you strictly stick to it) and does not panic even in "unlikely cases".

Functions that logically cannot panic, should be safe to unwrap within the function itself and shall not return Result at all.

@pitdicker
Copy link
Collaborator Author

@Zomtir We then agree on from_ymd . What do you think about for example NaiveTime::from_hms? It should only fail if the values come from unvalidated input (InvalidParameter). To me that seems like not a good enough reason to make the code more verbose for every user of chrono.

@pitdicker
Copy link
Collaborator Author

Parsing errors

Current error enum for parsing errors:

pub enum ParseErrorKind {
    /// Given field is out of permitted range.
    OutOfRange,

    /// There is no possible date and time value with given set of fields.
    ///
    /// This does not include the out-of-range conditions, which are trivially invalid.
    /// It includes the case that there are one or more fields that are inconsistent to each other.
    Impossible,

    /// Given set of fields is not enough to make a requested date and time value.
    ///
    /// Note that there *may* be a case that given fields constrain the possible values so much
    /// that there is a unique possible value. Chrono only tries to be correct for
    /// most useful sets of fields however, as such constraint solving can be expensive.
    NotEnough,

    /// The input string has some invalid character sequence for given formatting items.
    Invalid,

    /// The input string has been prematurely ended.
    TooShort,

    /// All formatting items have been read but there is a remaining input.
    TooLong,

    /// There was an error on the formatting string, or there were non-supported formating items.
    BadFormat,

    // TODO: Change this to `#[non_exhaustive]` (on the enum) with the next breaking release.
    #[doc(hidden)]
    __Nonexhaustive,
}

ParsingError(usize)

In my opinion it would be okay, or even better, if the variants OutOfRange, Invalid and TooShort would be merged into a single ParsingError(usize). All are used to indicate the same problem: the input string doesn't match the format string / specification.

  • OutOfRange if created for various cases, such as when a month is greater than 12. I think this more often an indication of a date format that doesn't match the format string than a truly malformed input (i.e. it might be an indication of mm/dd/yyyy instead of dd/mm/yyyy).
  • Invalid is given when a certain field can't be parsed.
  • TooShort when the input string ends while the format string has fields left.

It may be helpful to indicate the index of the parsing error in the string, although I realize that is not easily possible with the current implementation.

Is this losing important information compared to the current errors? Honestly OutOfRange and Invalid don't confer all that much...

I can image are rare case were the current TooShort could have a benefit: parsing a string that arrives in chunks. "Parsing failed, but it may succeed if you append the next chunck." But with ParsingError and a usize with the index of the parsing error it is also possible to figure that out.

Intermediate result instead of TooLong

One of the things the API guidelines mention, under C-INTERMEDIATE, is to expose intermediate results. In my opinion, when there is trailing data after the succesful parsing of a string, it would be helpful to return both the parsed result and the remainder (see #1011).

Would a seperate TooLong error variant be useful? An API to return intermediate results would be more convenient for library users that want to know handle this case. For others that just want to report an error returning ParsingError should be good enough (?).

Succesful parsing, but invalid end result

Even after the fields of a string are succesfully parsed, it may not result in a valid date:

  1. Some fields of a date string may not agree with each other, like weekday and date.
  2. A date may not exist.
  3. A time may not exist in a particular timezone.
  4. The result may be out of range.

I thought that it would be helpful to expose some helpful intermediate information with the error. But then this is better solved with an alternative API. format::{Parsed, parse} and StrftimeItems fit the bill (although the API could use some love):

let date_str = "Sun 2023-04-21"; // Should be Fri
let mut parsed_fields = Parsed::new();
parse(&mut parsed_fields, date_str, StrftimeItems::new("%a %Y-%m-%d"))?;
parsed.weekday = None; // ignore the weekday
assert_eq!(parsed.to_naive_date().ok(), NaiveDate::from_ymd_opt(2023, 4, 21));

New iteration on ChronoError

pub enum ChronoError {
    InvalidParameter, // example: hour == 25
    OutOfRange, // example: year = 500_000
    DoesNotExist, // example: 31 February
    Inconsistent, // example: parsing Sunday 2023-04-21 (should be Friday)
    ParsingError(usize), // with byte index of parsing failure
    // Maybe:
    TzLoadingError(/* todo */), // Catchall for all errors caused by the OS in `offset::local`
}

@Zomtir
Copy link
Contributor

Zomtir commented May 9, 2023

@pitdicker I have no strong preference either way. I don't mind the additional effort to unwrap any Result, especially if it is only one additional ?. On the other hand sufficient quality of life for the broad user base might outweigh a rare panic.

I really like your thoughts about the ParsingError. I took a tour around that issue and kept the existing logic without much afterthought. Unfortunately I think enums cannot take additional parameters like ParsingError(usize) that easily. Ultimately you have to decide how to compare ParsingError(6) with ParsingError(4). Are they supposed to be equal or not?

@pitdicker
Copy link
Collaborator Author

pitdicker commented May 9, 2023

My knowledge of error handling in Rust is a couple of years out of date. But as I understand it error chaining remains unavailable without the standard library or alloc.

Timezone-related errors

There is currently an error type with 16 variants in offset::local::tz_info (two unused ones are commented out):

pub(crate) enum Error {
    /// Date time error
    // DateTime(&'static str),
    /// Local time type search error
    FindLocalTimeType(&'static str),
    /// Local time type error
    LocalTimeType(&'static str),
    /// Invalid slice for integer conversion
    InvalidSlice(&'static str),
    /// Invalid Tzif file
    InvalidTzFile(&'static str),
    /// Invalid TZ string
    InvalidTzString(&'static str),
    /// I/O error
    Io(io::Error),
    /// Out of range error
    OutOfRange(&'static str),
    /// Integer parsing error
    ParseInt(ParseIntError),
    /// Date time projection error
    ProjectDateTime(&'static str),
    /// System time error
    //SystemTime(SystemTimeError),
    /// Time zone error
    TimeZone(&'static str),
    /// Transition rule error
    TransitionRule(&'static str),
    /// Unsupported Tzif file
    UnsupportedTzFile(&'static str),
    /// Unsupported TZ string
    UnsupportedTzString(&'static str),
    /// UTF-8 error
    Utf8(Utf8Error),
}

The first thing to note is that it wraps whatever error may arise. This is appropriate for application code, but not usual for a library.

And it is currently never exposed 😄.

Maybe a method like Local::initialize can be added to exposes these error causes instead of panicking on first use of Local. @esheppa has much more developed ideas around this than I do...


How much should be reported?

On Unix it is possible to override the local timezone with the TZ environment variable. This string can either be a timezone name, or a rule with offsets and transitions. The named timezone may not exist, or fail to load, or the rule may have some error. It makes sense to ignore the override when invalid. Should there still be some error like InvalidTimeZoneOverride?

In theory the OS-provided timezone information can be wrong. It may have offsets that are non-sensical. An API call may fail. On Unix the timezone database may be missing. Or a file in that database may be corrupt.

If a tzinfo file is corrupt, I would say you don't need to know what is corrupt about it. And the current errors InvalidSlice, ParseInt, TransitionRule etc. are already not helpful enough to debug it. Better to leave that to whoever is responsible for maintaining that file. An error like InvalidTzFile should be enough.

For OS error codes we may want to take a page out of the book of getrandom::Error, and store the raw integer. This allows the error type to remain small and no_std compatible, while still providing one useful step in the error chain.

New iteration on ChronoError

pub enum ChronoError {
    InvalidParameter, // example: hour == 25
    OutOfRange, // example: year = 500_000
    DoesNotExist, // example: 31 February
    Inconsistent, // example: parsing Sunday 2023-04-21 (should be Friday)
    ParsingError(usize), // with byte index of parsing failure
    InvalidTzOverride, // example: invalid TZ environment variable
    InvalidTzFile,
    OsError(i32),
}

@pitdicker
Copy link
Collaborator Author

I took a tour around that issue and kept the existing logic without much afterthought.

That was very reasonable to explore were and how the API should change. And now we can look from the other side: what errors should we expect? But I am only looking at the documentation, and haven't tried to touch the code yet. You have, so you opinion is useful.

Unfortunately I think enums cannot take additional parameters like ParsingError(usize) that easily. Ultimately you have to decide how to compare ParsingError(6) with ParsingError(4). Are they supposed to be equal or not?

If the result is not ignored or passed on, pattern matching would be easiest: if let Err(ChronoError::ParsingError(_)) = result).

@pitdicker
Copy link
Collaborator Author

pitdicker commented May 9, 2023

Making the error implementation private?

One option is to wrap the error enum inside a newtype pub struct ChronoError(ChronoErrorKind), and use a method kind() to get to the enum. This allows chrono to in the future switch to a different way to store the errors (say, a u64) without breaking backwards compatability.

In my opinion, with an error enum a simple as the one above, this would only add unnecessary boilerplate.
Or is there a benefit I am missing?

Optimize error type for size

Parts of chrono have been written with performance in mind. It would be unfortunate if the addition of a complex error type would regress this. Benchmarks would have to tell, but I don't think this is really a risk if the error type stays small; similar or smaller in size than types like NaiveDateTime.

We could shrink the size of the error enum a little: ParsingError realistically doesn't need to hold more than an u32. An input string will always be similar in length to a format string. There is only a limited number of possible fields, and none can appear more than once without resulting in a ParseError. And with a format string longer then ~2 gigabyte we can reasonably return InvalidParameter.

This would put the size of the error enum at just 8 bytes.

New iteration on ChronoError

#[non_exhaustive]
pub enum ChronoError {
    InvalidParameter, // example: hour == 25
    OutOfRange, // example: year = 500_000
    DoesNotExist, // example: 31 February
    Inconsistent, // example: parsing Sunday 2023-04-21 (should be Friday)
    ParsingError(u32), // with byte index of parsing failure
    InvalidTzOverride, // example: invalid TZ environment variable
    InvalidTzFile,
    OsError(i32),
}

@jtmoon79
Copy link
Contributor

jtmoon79 commented May 21, 2023

There is currently an error type with 16 variants in offset::local::tz_info (two unused ones are commented out):

pub(crate) enum Error {
    /// Date time error
    // DateTime(&'static str),
    /// Local time type search error
    FindLocalTimeType(&'static str),
    /// Local time type error
    LocalTimeType(&'static str),
    /// Invalid slice for integer conversion
    InvalidSlice(&'static str),
    /// Invalid Tzif file
    InvalidTzFile(&'static str),
    /// Invalid TZ string
    InvalidTzString(&'static str),
    /// I/O error
    Io(io::Error),
    /// Out of range error
    OutOfRange(&'static str),
    /// Integer parsing error
    ParseInt(ParseIntError),
    /// Date time projection error
    ProjectDateTime(&'static str),
    /// System time error
    //SystemTime(SystemTimeError),
    /// Time zone error
    TimeZone(&'static str),
    /// Transition rule error
    TransitionRule(&'static str),
    /// Unsupported Tzif file
    UnsupportedTzFile(&'static str),
    /// Unsupported TZ string
    UnsupportedTzString(&'static str),
    /// UTF-8 error
    Utf8(Utf8Error),
}

The first thing to note is that it wraps whatever error may arise. This is appropriate for application code, but not usual for a library.

And it is currently never exposed 😄.

If this is internal then yes, I recommend changing it. Specifically, Io(io::Error) will not be able to support traits Copy and Clone due to rust-lang/rust#24135 . This makes shuffling Error instances around much more difficult (passing them, storing them, sending them through thread channels, etc.)

@jtmoon79
Copy link
Contributor

jtmoon79 commented May 21, 2023

New iteration on ChronoError

#[non_exhaustive]
pub enum ChronoError {
    InvalidParameter, // example: hour == 25
    OutOfRange, // example: year = 500_000
    DoesNotExist, // example: 31 February
    Inconsistent, // example: parsing Sunday 2023-04-21 (should be Friday)
    ParsingError(u32), // with byte index of parsing failure
    InvalidTzOverride, // example: invalid TZ environment variable
    InvalidTzFile,
    OsError(i32),
}

What traits will this enum support? In the least, it should support (Clone, Copy, Debug, Eq, Ord, PartialEq, PartialOrd). This the minimal set of traits for making enums not overly tedious to use.

@ahcodedthat
Copy link

Note that the conversions to and from std::time types can also panic.

These are especially insidious. Consider this:

fn example(dt: impl Into<DateTime<Utc>>) {
    let dt = dt.into();
    // …do stuff with `dt`…
}

It's not immediately obvious that this function can panic, but it can, because DateTime<Utc> implements From<SystemTime> in a way that panics on overflow. If the SystemTime comes from an untrusted source (e.g. the last-modified time stamp of a file owned by a different user), then this can be a denial-of-service vulnerability.

The reverse is also true: the earliest date-time representable by DateTime is earlier than that of SystemTime on Windows, so DateTimeSystemTime can panic too.

@pitdicker
Copy link
Collaborator Author

pitdicker commented Feb 6, 2024

The initial PR to add an error type is up #1049!

Parsing errors

I've started working on converting the parsing code to the new Error type. I now think we should add/keep an internal ParseError type.

Overview of parsing

  • Public entry point (currently format::parse()).
  • Private function that drives the parsing loop and various helper functions for parsing things like numbers, month names, offset. They all have access to a slice with the remainder of the input.
  • All values are stored in Parsed using public setter methods.
  • After parsing Parsed converts the value to the target type.

Errors causes

Consider the string 18:45 (6:45 PM.

During parsing we can have four causes for errors:

  1. A character does not match with the expected format.
  2. The input is shorter than expected by the format (the input is missing a final )).
  3. A value in the input is larger than allowed by the format (like 50 for an hour value).
  4. A value is set twice, but to different numbers (the example has two minute fields that must be the same).

After parsing we have another set of error causes:

  1. Some of the fields are not consistent with each other (for example date fields that don't match with the weekday field).
  2. The input may not have enough fields to create the target type. The plan in Why does chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H") pass? #1075 and Allow default values when parsing, and error on unused fields #1094 was to supply default values, but not in all cases. For example if the input has hours and seconds, we error if minutes are missing. This is really a problem with the format string, not with the input.
  3. The input contains fields that are discarded by the target type. Curently allowed, but the plan is to return an error.
  4. The final value exceeds the range supported by the target type.
  5. There is input remaining after parsing. The rust API guidelines suggest it is best to not consider this an error but to return the remainder.

The error causes after parsing can map to Error::Inconsistent (1), Error::FormatNotEnough (2, to be added), Error::InvalidParameter (3), Error::OutOfRange (4).

Internal error type

I propose the internal methods used during parsing should use their own error type. Then the errors can contain the remaining slice and have a lifetime that depends on the input. At the conversion to our public error type it can be turned into a byte index of the source slice.

enum ParseError<'a> {
    /// Character does not match with the expected format.
    ///
    /// Contains a reference to the remaining slice, starting with the mismatched character.
    ///
    /// Maps to `Error::InvalidCharacter(index)`.
    InvalidCharacter(&'a str),

    /// The input is shorter than expected by the format.
    ///
    /// Maps to `Error::InvalidCharacter(input.len())`.
    TooShort,

    /// Value is not allowed by the format.
    ///
    /// Examples are a number that is larger or smaller than the defined range, or the name of a
    /// weekday, month or timezone that doesn't match.
    ///
    /// Contains a reference to the remaining slice, starting with invalid value.
    ///
    /// Maps to `Error::InvalidValue(index)`.
    InvalidValue(&'a str),

    /// Some of the date or time components are not consistent with each other.
    ///
    /// During parsing this only arises if a field is set for a second time in the `Parsed` type,
    /// while not matching the first value.
    ///
    /// Maps to `Error::Inconsistent`.
    Inconsistent,
}

New iteration on Error

#[non_exhaustive]
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub enum Error {
    /// One or more of the arguments to a function are invalid.
    ///
    /// An example is creating a `NaiveTime` with 25 as the hour value.
    InvalidParameter,

    /// The result, or an intermediate value necessary for calculating a result, would be out of
    /// range.
    ///
    /// An example is a date for the year 500.000, which is out of the range supported by chrono's
    /// types.
    OutOfRange,

    /// A date or datetime does not exist.
    ///
    /// Examples are:
    /// - April 31,
    /// - February 29 in a non-leap year,
    /// - a time that falls in the gap created by moving the clock forward during a DST transition,
    /// - a leap second on a non-minute boundary.
    DoesNotExist,

    /// Some of the date or time components are not consistent with each other.
    ///
    /// An example is parsing 'Sunday 2023-04-21', while that date is a Friday.
    Inconsistent,

    /// There is not enough information to create a date/time.
    ///
    /// An example is parsing a string with not enough date/time fields, or the result of a
    /// time that is ambiguous during a time zone transitions (due to for example DST).
    Ambiguous,

    /// Character does not match with the expected format.
    ///
    /// Contains the byte index of the character where the input diverges.
    InvalidCharacter(u32),

    /// Value is not allowed by the format (during parsing).
    ///
    /// Examples are a number that is larger or smaller than the defined range, or the name of a
    /// weekday, month or timezone that doesn't match.
    ///
    /// Contains the byte index pointing at the start of the invalid value.
    InvalidValue(u32),

    /// The format string contains a formatting specifier that is not supported.
    ///
    /// Contains the byte index of the formatting specifier within the format string.
    UnsupportedSpecifier(u32),

    // TODO: Error sources that are not yet covered are the platform APIs, the parsing of a `TZfile`
    // and parsing of a `TZ` environment variable.
}

@pitdicker
Copy link
Collaborator Author

It's not immediately obvious that this function can panic, but it can, because DateTime<Utc> implements From<SystemTime> in a way that panics on overflow. If the SystemTime comes from an untrusted source (e.g. the last-modified time stamp of a file owned by a different user), then this can be a denial-of-service vulnerability.

@ahcodedthat We have no documentation yet for From<SystemTime> and don't give a good panic message. Or even better: a TryFrom implementation. Do you want to open a different issue for that? Or maybe even make a small PR?

@ahcodedthat
Copy link

@pitdicker Ok, I have opened #1421.

ahcodedthat added a commit to ahcodedthat/chrono that referenced this issue Feb 12, 2024
ahcodedthat added a commit to ahcodedthat/chrono that referenced this issue Feb 13, 2024
@pitdicker
Copy link
Collaborator Author

pitdicker commented Feb 19, 2024

It seems good to write down some of the steps I am planning.

TimeDelta type

  • A couple of uses of TimeDelta constructors do range checks because they pre-date the fallible constructors. We can now make them more idiomatic.
  • Replace all uses of panicking TimeDelta constructors with their try_ variant.
  • Deprecate the panicking variants.
  • Merge these changes from 0.4.x to 0.5.x.
  • Remove deprecated TimeDelta methods on 0.5.x.
  • Rename try_ methods.
  • Convert to Results.

Parsing module

  • Revert parser whitespace changes (Consider reverting #807 on the 0.5 branch #1426).
  • Improve documentation for Parsed.
  • Move range checks to Parsed::set_*
  • (optional) Refactor less sane parts of Parsed, deprecate Parsed::to_naive_datetime_with_offset.
  • Merge 0.4.x to 0.5.x branch.
  • Make Parsed fields private, remove some double range checks
  • Convert Parsed to new error type.
  • Convert parse() and its callers to new error type.
  • Convert parsing methods to new ParseError.

Local type and platform integration

No concrete plan yet. I'm collecting some thoughts in a new issue.

@pitdicker
Copy link
Collaborator Author

This issue seems to have mostly served its purpose. There is still some iteration on the details, so the end result is going to be different from what is discussed here. #1469 tracks the progress of individual methods, and #1448 is split out for errors originating from methods on the TimeZone trait.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-incompatible Tracking changes that need incompatible API revisions
Projects
None yet
Development

No branches or pull requests

4 participants