Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PlainDate.from('2020-01-01T00:00Z') is problematic #1751

Closed
justingrant opened this issue Aug 20, 2021 · 17 comments · Fixed by #1874
Closed

PlainDate.from('2020-01-01T00:00Z') is problematic #1751

justingrant opened this issue Aug 20, 2021 · 17 comments · Fixed by #1874

Comments

@justingrant
Copy link
Collaborator

justingrant commented Aug 20, 2021

Should PlainDate.from('2020-01-01T00:00Z') throw? This question came up while @ptomato was building #1749, which was based on feedback from IETF that "...Z[IANA]" strings should be valid. This issue poses the reverse question: should "Z" be prohibited for types like PlainDate?

If "Z" means "an unmoored instant that does not represent local time" then throwing in that case might make sense because the Z means that none of the date/time units are meaningful for any Plain type.

A programmer who tries to parse a Z string into a PlainDate is almost certainly committing an error. A typical example is log parsing:

getLogDateForUI(logLine) {
  // Example line: 2021-01-21T23:08:34.456 Database crash
  const timestamp = logLine.split(' ')[0];
  return Temporal.PlainDate.from(timestamp); // BUG! (if this date is assumed to match local date)
  
  // Here's what *should* be used for this case:
  return Temporal.Instant.from(timestamp).toZonedDateTimeISO(myTimeZone).toPlainDate();  
}

Another case where this could bite the developer would be parsing data that was originally in Wall+Offset format, but later changes to Instant+Z format. A naive developer might look at the original data and assume it's safe to parse into a PlainDate, but when the data format changes their app is broken but no error is shown.

Another side effect of the current behavior is that using a Z string for a relativeTo will use the UTC units, which is unlikely to match the developer's intent.

In the unusual case where the developer really wants to measure the date/time units in a Z string, there's an easy opt-in path that expresses a really clear programmer intent for all later readers of the code:

// Use the special time zone "UTC" whose local time matches Z timestamps
Instant.from('2020-01-01T00:00Z').toZonedDateTimeISO('UTC').hour
@justingrant justingrant changed the title PlainDate.from('2020-01-01T00:00Z') is very problematic PlainDate.from('2020-01-01T00:00Z') is problematic Aug 20, 2021
@ptomato
Copy link
Collaborator

ptomato commented Aug 24, 2021

That seems out of step with the decision to allow ignoring irrelevant parts of an ISO string as long as they are valid according to ISO 8601 semantics (see https://tc39.es/proposal-temporal/docs/parse-draft.html). I think throwing here would make it harder to accommodate the use case that we would otherwise have needed Temporal.parse() for. So, I don't think we should change this.

@justingrant
Copy link
Collaborator Author

justingrant commented Aug 24, 2021

I think throwing here would make it harder to accommodate the use case that we would otherwise have needed Temporal.parse() for.

What use case(s) did you have in mind for parsing an instant Z string into a PlainDate or (EDIT: or PlainMonthDay or PlainYearMonth) ? After doing a lot of investigation and prototyping in #1770, I'm now pretty curious about this.

@ptomato
Copy link
Collaborator

ptomato commented Aug 25, 2021

I was thinking about the use case referred to as "parsing parts" in https://tc39.es/proposal-temporal/docs/parse-draft.html. But then I remembered that use case is already no longer accommodated within Temporal because we throw even if an ignored part of the string is invalid.

I do still think that we should stick to the decision to allow ignoring irrelevant parts of an ISO string as long as they are valid. I think we went back and forth on this a lot, and while my personal preference would actually be to support "parsing parts", I think the tradeoff we made during Stage 2 was reasonable and I don't think Stage 3 is a good time to revisit that unless we have significant new information.

@justingrant
Copy link
Collaborator Author

ignoring irrelevant parts of an ISO string as long as they are valid

I think we agree on this. But we probably disagree on he meaning of "parts" and "valid". A string like
2020-12-01T00:00Z is an atomic representation of a point on the exact timeline. It has no "parts" for the same reason that Instant has no year getter. It's interchangeable with 1606780800000000000n nanoseconds or 1606780800 seconds. But it's not the same as with Temporal.PlainMonthDay.from({month: 12, day: 1}).toPlainDate({year: 2020}).toPlainDateTime() for the same reason that 0b1010 isn't interchangeable with Number("1010").toString(2) Even if their text encodings look similar, their meanings are different and shouldn't be parsed as if they're the same.

Related: default string representations of each Temporal type are supposed to be interchangeable with Temporal class instances. Developers should be able to replace an instance with the results of toString(), and vice versa, and Temporal should act the same. AFAICT, PlainXxx.from(instant.toString()) is the only place this invariant doesn't hold.

re: "valid", each Temporal type decides which subset of the ISO syntax is valid for that type. Temporal.ZonedDateTime.from(instant.toString()) throws because ZDT requires something that Instant lacks. PlainXxx types similarly require something that Instant lacks: calendar-relative date and time units, which Z strings lack for the same reason that Instant lacks year, hour, calendar, etc. Why is the string case different?

Another reason we omitted date/time getters on Instant was that interpreting UTC dates as local dates was a common problem when using legacy Date and other libraries. We wanted to avoid these errors in Temporal. Parsing a Z string with PlainDate.from(instant.toString()).month is equivalent to instant.month which we've already decided to disallow. Why is the string case different?

One reason to leave as-is would be if throwing caused a high hurdle for developers who do want to pull units out of a Z string. But the workaround is trivial: Instant.from(s).toZonedDateTime(Temporal.TimeZone.from(s)) or Instant.from(s).toZonedDateTime('UTC'). Given that "get date/time units from a Z string" is usually a developer bug, requiring extra code to opt in seems like a feature not a bug. Extra code can also flag to readers of that code that the code is doing something unusual.

If there's big benefits on the other side of the equation I'd totally be open to leaving things as-is. But I'm struggling to understand the advantage of the current state. It seems like a bug that's inconsistent with the rest of Temporal.

@ptomato
Copy link
Collaborator

ptomato commented Aug 26, 2021

Thanks for the summary. I think this is the main point where we disagree:

A string like 2020-12-01T00:00Z is an atomic representation of a point on the exact timeline.

My opinion: that string can mean that, and it certainly means that if it originates from Temporal, but it can also just be data from an external source, where it represents whatever the person processing the data needs it to represent. This is also the reason why we have Temporal.PlainDate.from('2020-12-01T11:52:35') discard the time portion of the string, even though it would clearly mean something more specific (a calendar date-time, not a calendar date) if it originated from Temporal and in that case the caller would be losing information. But if you have string data coming from an external source, you can choose which information to extract from it, and keeping this use case convenient is the consistency within Temporal that I personally think weighs more heavily.

@justingrant
Copy link
Collaborator Author

This is also the reason why we have Temporal.PlainDate.from('2020-12-01T11:52:35') discard the time portion of the string, even though it would clearly mean something more specific (a calendar date-time, not a calendar date)

The difference is that when decomposing 2020-12-01T21:52:35, the pieces always mean the same thing as when they were a part of the original string, because 9:52 PM on Dec 20, 2020 is still a time that's on Dec 20, 2020. Therefore, Temporal.PlainDate.from('2020-12-01T21:52:35') always means the same as Temporal.PlainDate.from('2020-12-01'). It's never a programmer bug to use the pieces separately.

Similarly, 2020-12-01T21:52:35-08:00 is also safely decomposable. The pieces always mean the same thing when they're taken apart as they did when they were part of the original string. It's not a programmer bug to use the pieces separately.

That's not true of Temporal.PlainDate.from('2020-12-01T21:52:35Z'), where using the pieces separately is very likely to be a programmer bug because 2020-12-01 has a different meaning from 2020-12-01Z aka 2020-12-01T00:00Z. The former is a calendar date, the latter is an exact timestamp that is actually November 30 in my time zone, not December 1.

The difference between wall and exact time is the most fundamental concept in Temporal. Our type structure is based on this distinction. Our API is designed to make it impossible to convert between wall<=>exact without a time zone. The case of parsing Z strings with PlainXxx.from is AFAICT the only case in Temporal where we allow calendar/wall time and exact time to be converted without a time zone. Why is this case special enough to break the pattern we worked so hard to establish elsewhere?

keeping this use case convenient is the consistency within Temporal that I personally think weighs more heavily.

Do you think that this is a common use case where ergonomics is really important? In my experience (I built log-analysis software at Splunk and Microsoft, so I've seen a lot of UTC-string parsing code over the years) extracting date parts from UTC-denominated data is relatively unusual. It's not rare, but in my experience parsing a Z string into a date is much more likely to be a bug than intentional use.

Given that the opt-in workaround (.toZonedDateTime('UTC')) is easy and that the workaround makes it much easier for future readers of the code to know what's going on, I don't understand why a relatively small ergonomics gain for an unusual use case is worth both bugs and muddying the wall vs. exact distinction Temporal tries so hard to keep distinct.

Am I missing some other common use case that justifies prioritizing ergonomics here?

@ptomato
Copy link
Collaborator

ptomato commented Aug 30, 2021

I think this case weighs more heavily because while Temporal has the wall time vs. exact time distinction, ISO strings do not. ISO strings have no meaning, they are either valid or invalid according to the syntactic rules of the format. Strong typing can be encouraged for strings within the context of Temporal, but IMO it can't be enforced ("stringly typed") like it can for the strongly typed Temporal objects.

After reflecting over the weekend about why I feel so strongly about something minor — I realized the above isn't actually my main concern, as it's only a medium-mildly held opinion. Rather, I am concerned that this would be a semantic change originating from within the champions group. We agreed that revisiting our decisions without external motivating feedback would be out of scope for Stage 3, and have held other champions to this agreement already. This is actually why I'm strongly against revisiting this.

@justingrant
Copy link
Collaborator Author

this would be a semantic change originating from within the champions group. We agreed that revisiting our decisions without external motivating feedback would be out of scope for Stage 3

I agree with the principle of not revisiting champions decisions. But we also fix bugs when we find them, which I assume includes not just things we decided on but implemented differently, but also things we didn't actually decide or discuss in the first place.

Any idea when we discussed and decided on the current behavior to allow PlainDate.from('2020-01-01T00:00Z')? It wouldn't be the first time I forget a discussion, but I don't remember discussing and allowing this.

@justingrant
Copy link
Collaborator Author

(BTW, this isn't urgent... fine to discuss next week!)

@pipobscure
Copy link
Collaborator

pipobscure commented Sep 16, 2021

  • Claiming that 2020-12-01T00:00Z is an atomic representation of an exact point in time is correct
  • Claiming that 2020-12-01T00:00Z does not have a plain-date and plain-time part is incorrect
  • All the PlainXXX types have partial information and can be combined to parse parts out of a whole valid ISO string

So I see

  • PlainDate.from('2020-01-01T00:00Z')
  • PlainTime.from('2020-01-01T00:00Z')
  • PlainDateTime.from('2020-01-01T00:00Z')
  • TimeZone.from('2020-01-01T00:00Z')
  • TimeZone.from('2020-01-01T00:00+01:00[Europe/Vienna]')

all as totally valid. The Z should not be considered particularly special. It’s simply a synonym for UTC that exists mainly because it’s the universal standard rather than something special like no timezone.

Now as to throwing for invalid parts, it’s important to remember a convo we had where we were discussing: Syntactic Validity vs. Semantic Validity

We should throw on syntactic invalidity but never on semantic invalidity. And this is the very thing that makes the parsing parts use case still very valid!

I’m happy to discuss further next meeting and apologise again for having to miss today.

@justingrant
Copy link
Collaborator Author

I tried to summarize this thread below. I probably missed some things, esp. on the "don't throw" side because I'm less familiar with that one. LMK what I missed and I'll fix! Let's try to discuss either in this week's meeting or the next one.

PlainDate.from('2020-01-01T00:00Z') is OK because...

  • Consistency: "We should throw on syntactic invalidity but never on semantic invalidity."
  • Makes it more ergonomic to "parse into parts" and later reassemble, or other advanced parsing use cases.
  • It's currently spec-ed behavior, this is a normative change, and we're in Stage 3.
  • No one externally is asking to change this; changes in Stage 3 shouldn't come from the champions.

PlainDate.from('2020-01-01T00:00Z') should throw because...

  • Consistency: All (other) Temporal APIs to convert exact->calendar data require a time zone.
  • Consistency: All (other) Temporal APIs treat Temporal instances and strings interchangeably. Currently, PlainDate.from throws for an Instant instance but not for its default string representation.
  • Quickly teaches developers that PlainDate.from is the wrong API to get a calendar date from an exact timestamp.
  • If an external data source switches from DateTime+Offset format to Z format, buggy code won't fail silently.
  • If we decide this is a bug, we generally fix bugs in Stage 3 regardless of who raised the issue.

It's interesting that consistency depends on what part of Temporal you're looking at. If looking at string parsing only, then "no throw" seems more consistent. If looking at Temporal overall (especially the strict distinction between exact and calendar/wall time) then "throw" seems more consistent. Is one of these consistencies more more important?

BTW, if we do decide to throw, then advanced parsing use-cases like "extract date fields from an exact-timestamp string" would use code like this:

Instant.from(s).toZonedDateTimeISO(s).toPlainDate()

This pattern may be familiar because it's also used for non-advanced use-cases like "get a date in the user's time zone".

Instant.from(s).toZonedDateTimeISO(tz).toPlainDate()

@ptomato
Copy link
Collaborator

ptomato commented Sep 30, 2021

The prior discussion that I had been thinking of is #428 and the document that came out of it https://github.com/tc39/proposal-temporal/pull/503/files?short_path=44e33f4#diff-44e33f44c03ab39e7308b009e6b39d2a81369bb9ffb3924a2a4e7a9775cee62c

I've put this on the agenda for October 14.

My current opinion is that I don't find most of the reasons for throwing very convincing:

  • Consistency: All (other) Temporal APIs to convert exact->calendar data require a time zone.

This is true, but a string is a string, not an exact time; we don't have 'stringly-typed' APIs.

  • Consistency: All (other) Temporal APIs treat Temporal instances and strings interchangeably. Currently, PlainDate.from throws for an Instant instance but not for its default string representation.

This one I am slightly sympathetic to, but I'm also skeptical that Temporal.PlainDate.from(instant.toString()) is the first thing that developers will reach for when Temporal.PlainDate.from(instant) doesn't work.

  • If we decide this is a bug, we generally fix bugs in Stage 3 regardless of who raised the issue.

I don't agree that this is a bug, though, even I if did agree that we should change it. We currently have semantics for from() that are self-consistent. This discussion is about changing the semantics to something else that is also self-consistent. This is the main reason why I feel we shouldn't make this change in Stage 3.

@justingrant
Copy link
Collaborator Author

Thanks @ptomato for digging up the history! As I guessed, it was a little before my time as a champion. :-( Had I known that this was the behavior, I would have raised this issue much sooner... I'm so sorry!

  • Consistency: All (other) Temporal APIs treat Temporal instances and strings interchangeably. Currently, PlainDate.from throws for an Instant instance but not for its default string representation.

This one I am slightly sympathetic to, but I'm also skeptical that Temporal.PlainDate.from(instant.toString()) is the first thing that developers will reach for when Temporal.PlainDate.from(instant) doesn't work.

I agree. I don't think that "reach for when Temporal.PlainDate.from(instant) doesn't work" will be common.

I think the more common case is code that depends on PlainDate.from to reject invalidi date inputs, and relies on the exception path to do something useful like apply a default date or show a validation message. For example, imagine a date picker UI component:

function setDate(input) {
  try {
    this.value = PlainDate.from(input).toString();
    this.validationError = null;
  } catch(e) {
    this.value = null;
    this.validationError = 'Invalid Date'; 
  }
}

This code always works when instances are passed. And it always works for strings too... except Z strings. This is going to be a really hard corner case for developers to know about and to remember to defend against. Here's one attempt at writing this properly given the current spec. Is there a better way?

function setDate(input) {
  let value = null;
  const s = String(input);
  if (!s.toUpperCase.endsWith('Z')) {
    try {
      value = PlainDate.from(s).toString();
    } catch(e) { }
  }
  this.validationError = value ? null : 'Invalid Date'; 
  this.value = value;
}

Functions that accept date-only values are very common! This code seems really complicated and obscure for such a popular use case.

Another variant of this problem is where callers refactor to start using strings instead of instances. This is a common refactor because strings are easier for serialization, interop, React state, map keys, DB storage, etc.

Suddenly code that loudly failed before is now silently broken with users in some time zones, at some times of day, getting off-by-one dates instead of "Invalid Date".

These off-by-one day, UTC vs. local bugs happen frequently. I've seen them in every company I've worked at in the last 10 years. I even saw it last week in an app for making COVID testing appointments!

Because this bug it's so common, I expect that most usage of PlainDate.from with Z strings will be buggy usage. Real use cases for parsing the "date" of a Z string seem relatively rare. Or am I missing common use cases where that's needed?

The alternate way to get the "date" portion of a Z string (or any Instant string) is, IMHO, better code for that use case:

parseDateFromInstantString = (s) => Temporal.Instant.from(s).toZonedDateTime(s).toPlainDate()

This code is much more clear about the unusual thing it's doing, which will therefore make it easier for later readers to know what's going on. Why would we not want to encourage (via an exception in PlainDate.from) developers to use this pattern?

I guess I don't understand why we'd leave the ecosystem vulnerable to this bug (and the annoying code to work around it) when we don't have to, and why we'd want developers to use less-clear code for an unusual use case.

Totally agree we should have had this discussion sooner. I'm so sorry. 😞 I didn't even know about this behavior until I filed this issue, or I would have brought it up much sooner.

@ptomato
Copy link
Collaborator

ptomato commented Oct 14, 2021

Thanks to the above comment I finally understand why this can be dangerous. Off-by-one-day bugs that only happen in some time zones are really bad and IMO worth breaking some consistency if it makes it harder to commit them. I still disagree with Justin on the meaning of Z as a no-time-zone marker, as I feel that that goes too far in the direction of a stringly-typed API, but I think the severity of the potential for user-affecting bugs here makes it worth making an exception to catch when programmers are trying to do this.

My personal opinion aside, we discussed it in the champions meeting of October 14 and came to the conclusion that both sides of the argument have merit and a value judgement is needed. Is it more important to prevent users from being affected by bugs committed by programmers who are misusing strings in this way, or is it more important to be consistent with the definition of Z in ISO 8601 and avoid attaching semantics to ISO strings? We'll prepare this change for the October plenary if it can be done before the deadline, and otherwise for the December plenary, and have that discussion.

@ljharb
Copy link
Member

ljharb commented Oct 14, 2021

Preventing bugs seems much more important than ideological purity to me.

ptomato added a commit that referenced this issue Oct 15, 2021
Accepting this kind of string in Plain types poses a risk when
deserializing from databases, as some attach local-time-zone semantics to
this kind of string. This could cause user-facing off-by-one-day bugs.

Closes: #1751
@justingrant
Copy link
Collaborator Author

Here's an interesting tidbit that backs up @ptomato's concern around SQL databases, in this case from the Postgres docs:

The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. (Releases prior to 7.3 treated it as timestamp with time zone.)

If I'm reading this correctly (I'm not a Postgres expert so YMMV) writing the exact same SQL on different versions of Postgres could emit a different format of ISO string: DateTime+offset in older versions but Z strings in newer versions. Upgrading your database could produce the bug described in this issue even if your app code didn't change at all!

Glad we're proposing this change.

ptomato added a commit that referenced this issue Nov 18, 2021
Accepting this kind of string in Plain types poses a risk when
deserializing from databases, as some attach local-time-zone semantics to
this kind of string. This could cause user-facing off-by-one-day bugs.

Closes: #1751
ptomato added a commit that referenced this issue Nov 18, 2021
Accepting this kind of string in Plain types poses a risk when
deserializing from databases, as some attach local-time-zone semantics to
this kind of string. This could cause user-facing off-by-one-day bugs.

Closes: #1751
ptomato added a commit that referenced this issue Nov 24, 2021
Accepting this kind of string in Plain types poses a risk when
deserializing from databases, as some attach local-time-zone semantics to
this kind of string. This could cause user-facing off-by-one-day bugs.

Closes: #1751
ptomato added a commit that referenced this issue Nov 24, 2021
Accepting this kind of string in Plain types poses a risk when
deserializing from databases, as some attach local-time-zone semantics to
this kind of string. This could cause user-facing off-by-one-day bugs.

Closes: #1751
@dotnetCarpenter
Copy link

dotnetCarpenter commented Dec 21, 2021

As a side note.. This solution beats any regex validation!

const validateDate = x => {
    try {
        Temporal.PlainDate.from (x)
    } catch (e) {
        return false
    }
    return true
}

validateDate ('2022-02-29') // <- false

VS

const validateDate = x => /^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[1-2]\d|3[0-1])$/.test (x)

validateDate ('2022-02-29') // <- true

Motivation

I need to send a valid date as a String to our database and while the database will validate the date, I want to check the date before opening a database connection and wait for the response. This can obviously be done both client-side and server-side before communicating with the database.

Since the proposal-temporal is at stage 3, I guess there is little chance of adding new methods to Temporal. But a future optimization could be to add Temporal.PlainDate.test, Temporal.PlainDateTime.test etc. so that the static .test method can bypass the instantiation process and just return a Boolean. Perhaps bringing it on par with the performance of the regex solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants