Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Potential problem with data loading #17

Closed
sffc opened this issue Oct 20, 2019 · 13 comments
Closed

Potential problem with data loading #17

sffc opened this issue Oct 20, 2019 · 13 comments

Comments

@sffc
Copy link

sffc commented Oct 20, 2019

There is a potential issue with the design of the API as it relates to polyfills and tc39/ecma402#210 that may need to perform async data loading.

I can see a future where one might write code such as,

const fmt = await Intl.DateTimeFormat();
fmt.formatRange(start, end);

The await keyword could allow the polyfill to load data from a service somewhere if necessary, and resolve to a new Intl object ready to go with synchronous formatting.

The problem is, with Intl.DateTimeFormat, adding a new formatRange method means that the intent of the programmer is not known until after the object is constructed. The constructor would have to load the data for both "normal" date formatting and "range" date formatting, even if range formatting would never be performed.

Even if the data in the case of DateTimeFormat formatRange is small, this proposal sets precedent for how to add other range formatting APIs in the future (like number range), which could conceivably have a bigger footprint.

Perhaps this is not an actual tangible problem, but I thought I would bring it up.

@sffc
Copy link
Author

sffc commented Oct 20, 2019

@rxaviers, thoughts?

@sffc
Copy link
Author

sffc commented Oct 24, 2019

We discussed this in the TG2 meeting today. Notes from that discussion:


SRL: You could have cases where a certain operation expects a heavyweight operation on the backend. You might not be able to operate until you know the content of what you are formatting. I don't think this is something that can be solved by changing the number of final functions.

FYT: The data for format and formatRange have minimal common data. So there's a lot of waste. If we know in the constructor, it can make it more efficient. We can save some space, like ⅔.

LBR: While I like and appreciate the concept of using a promisified API, I have concerns. We cannot break things we have already shipped.

ZB: Could we say, some data is pre-loaded by default, and then in an option bag, you can specify more data to load. But if you need data in a terminal method, then that gets loaded at that point.

SFC: It would be good to load all data in the constructor, because we want the terminal format method to not be async.

RCA: I was thinking something like :

const fmt = new Intl.DateTimeFormat(  , {
  load : ['format', 'formatRange']
});
fmt.formatRange(start, end);

LBR: await Intl.load(); then classical API follows up

ZB: (proposal):

let dtf = new Intl.DateTimeFormat(undefined, { data: [“range”] });
dtf.formatRange(); // works

let dtf = new Intl.DateTimeFormat(undefined);
dtf.formatRange(); // errors out with “DateTimeFormat has to be loaded with data “range”

RCA: As an end user, I'm used to passing arguments like this.

SFC: Maybe something more like,

let dtf = new Intl.DateTimeFormat(undefined, { range: true });
dtf.format(x);  // singular
dtf.format(x, y);  // range

ZB: Having the same terminal method name with different arguments is confusing. The user knows they will need range, and they can tell us that in the constructor.

SFC: A new constructor would also pass intent to the constuctor.

let drf = new Intl.DateRangeFormat(...)

ZB: I'd like to think about what this would look like if we added new long-tail data sets.

@sffc
Copy link
Author

sffc commented Oct 24, 2019

I also synced with Felipe (proposal champion) offline. We agreed that this feels like an implementation detail that we don't want to have to bubble up to the user level.

Ultimately what we need is for the user to tell us, in the constructor, that they will be performing date range formatting. If we added an option like rangeStyle, an enum, we could require the style to be set in the constructor options:

new Intl.DateTimeFormat(undefined).formatRange(x, y);
// TypeError: you must pass a rangeStyle option

new Intl.DateTimeFormat(undefined, {
  rangeStyle: "auto"
}).formatRange(x, y);
// OK

At first, rangeStyle could be an enum with only one possible value. That feels a little bit messy, but it would solve our objective of putting a formatRange intention into the constructor.

Felipe said he'll think about other possible options that would achieve this objective.

@ljharb
Copy link
Member

ljharb commented Oct 24, 2019

Why not make it a static method that takes an instance, instead of an instance method that needs to be "primed"?

@sffc
Copy link
Author

sffc commented Oct 24, 2019

Why not make it a static method that takes an instance, instead of an instance method that needs to be "primed"?

You mean like this?

const fmt = new Intl.DateTimeFormat();
Intl.DateTimeFormat.formatRange(fmt, x, y);

Potential problems:

  1. Would require "create and destroy" of the formatRange data on each call.
  2. We still don't know the intent to format ranges until the terminal method, making it harder to know when to load data at runtime.
  3. Arguably not intuitive of an API.

@ljharb
Copy link
Member

ljharb commented Oct 24, 2019

Why would the first one be necessary? The data could be cached as part of the static method, or stored via internal slot on the instance.

You'd load the data at the time formatRange is called.

@ljharb
Copy link
Member

ljharb commented Oct 24, 2019

As for intuitiveness, the most intuitive thing is to eagerly load the data so the user never has to think about such things, but the OP considerations seem to conflict with that.

@sffc
Copy link
Author

sffc commented Oct 24, 2019

You'd load the data at the time formatRange is called.

The problem with that is that data loading could be an async operation, and the format methods should not ever be async, in my opinion. It would be OK to make a constructor or factory method async, but that requires knowing the intention of the user in the constructor or factory method.

@ljharb
Copy link
Member

ljharb commented Oct 24, 2019

That's only a problem for polyfills tho, which would (as always) need to load everything up front. Is that the only concern?

@sffc
Copy link
Author

sffc commented Oct 24, 2019

Knowing the intention for range formatting in the constructor would help:

  1. Polyfills
  2. Possible future data loading API such as in Data-Driven API ecma402#210
  3. FYT said this change could make the V8 implementation more efficient.

@ljharb
Copy link
Member

ljharb commented Oct 25, 2019

it kind of seems like it’d be clearer then to have an explicit async instance method to load the data, and have formatRange throw until the data was available.

@sffc
Copy link
Author

sffc commented Nov 27, 2019

I wanted to amend this issue with some data about how big or small the range formatting data and code actually are.

Data Size

https://github.com/unicode-org/cldr/blob/release-35-1/common/main/en.xml#L1719

Although there are a lot of interval formats, since we know the skeleton in the constructor, we only actually need to carry the corresponding <intervalFormatItem id="yMMMEd">, or the fallback pattern.

Code Size

Having read the code for ICU DateIntervalFormat, it is rather small when compared with the code size of the machinery required for full ICU SimpleDateFormat. As a back-of-the-envelope estimation, the uncompressed and debug-build code size for certain files is:

  • i18n/dtitvfmt.o: 48 KiB
  • i18n/smpdtfmt.o: 94 KiB
  • i18n/dtptngen.o: 109 KiB
  • i18n/decimfmt.o: 67 KiB
  • i18n/number_mapper.o: 24 KiB
  • i18n/number_formatimpl.o: 28 KiB

formatRange requires only the very first one of those files, or 13% of the total. The rest are just a subset of the ones required to power regular datetime format, so the final percent is probably even lower.

Similarly, i18n/numrange_impl.o is quite small, only 23 KiB.

Conclusion

Seeing that the code and data size are quite small, I would be satisfied if this issue were closed as WAI. I will leave the final call up to @fabalbon.

@sffc
Copy link
Author

sffc commented Jan 8, 2020

Closing this issue as per ECMA-402 December meeting.

@sffc sffc closed this as completed Jan 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants