This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Can we omit TYPE from the unit value? #17

Closed
rxaviers opened this issue Oct 2, 2018 · 15 comments

@rxaviers
Member

rxaviers commented Oct 2, 2018

The unit in your proposal is composed of (what UTS#35 defines as) a TYPE and a UNIT.

In the example below, we have TYPE acceleration and UNIT meter-per-second-squared (a compound unit in this case).

(9.81).toLocaleString("en-US", {
    style: "unit",
    unit: "acceleration-meter-per-second-squared",
    unitDisplay: "short"
});
// ==> "9.81 m/s²"

Can we omit TYPE from the unit value?

(9.81).toLocaleString("en-US", {
    style: "unit",
    unit: "meter-per-second-squared",
    unitDisplay: "short"
});
// ==> "9.81 m/s²"

  • Does TYPE provide any value? I guess not:
    • Can I use TYPE alone and let UNIT be inferred automatically? I guess not: it would be as bad as having a default currency.
    • Is it ever used to distinguish UNITs? I guess not.
  • What TYPE should be used for compound units that are not explicitly defined by internal (probably CLDR) data? UTS#35 basically says: use a direct match or generate the unit yourself (using perUnitPattern). When the implementation generates it, TYPE is completely ignored anyway.
  • Is meter-per-second a speed or a velocity? If TYPE is required, you will most likely need to check the documentation to find out what the TYPE is. That doesn't happen with UNIT. More problematic still: since TYPE is ignored when generating compound units, as noted above, an implementation could produce different results for speed-meter-per-second (direct match) vs. velocity-meter-per-second (generated by mistake).

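
To make the speed/velocity concern concrete, here is a minimal sketch of how a TYPE-UNIT string decomposes. The helper `splitUnitIdentifier` is hypothetical (it is not part of the proposal or of any library) and assumes that TYPE never contains a hyphen.

```javascript
// Hypothetical helper: splits a TYPE-UNIT string at the first hyphen.
// Assumption: TYPE is a single hyphen-free token, so everything after the
// first hyphen is the UNIT.
function splitUnitIdentifier(identifier) {
  const index = identifier.indexOf("-");
  if (index === -1) {
    throw new RangeError(`No TYPE prefix in "${identifier}"`);
  }
  return {
    type: identifier.slice(0, index),
    unit: identifier.slice(index + 1),
  };
}

const speed = splitUnitIdentifier("speed-meter-per-second");
const velocity = splitUnitIdentifier("velocity-meter-per-second");

// Both strings carry the same UNIT; only the TYPE prefix differs.
console.log(speed.unit === velocity.unit); // true
```

Under this assumption the two identifiers differ only in the part that a generated compound unit would ignore.
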
@sffc
Collaborator

sffc commented Oct 3, 2018

The latest UTS35 spec adds a definition for "unit identifier" to refer to the whole string TYPE-UNIT.

See: https://unicode.org/repos/cldr/trunk/specs/ldml/tr35-general.html#Unit_Elements

Ticket: https://unicode.org/cldr/trac/ticket/11271

A problem with not having TYPE in the setting string is that UNIT by itself is not guaranteed to be unique. For example, there could be two units with the same UNIT but not the same TYPE. The unit "ounce" would be a good example (although it looks like in that case CLDR made the volume version named "fluid-ounce").

Now, there is another question, which is about whether we want to be smarter when resolving which unit to use for a particular locale. This is a nontrivial problem. It turns out that you can't just say "give me the length unit for locale X", because you also need to know the context: length units used to measure someone's height are different than the length units to measure road distances, for example.

CLDR has some limited data for this: https://unicode.org/cldr/trac/browser/tags/release-33-1-d04/common/supplemental/supplementalData.xml#L4770

I have issues filed to improve this data structure. For example: https://unicode.org/cldr/trac/ticket/11452

Ultimately, what I would like to eventually see is one function that maps from context, locale, and magnitude to unit identifier, and then let the currently proposed API map from unit identifier, locale, and number to string. That would be material for a new proposal.

@rxaviers
Member Author

rxaviers commented Oct 4, 2018

> A problem with not having TYPE in the setting string is that UNIT by itself is not guaranteed to be unique. For example, there could be two units with the same UNIT but not the same TYPE. The unit "ounce" would be a good example (although it looks like in that case CLDR made the volume version named "fluid-ounce").

Currently, even for "ounce" the UNIT alone is unique, right? We have (a) "volume-fluid-ounce" (type volume, unit fluid-ounce); (b) "mass-ounce-troy" (type mass, unit ounce-troy); (c) "mass-ounce" (type mass, unit ounce)
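
The uniqueness claim can be checked mechanically. The sketch below uses only the three identifiers listed above, not the full CLDR unit list, and the helper `findDuplicateUnits` is a hypothetical name introduced for illustration.

```javascript
// Only the three identifiers quoted in this comment; not the full CLDR list.
const identifiers = ["volume-fluid-ounce", "mass-ounce-troy", "mass-ounce"];

// Hypothetical helper: groups identifiers by their UNIT part (everything
// after the first hyphen) and returns the UNITs that appear under more
// than one TYPE.
function findDuplicateUnits(ids) {
  const typesByUnit = new Map();
  for (const id of ids) {
    const index = id.indexOf("-");
    const type = id.slice(0, index);
    const unit = id.slice(index + 1);
    if (!typesByUnit.has(unit)) {
      typesByUnit.set(unit, new Set());
    }
    typesByUnit.get(unit).add(type);
  }
  return [...typesByUnit]
    .filter(([, types]) => types.size > 1)
    .map(([unit]) => unit);
}

console.log(findDuplicateUnits(identifiers)); // []
```

Running the same check over the complete identifier list would show whether any UNIT is shared between TYPEs.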

If UNIT alone isn't unique we have a problem in the compound unit algorithm, e.g., let's suppose we have a duplicate unit "x" and we want to display a unit TYPE-y-per-x, which x to use in perUnitPattern? That being said, I acknowledge this is my own conclusion and it isn't explicit in UTS#35 docs and it would be good to get a ticket filed there.

> ... whether we want to be smarter ...

If we ever go that path, would we want to do unit conversions too? (I guess not)

@sffc
Collaborator

sffc commented Oct 5, 2018

>> A problem with not having TYPE in the setting string is that UNIT by itself is not guaranteed to be unique. For example, there could be two units with the same UNIT but not the same TYPE. The unit "ounce" would be a good example (although it looks like in that case CLDR made the volume version named "fluid-ounce").

> Currently, even for "ounce" the UNIT alone is unique, right? We have (a) "volume-fluid-ounce" (type volume, unit fluid-ounce); (b) "mass-ounce-troy" (type mass, unit ounce-troy); (c) "mass-ounce" (type mass, unit ounce)

Yeah. Let me follow up with the CLDR folks and clarify exactly what is guaranteed to be unique and what isn't guaranteed to be unique.

> If UNIT alone isn't unique we have a problem in the compound unit algorithm, e.g., let's suppose we have a duplicate unit "x" and we want to display a unit TYPE-y-per-x, which x to use in perUnitPattern? That being said, I acknowledge this is my own conclusion and it isn't explicit in UTS#35 docs and it would be good to get a ticket filed there.

speed-meter-per-second is explicitly listed in CLDR. It is not a "custom" compound unit.

This reminded me to file #19 to discuss adding custom compound units to the proposal.

>> ... whether we want to be smarter ...

> If we ever go that path, would we want to do unit conversions too? (I guess not)

We might. That's a future discussion, though. I filed tc39/ecma402#277 to follow up with this.

@rxaviers
Member Author

rxaviers commented Oct 5, 2018

Another argument against TYPE is tc39/ecma402#32 (comment):

> For example, say you want to format the bytes throughput (e.g., MB/sec). How would you name the category? throughput (e.g., throughput-megabyte-per-second)? CLDR provides digital-megabyte and duration-second, but no precomputed form for megabyte per second. So, there's no defined category for such custom (yet popular) combination.

1: The CLDR docs (6.1 per Unit patterns) say that some units already have 'precomputed' forms, but for all other ones there are rules for deducing them by using compoundUnit and perUnitPattern. The implementation has to figure out that megabyte-per-second should be made from digital-megabyte and duration-second (i.e., it must deduce the unit categories).
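
A minimal sketch of that deduction follows. The pattern table is hard-coded for illustration and only modeled on the shape of CLDR data; the strings in it, the `patterns` object, and the `formatCompound` helper are all assumptions rather than real CLDR content or a real API. Note that the lookup is keyed by UNIT alone, and the TYPE recorded in the table is never consulted.

```javascript
// Illustrative English short-form patterns (assumed, not copied from CLDR).
const patterns = {
  megabyte: { type: "digital", unitPattern: "{0} MB" },
  second: { type: "duration", unitPattern: "{0} sec", perUnitPattern: "{0}/s" },
};

// Hypothetical helper: builds "X-per-Y" from the numerator's unitPattern and
// the denominator's perUnitPattern. A fuller version would fall back to a
// generic compound "per" pattern when perUnitPattern is missing.
function formatCompound(value, unit) {
  const [numerator, denominator] = unit.split("-per-");
  const num = patterns[numerator];
  const den = patterns[denominator];
  if (!num || !den || !den.perUnitPattern) {
    throw new RangeError(`Cannot build compound unit "${unit}"`);
  }
  const formattedNumerator = num.unitPattern.replace("{0}", String(value));
  return den.perUnitPattern.replace("{0}", formattedNumerator);
}

console.log(formatCompound(12, "megabyte-per-second")); // "12 MB/s"
```
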

@rxaviers
Member Author

rxaviers commented Oct 5, 2018

> Yeah. Let me follow up with the CLDR folks and clarify exactly what is guaranteed to be unique and what isn't guaranteed to be unique.

Awesome thanks @sffc

> This reminded me to file #19 to discuss adding custom compound units to the proposal.

👍 let's include it.

@sffc
Collaborator

sffc commented Oct 5, 2018

With the current proposal, unit identifiers will be rejected unless they are in CLDR data at the time of the proposal (#11).

@sffc
Collaborator

sffc commented Oct 8, 2018

>> Yeah. Let me follow up with the CLDR folks and clarify exactly what is guaranteed to be unique and what isn't guaranteed to be unique.

> Awesome thanks @sffc

Don't know if you saw the reply from Mark Davis, but he said to use the unit identifier.

@rxaviers
Member Author

rxaviers commented Oct 8, 2018

I posted my first comment in that email thread. Let's wait for the conclusion... By the way, thanks for creating that thread.

@rxaviers
Member Author

Mark Davis replied that it makes sense for the unit part to be used as the identifier. He asked us to file a ticket to be reviewed at the CLDR meeting this week: https://unicode.org/cldr/trac/ticket/11472

@sffc
Collaborator

sffc commented Oct 24, 2018

As of now, I'm still assuming the full unit identifier syntax in the spec. If CLDR changes their mind, we can still update that in the spec. It would be nice if that decision can be made before Stage 3.

@littledan
Member

Agree that we should block Stage 3 on a solid answer here. cc @macchiati @aphillips

@sffc
Collaborator

sffc commented Oct 30, 2018

I pinged Mark on the ticket and it looks like it will be moving forward as Rafael proposed. :)

I'll keep this thread open to wait until CLDR confirms the change, and then update the ES proposal.

@sffc
Collaborator

sffc commented Nov 3, 2018

I submitted the change to LDML. If approved, it should go out in CLDR 35:

https://unicode.org/cldr/trac/changeset/14583/

I gave "the unit part of the unit identifier" a slightly shorter name: an "unqualified unit identifier". We can use that language in our spec.

I set Mark as the reviewer. He should get to it in a couple of days. Once Mark signs off, I'll update the Ecma 402 proposal to reflect the changes in CLDR.

@sffc
Collaborator

sffc commented Nov 23, 2018

I updated the spec with the new "core unit identifiers" syntax last week in 7606464.
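
For readers arriving later, here is a hedged sketch of what the TYPE-less syntax looks like in use. It assumes an engine that implements the unit style of Intl.NumberFormat as it eventually shipped; the exact output string depends on the engine's locale data, so none is asserted here.

```javascript
// A core unit identifier has no TYPE prefix.
const formatter = new Intl.NumberFormat("en-US", {
  style: "unit",
  unit: "meter-per-second",
  unitDisplay: "short",
});
const formatted = formatter.format(9.81);
console.log(formatted);

// A TYPE-prefixed identifier is not well-formed under the core syntax, so
// engines that follow the spec are expected to throw a RangeError.
let rejected = false;
try {
  new Intl.NumberFormat("en-US", {
    style: "unit",
    unit: "speed-meter-per-second",
  });
} catch (error) {
  rejected = error instanceof RangeError;
}
console.log(rejected);
```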

@sffc sffc closed this as completed Nov 23, 2018
@rxaviers
Member Author

👍 thanks
