Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

format() and parse() in SimpleUnitFormat behave differently upon label() #124

Closed
DrDaleks opened this issue Jul 27, 2018 · 16 comments
Closed
Labels
bug prio:1 Priority 1
Milestone

Comments

@DrDaleks
Copy link
Contributor

Consider the following code

    Unit<Force> lbf = Units.NEWTON.multiply(4.4482216152605); // pound-force
    Unit<Area> sqin = USCustomary.INCH.pow(2).asType(Area.class); // square inch
    Unit<Pressure> psi = lbf.divide(sqin).asType(Pressure.class); // pound-force per square inch
    System.out.println("Pounds per square inch: " + psi);
    SimpleUnitFormat.getInstance().label(psi, "psi");
    System.out.println("Pounds per square inch: " + psi);
    System.out.println("Square psi: " + psi.pow(2));

    System.out.println();

    Unit<Pressure> another_psi = Units.NEWTON.multiply(6895).divide(Units.SQUARE_METRE).asType(Pressure.class);
    System.out.println("Pounds per square inch: " + another_psi);
    SimpleUnitFormat.getInstance().label(another_psi, "psi");
    System.out.println("Pounds per square inch: " + another_psi);
    System.out.println("Square psi: " + another_psi.pow(2));

which outputs

Pounds per square inch: N*4.4482216152605/in²
Pounds per square inch: psi
Square psi: N*4.4482216152605²/in^4

Pounds per square inch: N*6895.0/m²
Pounds per square inch: psi
Square psi: N*6895.0²/m^4

Regardless of the usefulness of a square psi (just a random example), wouldn't it make sense to format it as psi^2 instead of the fully developed form once the label as been defined in the formatter? It seems the simple formatter(s) dive(s) straight into the ProductUnit's components without considering the (basic) case of a powered "labeled" unit (while powered "base" units render fine as you can see). Bug or feature?

@keilw keilw added the analysis label Jul 30, 2018
@keilw
Copy link
Member

keilw commented Jul 30, 2018

SimpleUnitFormat is much more simple (as the name suggests) than mot of the other implementations.

CommonFormatDemo shows your example plus a comparison with EBNFUnitFormat:

Pounds per square inch: N*4.4482216152605/in²
Pounds per square inch: psi
Square psi: N*4.4482216152605²/in^4
Square psi (EBNF): kg·m/s²²/m/12⁴

Pounds per square inch: N*6895.0/m²
Pounds per square inch: psi
Square psi: N*6895.0²/m^4
Square psi (EBNF): kg·m/s²²/m⁴

The advantage over SimpleUnitFormat is that EBNFUnitFormat uses the Extended Backus-Naur Form (EBNF) as unit grammar. As it uses properties and PropertyResourceBundle as a source, this does not work at runtime to add new units, therefore label() is currently not used there. It comes closer to your idea of psi², but psi is unknown so the strange kg·m/s²²/m⁴ which should be (kg·m/s²)²/m⁴ to make the seemingly power(22) understandable for what it really is.

Adding UCUMFormat, examples from UCUMFormatDemo

Pounds per square inch: lb·(m/s²)*9.80665/c(cm*254.0)²
Pounds per square inch: psi
Square psi: lb²·(m/s²)*9.80665²/c(cm*254.0)^4
Square psi (EBNF): g_n²·lb_av²/in_i⁴
Square psi (UCUM): g_n2.lb_av2/in_i4

Pounds per square inch: N*6895.0/m²
Pounds per square inch: psi
Square psi: N*6895.0²/m^4
Square psi (EBNF): kg·m/s²²/m⁴
Square psi (UCUM): (N.6895)2/m4

You notice, not only the UCUMFormat (by definition it just uses ASCII characters) understands well-defined UCUM units like POUND_FORCE or SQUARE_INCH_INTERNATIONAL, it also offers the (Unicode confirmant) EBNFUnitFormat necessary hints to improve results if they are based on units that are known. And certain parts of your PSI definition also get improvement by the label() definitions the UCUM system also adds. Mixing them can be tricky, although the best bet would be to activate label() for EBNFUnitFormat which the underlying SymbolMap would already foresee. There should be no side-effects between either of the UnitFormat implementations, therefore you may have to decide which one to use if you introduce a new unit in your application.

@keilw
Copy link
Member

keilw commented Jul 30, 2018

Btw, UCUM should get PSI, see unitsofmeasurement/uom-systems#59, the UCUM specification of Other Legacy Units (§47) knows it, we just have not been able to add it yet. @adufour Should you have a use case for it and not just used it as an example, you'd be welcome to contribute more units to the UCUM system.

@DrDaleks
Copy link
Contributor Author

Thx for the insight. Indeed UCUM/EBNF do improve the refactoring+formatting, but I remain quite a bit confused by the lack of easy-ish way to get "psi^2" or "psi²" displayed (other than by force-labeling it unit as such), while:

  1. "psi" is known by all formats (read "explicitly labeled" in the case of SimpleUnitFormat).
  2. "psi^2" and "psi²" are both nicely parsed by SimpleUnitFormat and EBNFUnitFormat (once "psi" is labeled in the former case)

Am I missing a fundamental reasoning here? Should I be writing my own format for this (arguably straightforward) use case?
One solution I had envisaged here was to modify ProductUnit to cope for the specific case of a self-powered unit (or better off properly extending it, alas the class is marked final).

The reason I focused on this "toy" example is because I work in a field where custom units are frequent (some of them are purely for internal use and would not benefit UCUM in any way). I thus need to evaluate how much freedom (or lack thereof) the current implementations offer to downstream code in order to simplify unit parsing/manipulation by end-users.

@keilw
Copy link
Member

keilw commented Jul 31, 2018

Thx for the update. If you say, Parsing works after labelling psi, but formatting doesn't, even in the same type of UnitFormat, then I think it's worth to look after that. It would be a mismatch between formatting and parsing. See above demos, we might add something there or even JUnit tests. Due to the non-SI nature it cannot be tested directly in the RI, but fixing or addressing it, would have to happen here.

@keilw
Copy link
Member

keilw commented Jul 31, 2018

I added a few test lines to CommonFormatDemo. They show, both the Unicode and ASCII representation of psi² can be parsed. However, as expected in this demo u1 = ebnf.parse("psi^2") using EBNFUnitFormat fails. I would have been surprised by a side-effect between the two formatters. It could be possible, UCUMFormat behaves differently and even EBNFUnitFormat might do, but unless PSI is known by a Unit system it seems quite unlikely.
So we'll keep this one open to try archive a consistent behavior between format and parse.

@keilw keilw added the bug label Jul 31, 2018
@keilw keilw changed the title Formatting of powered units? format and parse in SimpleUnitFormat behave differently upon label() Jul 31, 2018
@keilw keilw removed the analysis label Jul 31, 2018
@keilw keilw changed the title format and parse in SimpleUnitFormat behave differently upon label() format() and parse() in SimpleUnitFormat behave differently upon label() Jul 31, 2018
@DrDaleks
Copy link
Contributor Author

Hmm, that's an odd one. On my side the parsing you mentioned works, e.g. running just these few lines

    UnitFormat ebnf = EBNFUnitFormat.getInstance();
    System.out.println(ebnf.format(ebnf.parse("psi^2")));
    System.out.println(ebnf.format(ebnf.parse("psi²")));

works (and in both cases produces lb_av²·g_n²/in_i⁴)
(NB: I depend on systems-ucum 0.9 and systems-quantity 0.9)

@keilw
Copy link
Member

keilw commented Jul 31, 2018

I did not try it in the UCUM demo. Since all relevant UCUM units are added to EBNFUnitFormat via its properties, it may work there but it did not work in the simple Common Units environment.

@keilw
Copy link
Member

keilw commented Jul 31, 2018

It does in UCUMFormatDemo. I can't say without digging into the code of each why EBNFUnitFormat does seem to react to the label() method in SimpleUnitFormat, but more importantly, both it and UCUMFormat (with certain restrictions, the PRINT variant does not parse by design) are synchronous in parsing its own output. E.g. g_n²·lb_av²/in_i⁴ is properly parsed by EBNFUnitFormat, too.

Most importantly (I commented out the line for later testing)
u1 = SimpleUnitFormat.getInstance().parse("lb²·(m/s²)*9.80665²/c(cm*254.0)^4") fails with SimpleUnitFormat, so it cannot parse its own output unlike the others.

@DrDaleks
Copy link
Contributor Author

My apologies for the confusion, but I have the (surely wrong) impression this is getting slightly off-track from the original issue I reported. Here is another try at explaining what (I think) is wrong:

Suppose I use SimpleUnitFormat (for now) to label a given unit. Then:

  • If the unit in question is not a product unit (i.e. a base, alternate or transformed unit), then its squared version gets correctly formatted as label²
  • If the unit in question is a product unit, then its squared version is not formatted as label², but instead the power is applied to the inner elements

To me that's just a consistency issue, regardless of the formatter being used (but more on that in a bit).

This new toy example exemplifies this behaviour a bit better than my original code:

  public static void main(String... args) {
    UnitFormat formatter = SimpleUnitFormat.getInstance();
    System.out.println("== Use case 1: playing with base units ==");
    AlternateUnit<?> aCd = new AlternateUnit<>(Units.CANDELA, "altCD");

    System.out.println("Candela: " + formatter.format(Units.CANDELA));
    System.out.println("Candela times 2: " + formatter.format(Units.CANDELA.multiply(2)));
    System.out.println("Square Candela: " + formatter.format(Units.CANDELA.pow(2)));
    System.out.println("Alt. Candela: " + formatter.format(aCd));
    System.out.println("Square alt. Candela: " + formatter.format(aCd.pow(2)));

    System.out.println("=> The Candela shall now be known as \"CD\"");
    formatter.label(Units.CANDELA, "CD");

    System.out.println("Candela: " + formatter.format(Units.CANDELA));
    System.out.println("Candela times 2: " + formatter.format(Units.CANDELA.multiply(2)));
    System.out.println("Square Candela: " + formatter.format(Units.CANDELA.pow(2)));
    System.out.println("Alt. Candela: " + formatter.format(aCd));
    System.out.println("Square alt. Candela: " + formatter.format(aCd.pow(2)));

    System.out.println();
    System.out.println("== Use case 2: playing with product units ==");
    ProductUnit<?> cdK = new ProductUnit<>(Units.CANDELA.multiply(Units.KELVIN));
    AlternateUnit<?> aCdK = new AlternateUnit<>(cdK, "altCDK");

    System.out.println("Candela-Kelvin: " + formatter.format(cdK));
    System.out.println("Candela-Kelvin times 2: " + formatter.format(cdK.multiply(2)));
    System.out.println("Square Candela-Kelvin: " + formatter.format(cdK.pow(2)));
    System.out.println("Alt. Candela-Kelvin: " + formatter.format(aCdK));
    System.out.println("Square alt. Candela-Kelvin: " + formatter.format(aCdK.pow(2)));

    System.out.println("=> The Candela-Kelvin shall now be known as \"CDK\"");
    formatter.label(cdK, "CDK");

    System.out.println("Candela-Kelvin: " + formatter.format(cdK));
    System.out.println("Candela-Kelvin times 2: " + formatter.format(cdK.multiply(2)));
    System.out.println("Square Candela-Kelvin: " + formatter.format(cdK.pow(2)));
    System.out.println("Alt. Candela-Kelvin: " + formatter.format(aCdK));
    System.out.println("Square alt. Candela-Kelvin: " + formatter.format(aCdK.pow(2)));
  }

This produces the following:

== Use case 1: playing with base units ==
Candela: cd
Candela times 2: cd*2.0
Square Candela: cd²
Alt. Candela: altCD
Square alt. Candela: altCD²
=> The Candela shall now be known as "CD"
Candela: CD
Candela times 2: CD*2.0
Square Candela: CD²
Alt. Candela: altCD
Square alt. Candela: altCD²

== Use case 2: playing with product units ==
Candela-Kelvin: CD·K
Candela-Kelvin times 2: (CD·K)*2.0
Square Candela-Kelvin: CD²·K²
Alt. Candela-Kelvin: altCDK
Square alt. Candela-Kelvin: altCDK²
=> The Candela-Kelvin shall now be known as "CDK"
Candela-Kelvin: CDK
Candela-Kelvin times 2: CDK*2.0
Square Candela-Kelvin: CD²·K²      // Problem: this should read "CDK²"!
Alt. Candela-Kelvin: altCDK
Square alt. Candela-Kelvin: altCDK²

As you may notice, the only issue lies with the product unit formatting, which is inconsistent.

Now interestingly, if I reproduce the same experiment with EBNFUnitFormat instead of SimpleUnitFormat, it gets weirder:

== Use case 1: playing with base units ==
Candela: ca
Candela times 2: ca·2
Square Candela: ca²
Alt. Candela: altCD
Square alt. Candela: altCD²
=> The Candela shall now be known as "CD"
Candela: ca
Candela times 2: ca·2
Square Candela: ca²
Alt. Candela: altCD
Square alt. Candela: altCD²

== Use case 2: playing with product units ==
Candela-Kelvin: ca·K
Candela-Kelvin times 2: ca·K
Square Candela-Kelvin: ca²·K²
Alt. Candela-Kelvin: ca·K           // Problem: shouldn't this read "altCDK"?
Square alt. Candela-Kelvin: ca·K²           // Problem: shouldn't this read "altCDK²"?
=> The Candela-Kelvin shall now be known as "CDK"
Candela-Kelvin: ca·K
Candela-Kelvin times 2: ca·K
Square Candela-Kelvin: ca²·K²
Alt. Candela-Kelvin: ca·K           // Problem: shouldn't this read "altCDK"?
Square alt. Candela-Kelvin: ca·K²           // Problem: shouldn't this read "altCDK²"?

Here the labels are actually never taken into account (neither for product nor alternate units).

@keilw
Copy link
Member

keilw commented Jul 31, 2018

Thanks for the clarification. It should help to narrow it down to e.g. to ProductUnit, but the issue is, that at least for certain units, SimpleUnitFormat.format() produces different results than parse() accepts and what seems worse it won't parse its own format output.

@keilw keilw added this to the 2.0 milestone Jul 31, 2018
@keilw keilw added prio:1 Priority 1 to do labels Jul 31, 2018
@DrDaleks
Copy link
Contributor Author

DrDaleks commented Jul 31, 2018

I think I found at least 2 ways to fix the formatting issue. Basically, what happens is AbstractUnit.pow(int) internally powers each side of a product instead of the unit as a whole (thus when you reach the formatting call, it's already too late and the formatter can't be blamed for not finding the proper label...).
Simply(?) powering the original unit would suffice, which could be done via

new ProductUnit(new Element[]{ new Element(someProductUnit, exponent, 1) });

Since that's all private/inaccessible, I found two independent ways to call this line from client code:

  • I removed the final tag from AbstractUnit.pow(int) to override it in ProductUnit
  • I replaced the implementation of static ProductUnit.ofPow(int), and called that instead
  • PS: shouldn't these two methods converge to do the same thing?

Both options solve the formatting issue quite nicely (but I have no clue on the potential side-effects).
Thoughts? should I PR any of these (or both)? (If so, I'd obviously apply the same fix for roots)

@keilw
Copy link
Member

keilw commented Jul 31, 2018

Yes please propose a PR. While larger portions of code (several hundred or thousand LOC) advise those who change that much to join the JCP at least as Associate Member (contributor) smaller fixes are OK. Of course you're more than welcome to join the JCP if you want. Thanks.

@keilw
Copy link
Member

keilw commented Aug 6, 2018

@adufour Hi, just wanted to ask, if you may have time for a PR yourself, or if someone else might better work on this based on the evidence gathered here?

@DrDaleks
Copy link
Contributor Author

DrDaleks commented Aug 6, 2018

I was working on the PR (for the mislabelling part, that is), but then realised that the fix I found actually broke the formatting "before" labelling. In short:

  • The current behaviour formats cdK.pow(2) as CD²·K² (correct) prior to labelling CD·K as CDK, and CD²·K² (incorrect) afterwards
  • With my fix(es), cdK.pow(2) now formats as null² (incorrect) prior to labelling, but CDK² (correct) afterwards

Investigating this as we speak

@keilw
Copy link
Member

keilw commented Aug 7, 2018

Great, thanks a lot for the update. While this has P1 at the moment, and ideally SimpleUnitFormat should work before and after labelling it mostly seems to affect new or unknown units, so we would love to have it in the EDR already, which is due before the end of the month (9M before a Renewal Ballot starts, and we should give the JCP PMO at least a few weeks to process the EDR) So no pressure, if necessary, we could schedule it for the next stage (Public Review, that is also approved by the JCP EC, so a more critical milestone)

DrDaleks pushed a commit to DrDaleks/indriya that referenced this issue Aug 8, 2018
keilw added a commit that referenced this issue Aug 9, 2018
#124: fix handling of powered units (creation and simple formatting)
@keilw keilw added in progress and removed to do labels Aug 9, 2018
@keilw
Copy link
Member

keilw commented Aug 10, 2018

Fixed by #129

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug prio:1 Priority 1
Projects
None yet
Development

No branches or pull requests

2 participants