Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement sorting units by dimension order #1864

Closed
wants to merge 16 commits into from

Conversation

MichaelTiemannOSC
Copy link
Collaborator

@MichaelTiemannOSC MichaelTiemannOSC commented Oct 22, 2023

Once upon a time, ISO 80000 attempted to codify preferences for ordering units when describing quantities, for example kW h (not h Kw). While they have withdrawn the standard, there are many publications that state a preference for ordering units when describing quantities.

Pint allows users to choose to sort units alphabetically or not (thus kW * h becomes h * kW, whereas kW * s retmains kW * s becase kW sorts as less than s).

This PR adds a sort_dims parameter to pint.formatting.formatter which can be used in conjunction with alphabetical sorting. sort_dims imposes a "most significant dimension" order on the units, which are then sorted alphebetically (or not) within each dimension type.

This addresses #1841. It is intended as a prototype to evaluate as pint's formatting roadmap evolves. In particular, it needs a way to make sort_dims more accessible to the user.

Once upon a time, ISO 80000 attempted to codify preferences for ordering units when describing quantities, for example `kW h` (not `h Kw`).  While they have withdrawn the standard, there are many publications that state a preference for ordering units when describing quantities.

Pint allows users to choose to sort units alphabetically or not (thus `kW * h` becomes `h * kW`, whereas `kW * s` retmains `kW * s` becase `kW` sorts as less than `s`).

This PR adds a `sort_dims` parameter to `pint.formatting.formatter` which can be used in conjunction with alphabetical sorting.  `sort_dims` imposes a "most significant dimension" order on the units, which are then sorted alphebetically (or not) within each dimension type.

This addresses hgrecco#1841.  It is intended as a prototype to evaluate as pint's formatting roadmap evolves.  In particular, it needs a way to make `sort_dims` more accessible to the user.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Forgot to run pre-commit for initial commit.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@codspeed-hq
Copy link

codspeed-hq bot commented Oct 22, 2023

CodSpeed Performance Report

Merging #1864 will not alter performance

Comparing MichaelTiemannOSC:dim_order (957e9ca) with master (0f24b6f)

Summary

✅ 439 untouched benchmarks

@andrewgsavage
Copy link
Collaborator

Could dim_order be part of the registry so it can be modified by the user?
I think you can set the default sort_dim to true as this seems preferable to alphabetical sorting. Maybe add more examples as test cases to confirm this?
kVA is one I can think of offhand
Torque I see written as N m and inch lbf

Copy link
Contributor

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should be a separate option, or whether it would be better to change sort to take a string instead of a boolean: sort="iso80000" (or something similar).

I think in addition to a registry option, we might also need a unit modifier.

pint/formatting.py Outdated Show resolved Hide resolved
pint/formatting.py Outdated Show resolved Hide resolved
pint/formatting.py Show resolved Hide resolved
@MichaelTiemannOSC
Copy link
Collaborator Author

Could dim_order be part of the registry so it can be modified by the user? I think you can set the default sort_dim to true as this seems preferable to alphabetical sorting. Maybe add more examples as test cases to confirm this? kVA is one I can think of offhand Torque I see written as N m and inch lbf

OK...I stuck dim_order into defaults. kVA and N•m both follow the defined dim_order (and fix the alphabetization problem), but inch•lbf is backward. I did a quick Google search and most references came up with lbf•inch (or lbf•ft), which agrees with dim order. Perhaps you have been reading too many Pint outputs 🤓. I'll expand the test case and push a new commit in a few.

Per suggestions from @andrewgsavage make `sort_dims` default.  Also allow users to specify dimensional sorting in registry definition.

Also fix a number of sub-tests not previously tested.  Tests all pass without any drama.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
pint/formatting.py Outdated Show resolved Hide resolved
Incorporate more code review feedback and fix some doc issues.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Remove `breakpoint()` left in by mistake.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
More attempts to make doc text and doc build happy.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
See whether I've turned off doctest where old behavior is no longer easily reproducible.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
And another attempt.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC MichaelTiemannOSC marked this pull request as ready for review October 22, 2023 23:08
@hgrecco
Copy link
Owner

hgrecco commented Oct 24, 2023

I really like this. A few things:

  1. Avoid using Tuple and List, rather tuple and list (which are available in Python 3.9 and make everything cleaner)
  2. I think we should make this option available but keep the default to match the current behavior, at least in the next release. If we decide another default is better, we should announce it now and enable it in the next next release.
  3. I am not sure that the dim_order belongs in DefaultsDefinition. I understand the reason (we need a way to change it), but that section is intended to configure the definition files, not the way the registry use/display them. For example, none of these are there:
    force_ndarray : bool

    I do have a solution (see below).
  4. How about turning string formatting into a composed class (a delegated behavior just as the parser). It could look something like this
class GenericPlainRegistry

    formatter: Formatter

    def __init__(self, ......):
         self.formatter = Formatter(<here goes the current formatting arguments>)

The formatting facet will be gone, and the Formatter class will do all the work. All Quantity and Unit class will delegate to this formatter classes

Then to change the behavior

ureg.formatter.locale = "de_DE"

What are the advantages?

  1. Formatting code will be contained within that class
  2. It is easier to hack/improve
  3. It is easier to replace: e.g. generating a new formatter class that perform something very specific without needing to support everything
  4. The attributeformatter serves as a namespace: configuration names become simpler, adding configuration does not pollute the registry.
  5. We might be able in the future to remove multiple of the UnitRegistry construction arguments that make everything harder to subclass.

(names are open for discussions)

@MichaelTiemannOSC
Copy link
Collaborator Author

Thanks for the comments! The first few could easily be done in the next commit.

Would you rather the format changes be part of this PR (generalizing it considerably--not a bad thing) or would you rather land the Formatter class changes first and then adapt these changes to that? I'm OK either way, I just don't want to head down the wrong path and need to backtrack.

@hgrecco
Copy link
Owner

hgrecco commented Oct 24, 2023

I think we can do both things in the same PR as long as we do not change the default behaviour.

@MichaelTiemannOSC
Copy link
Collaborator Author

Ok. It's going to take me a hot minute to wrap my head around the delegated behaviors and translate formatting to Formatter. I think I can do it ;-)

@hgrecco
Copy link
Owner

hgrecco commented Oct 25, 2023

@MichaelTiemannOSC Awesome. I am open to talk about it if you need/want.

I did this for the parser (although is not exposed to the user) some releases ago, and it was great. Briefly, the parser was part of the registry, deeply integrated. This made it very hard to test/fix/improve. Even replace (there were some proposals to replace the definition file format). What I did was to create the delegates submodule and the parser is there. Now the registry use that parser which is a standalone thing that emits definitions.

@MichaelTiemannOSC
Copy link
Collaborator Author

MichaelTiemannOSC commented Oct 25, 2023

Thanks for the offer. I would be happy to read any pseudocode you want to drop in the comments!

I found the secret decoder ring here: #1595

@MichaelTiemannOSC
Copy link
Collaborator Author

MichaelTiemannOSC commented Oct 26, 2023

Am I correct in thinking that the *_to_latex methods become format_* in a LatexFormatter class derived from Formatter?

If so, then similarly convert various formatters from stand-alone functions to formatting functions in a class hierarchy (PrettyFormatter, CompactFormatter, etc)?

@hgrecco
Copy link
Owner

hgrecco commented Oct 27, 2023

Indeed, but the default formatter should inherit from all of them (I think).

@MichaelTiemannOSC
Copy link
Collaborator Author

What do you suggest is the bridge to format_unit, which is needed in the UnitsContainer context? I'm looking at the fact that UnitsContainer uses class methods from ParserHelper (so not a full bridge to defparser) and the fact that to_units_container uses a registry parameter and some fancy steps in its traversal of the class hierarchy. Should I be thinking about this in terms of a class method in the Formatter class or some way to get a handle on the registry defined by the units in the units container from which I can ultimately access registry.formatter?

@hgrecco
Copy link
Owner

hgrecco commented Oct 27, 2023

The ParserHelper is a different beast. It is not a definition parser, but an sort of expression parser. My feeling is that what needs to be formatted are Quantity, Unit and Measurement. Only this make sense from the user perspective and are registry-aware (and therefore has access to the formatter).

I think that UnitsContainer should not be formatted. We could, during the transition, create a Formatter object hanging from that module that is called by UnitsContainer until we can deprecate that behaviour.

@MichaelTiemannOSC
Copy link
Collaborator Author

Thanks for that. It confirms that I asked the right question ;-)

@MichaelTiemannOSC
Copy link
Collaborator Author

By the way, UnitsContainer really doesn't want to be formatted, because it gets in a quite thick circular dependency with the Formatter definition in delegates.

@hgrecco
Copy link
Owner

hgrecco commented Oct 27, 2023

Can you put your code online, maybe in your github and I would be happy to take a look.

@MichaelTiemannOSC
Copy link
Collaborator Author

Thanks! I've got many things sorted, and of course things are a mess. Currently stuck on how to merge these concepts:

class GenericFormattingRegistry(
    Generic[QuantityT, UnitT], GenericPlainRegistry[QuantityT, UnitT]
):
    pass


class FormattingRegistry(
    GenericFormattingRegistry[objects.FormattingQuantity[Any], objects.FormattingUnit]
):
    Quantity: TypeAlias = objects.FormattingQuantity[Any]
    Unit: TypeAlias = objects.FormattingUnit

with the Formatter.

Current status is

=========================================================== 179 failed, 2482 passed, 25 skipped, 12 xfailed, 9 warnings, 213 subtests passed in 299.75s (0:04:59) ============================================================

With 64 of the 179 problems coming from the factoring of facets/measurement/objects.py and __format__ (which the above may inform).

Anyway, here comes the ugliness before things are working again...

@MichaelTiemannOSC
Copy link
Collaborator Author

Here is the WIP branch (branched from dim_order): https://github.com/MichaelTiemannOSC/pint/tree/dim_order_wip

@MichaelTiemannOSC
Copy link
Collaborator Author

MichaelTiemannOSC commented Oct 28, 2023

I think I see the path forward: facets.FormattingRegistry.Quantity and facets.FormattingRegistry.Unit and everything up through facets.GenericFormattingRegistry need to find their way into Formatter. The good news is that I see the connection via self._REGISTRY so it should be a matter of basic algebra to refactor. Initially I thought Formatter could derive from (or delegate to) the appropriate flavors of FormattingQuantity and FormattingUnit, but now I think the way forward is for PlainQuantity and PlainUnit to have some sort of formatting nature. I'm investigating...

@hgrecco if you see a major flaw with that assumption, let me know. In the meantime, I'm going to proceed with that assumption.

@MichaelTiemannOSC
Copy link
Collaborator Author

I've looked more deeply at how the SharedRegistryObject, how the Quantities and the Units come together, and the magical mechanics of calls to self._registry summons __format__ for both Quantity and Units, but I don't quite have the overall understanding in my mind about what's actually happening to pull the pieces apart and put them back together.

@hgrecco
Copy link
Owner

hgrecco commented Nov 1, 2023

I've looked more deeply at how the SharedRegistryObject, how the Quantities and the Units come together, and the magical mechanics of calls to self._registry summons __format__ for both Quantity and Units, but I don't quite have the overall understanding in my mind about what's actually happening to pull the pieces apart and put them back together.

The way I see it is the following:

class Formatter:

    def format_unit(self, unit, format_spec):
        # do something

    def format_quantity(self, quantity, format_spec):
        # do something

class Quantity:

     def __format__(self, format_spec):
        return self._REGISTRY.formatter.format_quantity(self, format_spec)

class Unit:

     def __format__(self, format_spec):
        return self._REGISTRY.formatter.format_unit(self, format_spec)

@MichaelTiemannOSC
Copy link
Collaborator Author

Thanks for that. I may not be able to get to it until mid-November due to upcoming travel.

@hgrecco
Copy link
Owner

hgrecco commented Nov 2, 2023

@MichaelTiemannOSC I can try to create a skeleton in a different branch and then you flesh it out.

@hgrecco
Copy link
Owner

hgrecco commented Nov 3, 2023

@hgrecco
Copy link
Owner

hgrecco commented Dec 31, 2023

I merged part of the formatter class into develop. As soon as I test it I will merge it into master

Picking up latest changes from hgrecco and crew.
Prepare to merge `develop` into this PR.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Move the proposed `dim_sort` functionality from being a system registry property to being a formatter property, specifically `BaseFormatter`.

Also fix typos noticed in `pint/testsuite/conftest.py`.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Based on this error message from the CI build:

```
Run sphinx-build -n -j auto -b html -d build/doctrees docs build/html
  sphinx-build -n -j auto -b html -d build/doctrees docs build/html
  shell: /usr/bin/bash -e {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.9.18/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.18/x64/lib
Running Sphinx v4.5.0

Sphinx version error:
The sphinxcontrib.applehelp extension used by this project needs at least Sphinx v5.0; it therefore cannot be built with this version.
Error: Process completed with exit code 2.
```

bump sphinx to >= 5 (>4 only got us to 4.5).

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Collaborator Author

@hgrecco I know enough to know I don't know how to fix the sphinx problems. I'm sure there's some history embedded in the version choices, but I don't know it. Otherwise, all green.

@hgrecco
Copy link
Owner

hgrecco commented Jan 17, 2024

This looks great. I will finish creating the formatter and then proceed to incorporate all these.

formatter changes.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@hgrecco hgrecco mentioned this pull request Jan 24, 2024
5 tasks
@MichaelTiemannOSC
Copy link
Collaborator Author

Closing because obsoleted by #1926

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Override units display order (1 kW·h displayed as 1 h·kW)
4 participants