Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

format specification mini language separator/round mode collision. #29

Closed
jagerber48 opened this issue Aug 17, 2023 · 1 comment
Closed

Comments

@jagerber48
Copy link
Owner

The format specification mini language regex allows for some confusing behavior as to if . is to be used as the upper separator, the decimal separator, or the round mode.

On version 0.26.0 the format specification string . or .e will have . as the upper separator even though the user may expect this format specification string to set round_mode=RoundMode.DEC_PLACE with ndigits=AutoExp (because !e does set round_mode=RoundMode.SIG_FIG with ndigits=AutoExp.

Here is the current regex pattern for the FSML.

pattern = re.compile(r'''^
                         (?:(?P<fill_mode>[ 0])=)?
                         (?P<sign_mode>[-+ ])?
                         (?P<alternate_mode>\#)?
                         (?P<top_dig_place>\d+)?
                         (?P<upper_separator>[n,.s_])?
                         (?P<decimal_separator>[.,])?
                         (?P<lower_separator>[ns_])?
                         (?:(?P<round_mode>[.!])(?P<ndigits>[+-]?\d+))?
                         (?P<exp_mode>[fF%eErRbB])?
                         (?:x(?P<exp_val>[+-]?\d+))?
                         (?P<prefix_mode>p)?
                         (?P<bracket_unc>\(\))?
                         $''', re.VERBOSE)

On possible resolution would be make the grouping separator part of the pattern lazy. This when . is encountered by itself its captured would be defered until the round_mode capture group.

pattern = re.compile(r'''^
                         (?:(?P<fill_mode>[ 0])=)?
                         (?P<sign_mode>[-+ ])?
                         (?P<alternate_mode>\#)?
                         (?P<top_dig_place>\d+)?
                         (?P<upper_separator>[n,.s_])??
                         (?P<decimal_separator>[.,])??
                         (?P<lower_separator>[ns_])??
                         (?:(?P<round_mode>[.!])(?P<ndigits>[+-]?\d+))?
                         (?P<exp_mode>[fF%eErRbB])?
                         (?:x(?P<exp_val>[+-]?\d+))?
                         (?P<prefix_mode>p)?
                         (?P<bracket_unc>\(\))?
                         $''', re.VERBOSE)

This causes . and .e format specifications to set round_mode=RoundMode.DEC_PLACE.

But this does break the tests like

self.assertEqual(f'{SciNum(1234.4321):,}, '1,234.4321')

Instead the result of this becomes 1234,4321. Because the capture group for the upper separator is greedy, the comma ends up being captured by the decimal separator capture group.

Ambiguity between the fill_mode capture group and the top_dig_place capture group was "resolved" by forcing the fill_mode capture group to end with a special symbol =. This was adapted from the built-in format specification language where the = indicates sign-symbol-aware filling. Some sort of symbol could be used to resolve the grouping separator/rounding mode collision as well. e.g.

pattern = re.compile(r'''^
                         (?:(?P<fill_mode>[ 0])=)?
                         (?P<sign_mode>[-+ ])?
                         (?P<alternate_mode>\#)?
                         (?P<top_dig_place>\d+)?
                         (?:\|(?P<upper_separator>[n,.s_])?
                         (?P<decimal_separator>[.,])?
                         (?P<lower_separator>[ns_])?\|)?
                         (?:(?P<round_mode>[.!])(?P<ndigits>[+-]?\d+))?
                         (?P<exp_mode>[fF%eErRbB])?
                         (?:x(?P<exp_val>[+-]?\d+))?
                         (?P<prefix_mode>p)?
                         (?P<bracket_unc>\(\))?
                         $''', re.VERBOSE)

Now any grouping separator specifications must be surrounded by '|...|' symbols. This makes '.e' control the round_mode and a specification like ',' is now an invalid format specification.

The major downside of this approach is that it makes the already complicated format specification language even more complicated and verbose. Another downside is that there is still likely confusion as to if ',' controls the upper separator or the decimal separator. Or, similarly, if _ controls the upper or lower separator. This confusion would persist even with the '|...|' change. Indeed, the answer depends on whether the capture groups are greedy or lazy. A subtle convention choice that is a frustrating mental burden for the user.

Yet another obvious solution would be to remove grouping separator control from the format specification language entirely. One downside here, other than reduced functionality, is that this feels like a regression compared to the python built-in format specification mini language which does support a grouping option symbol. However, the built-in specification only allows ',' or '_' to be used as an upper separator, so the language avoids confusion with the '.' precision symbol. But this removal would greatly simplify format specification mini language, and one could easily argue that the grouping separators are best left to be controlled as global formatting options. (especially if they are being modified to fit conventions for a particular locale).

@jagerber48
Copy link
Owner Author

Fixed by #95 which removes separator configuration from the FSML

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant