-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Move ExtendedBigDecimal to uucore/format, make use of it in formatting functions #7458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| prefix: Vec<u8>, | ||
| suffix: Vec<u8>, | ||
| formatter: F, | ||
| _marker: std::marker::PhantomData<T>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PhantomData thing doesn't make me happy, but I couldn't find another way to have a Formatter that either takes a value parameter (i64/u64) or a reference (&ExtendedBigDecimal).
Ideas welcome....
|
GNU testsuite comparison: |
e38ba06 to
c909d80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors formatting functions to use ExtendedBigDecimal instead of f64 for improved arbitrary precision and to better support various floating‐point formats (including hexadecimal and scientific). Key changes include:
- Moving ExtendedBigDecimal to uucore/format and adding a MinusNan variant and corresponding From implementation.
- Updating formatting functions and the Format generic interface to work with ExtendedBigDecimal.
- Adjustments to related components in seq, dd, and csplit to accommodate the new ExtendedBigDecimal usage.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/uucore/src/lib/features/format/extendedbigdecimal.rs | Added MinusNan variant, From implementation and adjusted match arms for operations and formatting. |
| src/uucore/Cargo.toml | Updated dependency list to include bigdecimal and num-traits in the format feature. |
| src/uucore/src/lib/features/format/mod.rs | Revised Format to be generic over the value type with the usage of PhantomData. |
| src/uu/seq/src/seq.rs | Updated ExtendedBigDecimal usage in formatting and value conversion. |
| src/uu/dd/src/progress.rs, src/uu/csplit/src/split_name.rs, src/uu/seq/src/numberparse.rs, src/uu/seq/src/number.rs, src/uu/seq/src/hexadecimalfloat.rs | Adjusted imports and formatting function calls to utilize the new ExtendedBigDecimal API. |
Comments suppressed due to low confidence (2)
src/uucore/src/lib/features/format/extendedbigdecimal.rs:161
- Consider verifying the operand ordering in the addition implementation for ExtendedBigDecimal. When mixing 'Nan' and 'MinusNan', the current match order might lead to inconsistent results; ensure this behavior matches the intended design.
(Self::Nan, _) => Self::Nan,
src/uucore/src/lib/features/format/mod.rs:323
- [nitpick] The introduction of the generic type parameter T (along with PhantomData) increases the complexity of Format. Consider adding documentation that clarifies the role and usage of T to help future maintainers understand the design.
formatter: F,
| ..Default::default() | ||
| } | ||
| .fmt(&mut duration_str, duration)?; | ||
| .fmt(&mut duration_str, &duration.into())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems unrelated? no ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is. The float formatter now takes an &ExtendedBigDecimal (and not a f64).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(it's a bit unfortunate and the precision isn't necessary here, but I don't think that single call warrants duplicating formatter for ExtendedBigDecimal and f64)
|
why do move it to uucore? it seems that it is only used b seq, no ? |
|
I should have made this a bit more clear, this is just the first step in many.
Correct, it was only used in The eventual idea is to start using The output is still different from GNU coreutils because:
Hope this makes sense ,-) |
|
Added 2 commits, to make full use of the new formatting function in
(spotted #7466... for another PR) |
|
GNU testsuite comparison: |
|
Did you look at performances ? :) |
|
GNU coreutils is much faster in most cases, so mostly focusing on main vs this branch: 4% worse than main on integers: ~15% worse on floats with default format: but 2+ times better on custom float formats, especially with large precision: |
|
I could make I suspect there's a lot of small optimizations we can do. But maybe in this case I'll try to look a bit more at |
|
For the second case, we can reduce the slowdown to 5% with a bypass in Happy to add those 2 small optimizations if needed. |
|
yeah, probably better |
|
Done, also filed #7482 for the most egregious performance gap. (also discovered that useful |
|
GNU testsuite comparison: |
eda0772 to
e4bf657
Compare
|
Forgot to run fmt/clippy... and didn't properly run tests... Fixed now. Just for reference, benchmarks vs main: 25% better on integers, 5% slower on default floats: |
|
GNU testsuite comparison: |
|
(I don't think those 2 failures are caused by this PR) |
d78792c to
f6fdf61
Compare
|
GNU testsuite comparison: |
f6fdf61 to
cd30627
Compare
|
Just noticed that this PR regressed uppercase hexadecimal float printing, so I added a commit here to fix that: cd30627 , and add a test so that it doesn't happen again. Rebased, too. I also opened #7514 to fix one more small issue, and add a bunch more tests (didn't want to push too much new stuff here). |
|
GNU testsuite comparison: |
Will make it possible to directly print ExtendedBigDecimal in `seq`, and gradually get rid of limited f64 precision in other tools (e.g. `printf`). Changes are mostly mechanical, we reexport ExtendedBigDecimal directly in format to keep the imports slightly shorter.
Using an associated type in Formatter trait was quite nice, but, in a follow-up change, we'd like to pass a _reference_ to the Float Formatter, while just passing i64/u64 as a value to the Int formatters. Associated type doesn't allow for that, so we turn it into a generic instead. This makes Format<> a bit more complicated though, as we need to specify both the Formatter, _and_ the type to be formatted.
Some test cases require to handle "negative" NaN. Handle it similarly to "positive" NaN.
Allows easier conversion.
Only changes the external interface, right now the number is casted back to f64 for printing. We'll update that in follow-up.
First modify Format.fmt to extract absolute value and sign, then modify printing on non-finite values (inf or nan).
Also add a few unit tests to make sure precision is not lost anymore.
No more f64 operations needed, we just trim (or extend) BigDecimal to appropriate precision, get the digits as a string, then add the decimal point. Similar to what BigDecimal::write_scientific_notation does, but we need a little bit more control.
Similar logic to scientific printing. Also add a few more tests around corner cases where we switch from decimal to scientific printing.
Display hexadecimal floats with arbitrary precision. Note that some of the logic will produce extremely large BitInt as intermediate values: there is some optimization possible here, but the current implementation appears to work fine for reasonable numbers (e.g. whatever would previously fit in a f64, and even with somewhat large precision).
`printf "%05.2f" inf` should print ` inf`, not `00inf`. Add a test to cover that case, too.
Now that uucore format functions take in an ExtendedBigDecimal, we can use those in all cases.
In most common use cases: - We can bypass a lot of `write_output` when width == 0. - Simplify format_float_decimal when the input is an integer. Also document another interesting case in src/uu/seq/BENCHMARKING.md.
Accidentally broke this use case when refactoring. Added a test as well.
cd30627 to
d678e53
Compare
|
GNU testsuite comparison: |
Okay, this one is a bit of a large PR... I can split it up if needed (e.g. everything up to
uucode: format: Change Formatter to take an &ExtendedBigDecimalis fairly mechanical). Also, I'm not too worried about having to rewrite stuff if needed (one of my purpose to contribute is to learn about Rust ,-)), so please let me know ,-)The idea here is to switch formatting functions to use ExtendedBigDecimal, instead of f64. This allows us to get arbitrary precision with custom format in
seq:Note that that doesn't match what
coreutilsoutputs on x86, since they use an 80-bit int:But that's better than before this change, where we used a
f64:At least now we have all the digits, so we could find ways to trim the floating point precision back to match 80-bit float if we wanted to.
We also take it as an opportunity to (partially) fix the hexadecimal format:
Again, doesn't fully match what coreutils does, but we could trim the precision:
Note that I'm using
seqand notprintf, as the parsing code still usesf64... There's also quite a bit of potential for this change on the parsing side: I think a lot of custom logic inseqcould then be removed.seq: Make use of uucore::format to print in all cases
Now that uucore format functions take in an ExtendedBigDecimal,
we can use those in all cases.
uucore: format: Pad non-finite numbers with spaces, not zeros
printf "%05.2f" infshould printinf, not00inf.Add a test to cover that case, too.
uucode: format: format_float_hexadecimal: Take in &BigDecimal
Display hexadecimal floats with arbitrary precision.
Note that some of the logic will produce extremely large
BitInt as intermediate values: there is some optimization
possible here, but the current implementation appears to work
fine for reasonable numbers (e.g. whatever would previously
fit in a f64, and even with somewhat large precision).
uucode: format: format_float_shortest: Take in &BigDecimal
Similar logic to scientific printing. Also add a few more tests
around corner cases where we switch from decimal to scientific
printing.
uucode: format: format_float_scientific: Take in &BigDecimal
No more f64 operations needed, we just trim (or extend) BigDecimal to
appropriate precision, get the digits as a string, then add the
decimal point.
Similar to what BigDecimal::write_scientific_notation does, but
we need a little bit more control.
uucode: format: format_float_decimal: Take in &BigDecimal
Also add a few unit tests to make sure precision is not lost anymore.
uucode: format: format_float_non_finite: Take in &ExtendedBigDecimal
First modify Format.fmt to extract absolute value and sign, then
modify printing on non-finite values (inf or nan).
uucode: format: Change Formatter to take an &ExtendedBigDecimal
Only changes the external interface, right now the number is
casted back to f64 for printing. We'll update that in follow-up.
uucore: format: extendedbigdecimal: Implement From
Allows easier conversion.
uucore: format: extendedbigdecimal: Add MinusNan
Some test cases require to handle "negative" NaN. Handle it
similarly to "positive" NaN.
uucore: format: Make Formatter a generic
Using an associated type in Formatter trait was quite nice, but, in
a follow-up change, we'd like to pass a reference to the Float
Formatter, while just passing i64/u64 as a value to the Int
formatters. Associated type doesn't allow for that, so we turn
it into a generic instead.
This makes Format<> a bit more complicated though, as we need
to specify both the Formatter, and the type to be formatted.
seq: Move extendedbigdecimal.rs to uucore/features/format
Will make it possible to directly print ExtendedBigDecimal in
seq,and gradually get rid of limited f64 precision in other tools
(e.g.
printf).Changes are mostly mechanical, we reexport ExtendedBigDecimal directly
in format to keep the imports slightly shorter.