Merge pull request #7641 from drinkcat/doc-seq-printf

sylvestre · web-flow · commit 142e4cc049a1 · 2025-05-25T22:53:12.000+02:00
doc: extensions: Explain how printf/seq handle precision
diff --git a/docs/src/extensions.md b/docs/src/extensions.md
@@ -1,3 +1,5 @@
+<!-- spell-checker:ignore hhhhp armv7 cccccccccccccccccccccccccccccccdp ccccccccccccccd ccccccccccccccdp fffffffp -->
+
 # Extensions over GNU
 
 Though the main goal of the project is compatibility, uutils supports a few
@@ -71,8 +73,84 @@ feature is adopted from [FreeBSD](https://www.freebsd.org/cgi/man.cgi?cut).
 mail headers in the input. `-q`/`--quick` breaks lines more quickly. And `-T`/`--tab-width` defines the
 number of spaces representing a tab when determining the line length.
 
+## `printf`
+
+`printf` uses arbitrary precision decimal numbers to parse and format floating point
+numbers. GNU coreutils uses `long double`, whose actual size may be [double precision
+64-bit float](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)
+(e.g. 32-bit arm), [extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision)
+(x86(-64)), or
+[quadruple precision 128-bit float](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format) (e.g. arm64).
+
+In practice, this means that a number printed with a large precision stays exact:
+```
+printf "%.48f\n" 0.1
+0.100000000000000000000000000000000000000000000000 << uutils on all platforms
+0.100000000000000000001355252715606880542509316001 << GNU coreutils on x86(-64)
+0.100000000000000000000000000000000004814824860968 << GNU coreutils on arm64
+0.100000000000000005551115123125782702118158340454 << GNU coreutils on armv7 (32-bit)
+```
+
+### Hexadecimal floats
+
+For the hexadecimal float format (`%a`), POSIX only states that one hexadecimal digit
+should be present left of the decimal point (`0xh.hhhhp±d` [1]), but does not say how
+many _bits_ should be included (between 1 and 4). On x86(-64), the first digit always
+includes 4 bits, so its value is always between `0x8` and `0xf`, while on other
+architectures, only 1 bit is included, so the value is always `0x1`.
+
+However, the first digit will of course be `0x0` if the number is zero. Also,
+rounding numbers may cause the first digit to be `0x1` on x86(-64) (e.g.
+`0xf.fffffffp-5` rounds to `0x1.00p-1`), or `0x2` on other architectures.
+
+We chose to replicate x86-64 behavior on all platforms.
+
+Additionally, the default precision of the hexadecimal float format (`%a` without
+any precision specifier) is expected to be "sufficient for exact representation of the value" [1].
+This is not possible in uutils as we store arbitrary precision numbers that may be
+periodic in hexadecimal form (`0.1 = 0xc.ccc...p-7`), so we revert
+to the number of digits that would be required to exactly print an
+[extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision),
+emulating GNU coreutils behavior on x86(-64). An 80-bit float has a 64-bit significand
+(integer and fractional parts combined), so 16 hexadecimal digits are printed in total
+(1 digit before the decimal point, 15 after).
+
+Practically, this means that the default hexadecimal floating point output is
+identical to x86(-64) GNU coreutils:
+```
+printf "%a\n" 0.1
+0xc.ccccccccccccccdp-7 << uutils on all platforms
+0xc.ccccccccccccccdp-7 << GNU coreutils on x86-64
+0x1.999999999999999999999999999ap-4 << GNU coreutils on arm64
+0x1.999999999999ap-4   << GNU coreutils on armv7 (32-bit)
+```
+
+We _can_ print an arbitrary number of digits if a larger precision is requested,
+and the leading digit will still be in the `0x8`-`0xf` range:
+```
+printf "%.32a\n" 0.1
+0xc.cccccccccccccccccccccccccccccccdp-7 << uutils on all platforms
+0xc.ccccccccccccccd00000000000000000p-7 << GNU coreutils on x86-64
+0x1.999999999999999999999999999a0000p-4 << GNU coreutils on arm64
+0x1.999999999999a0000000000000000000p-4 << GNU coreutils on armv7 (32-bit)
+```
+
+***Note: The architecture-specific behavior on non-x86(-64) platforms may change in
+the future.***
+
 ## `seq`
 
+Unlike GNU coreutils, `seq` always uses arbitrary precision decimal numbers, no
+matter the parameters (integers, decimal numbers, positive or negative increments,
+format specified, etc.), so its output is more accurate than that of GNU coreutils for
+some inputs (e.g. small fractional increments where GNU coreutils uses `long double`).
+
+The only limitation is that the position of the decimal point is stored in an `i64`,
+so values smaller than 10**(-2**63) will underflow to 0, and some values larger
+than 10**(2**63) may overflow to infinity.
+
+See also the notes under `printf` for formatting precision and differences.
+
 `seq` provides `-t`/`--terminator` to set the terminator character.
 
 ## `ls`