Commit 142e4cc

Merge pull request #7641 from drinkcat/doc-seq-printf
doc: extensions: Explain how printf/seq handle precision
2 parents 04e7de1 + 3f12ed9 commit 142e4cc

1 file changed

+78
-0
lines changed

docs/src/extensions.md

Lines changed: 78 additions & 0 deletions
@@ -1,3 +1,5 @@
<!-- spell-checker:ignore hhhhp armv7 cccccccccccccccccccccccccccccccdp ccccccccccccccd ccccccccccccccdp fffffffp -->

# Extensions over GNU

Though the main goal of the project is compatibility, uutils supports a few
@@ -71,8 +73,84 @@ feature is adopted from [FreeBSD](https://www.freebsd.org/cgi/man.cgi?cut).
mail headers in the input. `-q`/`--quick` breaks lines more quickly. And `-T`/`--tab-width` defines the
number of spaces representing a tab when determining the line length.

## `printf`

`printf` uses arbitrary precision decimal numbers to parse and format floating point
numbers. GNU coreutils uses `long double`, whose actual format depends on the platform:
it may be a [double precision 64-bit float](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)
(e.g. 32-bit arm), an [extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision)
(x86(-64)), or a
[quadruple precision 128-bit float](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format) (e.g. arm64).

Practically, this means that a number printed with a large precision stays exact:
```
printf "%.48f\n" 0.1
0.100000000000000000000000000000000000000000000000 << uutils on all platforms
0.100000000000000000001355252715606880542509316001 << GNU coreutils on x86(-64)
0.100000000000000000000000000000000004814824860968 << GNU coreutils on arm64
0.100000000000000005551115123125782702118158340454 << GNU coreutils on armv7 (32-bit)
```
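
The difference between the first and last lines above comes down to when the literal `0.1` gets rounded. The sketch below is an illustration in Python, not the uutils implementation: it uses exact rationals from the standard `fractions` module as a stand-in for an arbitrary precision decimal type, and the helper `to_fixed` is a hypothetical name introduced here.

```python
from fractions import Fraction


def to_fixed(value: Fraction, places: int) -> str:
    """Format a non-negative exact rational with `places` (>= 1) decimal digits."""
    scaled = round(value * 10**places)           # exact integer arithmetic
    digits = str(scaled).rjust(places + 1, "0")  # keep at least one integer digit
    return f"{digits[:-places]}.{digits[-places:]}"


# Parsing the text "0.1" as a decimal keeps the value exact.
exact = to_fixed(Fraction("0.1"), 48)

# Parsing it as a 64-bit binary double rounds it before any formatting happens.
as_double = to_fixed(Fraction(0.1), 48)

print(exact)
print(as_double)
```

The second value shows the rounding error of a 64-bit double, which is what a platform with a 64-bit `long double` would print.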

### Hexadecimal floats

For hexadecimal float format (`%a`), POSIX only states that one hexadecimal digit
should be present left of the decimal point (`0xh.hhhhp±d` [1]), but does not say how
many _bits_ that digit should carry (between 1 and 4). With GNU coreutils on x86(-64),
the first digit always includes 4 bits, so its value is always between `0x8` and `0xf`,
while on other architectures, only 1 bit is included, so the value is always `0x1`.
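
Both conventions describe the same value; only the split between the leading digit and the exponent differs. The following sketch shows this with Python's 64-bit doubles (not `long double`). `float.hex` uses the 1-bit convention, and the helper `hex_leading_4_bits` is a hypothetical function written for this illustration that shifts three more bits into the leading digit.

```python
import math


def hex_leading_4_bits(x: float) -> str:
    """Format a positive finite double with a leading hex digit in 0x8..0xf."""
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
    # 53 significant bits, padded with 3 zero bits to fill 14 hex digits (56 bits).
    bits = int(mantissa * 2**53) << 3
    digits = f"{bits:014x}"
    fraction = digits[1:].rstrip("0")
    point = "." if fraction else ""
    # Moving 4 bits in front of the point lowers the binary exponent by 4.
    return f"0x{digits[0]}{point}{fraction}p{exponent - 4:+d}"


one_bit = (0.1).hex()                 # leading digit carries 1 bit
four_bits = hex_leading_4_bits(0.1)   # leading digit carries 4 bits

print(one_bit)
print(four_bits)
print(float.fromhex(four_bits) == float.fromhex(one_bit))
```

Parsing either string with `float.fromhex` gives back the same double, which is why the choice of convention is purely a matter of presentation.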

There are two exceptions. The first digit is `0x0` when the number is zero, and
rounding can carry into the first digit, producing `0x1` on x86(-64) (e.g.
`0xf.fffffffp-5` rounds to `0x1.00p-1` at a precision of 2) or `0x2` on other
architectures.

We chose to replicate x86-64 behavior on all platforms.

Additionally, the default precision of the hexadecimal float format (`%a` without
an explicit precision) is expected to be "sufficient for exact representation of the value" [1].
This is not possible in uutils, as we store arbitrary precision numbers that may be
periodic in hexadecimal form (`0.1 = 0xc.ccc...p-7`), so we fall back
to the number of digits that would be required to exactly print an
[extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision),
emulating GNU coreutils behavior on x86(-64). An 80-bit float has a 64-bit significand
(integer and fractional parts combined), so 16 hexadecimal digits are printed in total
(1 digit before the decimal point, 15 after).
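
The digit count follows from 64 bits at 4 bits per hexadecimal digit: 64 / 4 = 16. As a rough illustration of the rules described in this section, the sketch below formats an exact rational with a 4-bit leading digit and a chosen number of fractional digits. It is written in Python with hypothetical names (`format_hex_float`, `frac_digits`), handles positive values only, and is not the uutils implementation.

```python
from fractions import Fraction


def format_hex_float(value: Fraction, frac_digits: int = 15) -> str:
    """Format a positive rational as 0xh.hhhp±d with a 4-bit leading digit."""
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    exponent = 0
    while value >= 16:   # scale into [8, 16) so that the leading
        value /= 2       # digit carries 4 significant bits
        exponent += 1
    while value < 8:
        value *= 2
        exponent -= 1
    scaled = round(value * 16**frac_digits)  # rounds half to even
    if scaled == 16 ** (frac_digits + 1):    # rounding carried past 0xf.ff...
        scaled //= 16                        # leading digit becomes 0x1
        exponent += 4
    digits = f"{scaled:x}"
    fraction = digits[1:]
    point = "." if fraction else ""
    return f"0x{digits[0]}{point}{fraction}p{exponent:+d}"


default = format_hex_float(Fraction("0.1"))      # 15 fractional digits
wide = format_hex_float(Fraction("0.1"), 32)     # like a requested precision of 32
carried = format_hex_float(Fraction(0xFFFFFFFF, 16**7 * 2**5), 2)  # 0xf.fffffffp-5

print(default)
print(wide)
print(carried)
```

With 15 fractional digits the result for `0.1` has 14 `c` digits followed by a rounded-up `d`, and the last call reproduces the rounding carry into a leading `0x1` mentioned earlier.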

Practically, this means that the default hexadecimal floating point output is
identical to x86(-64) GNU coreutils:
```
printf "%a\n" 0.1
0xc.ccccccccccccccdp-7 << uutils on all platforms
0xc.ccccccccccccccdp-7 << GNU coreutils on x86-64
0x1.999999999999999999999999999ap-4 << GNU coreutils on arm64
0x1.999999999999ap-4 << GNU coreutils on armv7 (32-bit)
```

We _can_ print an arbitrary number of digits if a larger precision is requested,
and the leading digit will still be in the `0x8`-`0xf` range:
```
printf "%.32a\n" 0.1
0xc.cccccccccccccccccccccccccccccccdp-7 << uutils on all platforms
0xc.ccccccccccccccd00000000000000000p-7 << GNU coreutils on x86-64
0x1.999999999999999999999999999a0000p-4 << GNU coreutils on arm64
0x1.999999999999a0000000000000000000p-4 << GNU coreutils on armv7 (32-bit)
```

***Note: The architecture-specific behavior on non-x86(-64) platforms may change in
the future.***

## `seq`

Unlike GNU coreutils, `seq` always uses arbitrary precision decimal numbers, no
matter the parameters (integers, decimal numbers, positive or negative increments,
format specified, etc.), so its output is more accurate than that of GNU coreutils for
some inputs (e.g. small fractional increments, where GNU coreutils uses `long double`).
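
To see why decimal arithmetic helps with fractional increments, the sketch below steps from 0 to 1 by 0.1 twice: once with Python's 64-bit binary doubles and once with the standard `decimal` module. This is an illustration of binary rounding in general; it does not reproduce the algorithm of GNU coreutils (which uses `long double`) or of uutils, and the generator `seq` here is a hypothetical helper that supports positive increments only.

```python
from decimal import Decimal


def seq(first, increment, last):
    """Yield first, first + increment, ... while the value is <= last."""
    value = first
    while value <= last:
        yield value
        value += increment


floats = list(seq(0.0, 0.1, 1.0))
decimals = list(seq(Decimal("0"), Decimal("0.1"), Decimal("1")))

print(floats[-1])    # close to, but not exactly, 1.0
print(decimals[-1])  # exactly 1
```

The binary version accumulates a small error at each step and never lands exactly on the end value, while the decimal version does.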

The only limitation is that the position of the decimal point is stored in an `i64`,
so values smaller than 10**(-2**63) will underflow to 0, and some values larger
than 10**(2**63) may overflow to infinity.

See also the comments under `printf` for formatting precision and differences.

`seq` provides `-t`/`--terminator` to set the terminator character.

## `ls`
