<!-- spell-checker:ignore hhhhp armv7 cccccccccccccccccccccccccccccccdp ccccccccccccccd ccccccccccccccdp fffffffp -->

# Extensions over GNU

Though the main goal of the project is compatibility, uutils supports a few

…

mail headers in the input. `-q`/`--quick` breaks lines more quickly. And `-T`/`--tab-width` defines the
number of spaces representing a tab when determining the line length.

## `printf`

`printf` uses arbitrary-precision decimal numbers to parse and format floating-point
numbers. GNU coreutils uses `long double`, whose actual size may be a [double precision
64-bit float](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)
(e.g. 32-bit ARM), an [extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision)
(x86(-64)), or a
[quadruple precision 128-bit float](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format) (e.g. arm64).

In practice, this means that printing a number with a large precision stays exact:
```
printf "%.48f\n" 0.1
0.100000000000000000000000000000000000000000000000 << uutils on all platforms
0.100000000000000000001355252715606880542509316001 << GNU coreutils on x86(-64)
0.100000000000000000000000000000000004814824860968 << GNU coreutils on arm64
0.100000000000000005551115123125782702118158340454 << GNU coreutils on armv7 (32-bit)
```
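
The difference comes down to what is stored. As a minimal sketch, assuming a hypothetical `Decimal` type that holds an integer mantissa and a count of fractional digits (uutils' real types differ and use arbitrary-size integers), widening a decimal to 48 fractional digits only appends zeros, whereas Rust's `f64` (standing in for a 64-bit `long double`) can only print the binary value nearest to 0.1. Rounding half to even when digits are dropped is an assumption of the sketch.

```rust
/// Hypothetical fixed-point decimal: value = mantissa / 10^scale.
/// A `u128` mantissa keeps the sketch dependency-free; a real
/// implementation would use an arbitrary-precision integer instead.
#[derive(Debug, Clone, Copy)]
struct Decimal {
    mantissa: u128,
    scale: u32,
}

impl Decimal {
    /// Parses a non-negative literal such as "0.1" or "12.50".
    fn parse(s: &str) -> Option<Decimal> {
        let (int_part, frac_part) = s.split_once('.').unwrap_or((s, ""));
        let digits = format!("{int_part}{frac_part}");
        if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        Some(Decimal {
            mantissa: digits.parse().ok()?,
            scale: frac_part.len() as u32,
        })
    }

    /// Formats with exactly `precision` fractional digits, like `%.Nf`.
    /// Extra digits are exact zeros; dropped digits are rounded half to even.
    fn to_fixed(self, precision: usize) -> String {
        let scale = self.scale as usize;
        let digits = if precision >= scale {
            // Widening never loses information: just append zeros.
            let padded = format!("{:0>width$}", self.mantissa, width = scale + 1);
            padded + &"0".repeat(precision - scale)
        } else {
            // 10^k overflows u128 for k >= 39, but then mantissa < divisor / 2,
            // so the value rounds to 0 anyway.
            let rounded = match 10u128.checked_pow((scale - precision) as u32) {
                Some(divisor) => {
                    let (q, r) = (self.mantissa / divisor, self.mantissa % divisor);
                    if 2 * r > divisor || (2 * r == divisor && q % 2 == 1) { q + 1 } else { q }
                }
                None => 0,
            };
            format!("{:0>width$}", rounded, width = precision + 1)
        };
        let (int_part, frac_part) = digits.split_at(digits.len() - precision);
        if precision == 0 {
            int_part.to_string()
        } else {
            format!("{int_part}.{frac_part}")
        }
    }
}

fn main() {
    let exact = Decimal::parse("0.1").unwrap().to_fixed(48);
    // Rust prints the nearest f64 to 0.1 exactly, digit by digit.
    let binary = format!("{:.48}", 0.1_f64);
    println!("{exact}  << decimal sketch");
    println!("{binary}  << f64");
}
```

The two printed lines correspond to the "uutils" and "armv7 (32-bit)" lines of the example above.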

### Hexadecimal floats

For the hexadecimal float format (`%a`), POSIX only states that one hexadecimal digit
should be present to the left of the radix point (`0xh.hhhhp±d` [1]), but does not say how
many _bits_ that digit should hold (between 1 and 4). On x86(-64), the first digit always
holds 4 bits, so its value is always between `0x8` and `0xf`, while on other
architectures only 1 bit is included, so the value is always `0x1`.
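
Both conventions describe the same bits; only the split between the leading digit and the fraction differs. The Rust sketch below assumes an `f64` input (Rust has no `long double`) and a hypothetical `hex_float` helper: shifting the significand left by 3 bits moves four bits into the leading digit and lowers the binary exponent by 3. Subnormals, infinities and NaN are left out.

```rust
/// Formats a finite `f64` like C's `%a`, with `leading_bits` (1..=4)
/// significand bits placed before the radix point.
fn hex_float(x: f64, leading_bits: u32) -> String {
    assert!((1..=4).contains(&leading_bits));
    let sign = if x.is_sign_negative() { "-" } else { "" };
    let x = x.abs();
    if x == 0.0 {
        // The leading digit is 0x0 for zero in either convention.
        return format!("{sign}0x0p+0");
    }
    assert!(x.is_normal(), "this sketch only handles normal numbers");

    const FRAC_BITS: u32 = 52;
    let bits = x.to_bits();
    let exponent = ((bits >> FRAC_BITS) & 0x7ff) as i32 - 1023;
    // 53-bit significand, including the implicit leading 1.
    let significand = (bits & ((1u64 << FRAC_BITS) - 1)) | (1u64 << FRAC_BITS);

    // Moving `shift` extra bits into the leading digit multiplies the
    // significand by 2^shift, so the exponent drops by the same amount.
    let shift = leading_bits - 1;
    let shifted = significand << shift;
    let lead = shifted >> FRAC_BITS;
    let frac = shifted & ((1u64 << FRAC_BITS) - 1);

    // 52 fractional bits = 13 hex digits; drop trailing zeros like `%a`.
    let frac_hex = format!("{frac:013x}");
    let frac_hex = frac_hex.trim_end_matches('0');
    let point = if frac_hex.is_empty() { "" } else { "." };
    let exp = exponent - shift as i32;
    format!("{sign}0x{lead:x}{point}{frac_hex}p{exp:+}")
}

fn main() {
    for x in [1.0, 3.0, 0.1] {
        println!("{x}: {} vs {}", hex_float(x, 1), hex_float(x, 4));
    }
}
```

With one leading bit, 1.0 prints as `0x1p+0`; with four, the same value prints as `0x8p-3`.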

However, the first digit is of course `0x0` if the number is zero. Also, rounding
may cause the first digit to be `0x1` on x86(-64) (e.g. `0xf.fffffffp-5` rounds to
`0x1.00p-1` when printed with two fractional digits), or `0x2` on other architectures.
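
Both rounding results follow from one carry rule. In this sketch, `round_hex` is a hypothetical helper that takes an already-computed hex significand (leading digit, fractional digits, binary exponent) and rounds half to even, which is our assumption about the rounding mode. When the carry turns a leading `0xf` into `0x10`, it is renormalized to `0x1` with the exponent increased by 4; a leading `0x1` simply becomes `0x2`.

```rust
/// Rounds the hex significand `lead.frac` (times 2^exp) to `precision`
/// fractional digits, round-half-to-even, and formats it like `%a`.
/// `lead` must be a single hex digit (0..=15).
fn round_hex(lead: u8, frac: &str, exp: i32, precision: usize) -> String {
    let mut digits: Vec<u8> = std::iter::once(lead)
        .chain(frac.chars().map(|c| c.to_digit(16).expect("hex digit") as u8))
        .collect();
    if digits.len() < precision + 1 {
        digits.resize(precision + 1, 0); // more digits requested: pad with zeros
    }
    let dropped = digits.split_off(precision + 1);
    let last_kept_is_odd = digits[precision] % 2 == 1;
    let round_up = match dropped.split_first() {
        None => false,
        // Above half rounds up; exactly half (8 then zeros) rounds to even.
        Some((&first, rest)) => {
            first > 8 || (first == 8 && (rest.iter().any(|&d| d != 0) || last_kept_is_odd))
        }
    };
    let mut exp = exp;
    if round_up {
        // Propagate the carry from the last kept digit toward the lead.
        let mut i = precision;
        loop {
            digits[i] += 1;
            if digits[i] < 16 || i == 0 {
                break;
            }
            digits[i] = 0;
            i -= 1;
        }
        if digits[0] == 16 {
            // 0xf.ff.. + carry = 0x10.00..: keep a single leading digit by
            // writing 0x1.00.. and multiplying by 16 = 2^4.
            digits[0] = 1;
            exp += 4;
        }
    }
    let hex: String = digits
        .iter()
        .map(|&d| char::from_digit(u32::from(d), 16).unwrap())
        .collect();
    let (lead_hex, frac_hex) = hex.split_at(1);
    let point = if frac_hex.is_empty() { "" } else { "." };
    format!("0x{lead_hex}{point}{frac_hex}p{exp:+}")
}

fn main() {
    // The same value in both conventions, rounded to two fractional digits.
    println!("{}", round_hex(0xf, "fffffff", -5, 2)); // 0x1.00p-1 (x86(-64) style)
    println!("{}", round_hex(0x1, "fffffffe", -2, 2)); // 0x2.00p-2 (1-bit style)
}
```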

We chose to replicate the x86(-64) behavior on all platforms.

Additionally, the default precision of the hexadecimal float format (`%a` without
a precision) is expected to be "sufficient for exact representation of the value" [1].
This is not always possible in uutils, as we store arbitrary-precision numbers that may be
periodic in hexadecimal form (`0.1 = 0xc.ccc...p-7`), so we fall back
to the number of digits that would be required to exactly print an
[extended precision 80-bit float](https://en.wikipedia.org/wiki/Extended_precision),
emulating GNU coreutils behavior on x86(-64). An 80-bit float has a 64-bit significand
(integer and fractional parts combined), so 16 hexadecimal digits are printed in total
(1 digit before the point, 15 after).
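
A sketch of how such digits can be produced from an exact value: here the value is assumed to be available as a ratio of two `u128` integers (0.1 = 1/10), a stand-in for uutils' arbitrary-precision decimals that only works for modest inputs. The hypothetical `hex_from_ratio` helper scales the value by powers of two until the leading digit falls in the `0x8`-`0xf` range, generates hex digits by long division, and stops after the requested precision (15 by default), rounding half to even.

```rust
/// Default number of fractional hex digits: an 80-bit float has a 64-bit
/// significand, i.e. 16 hex digits, one of them before the point.
const DEFAULT_HEX_DIGITS: usize = 15;

/// Formats the exact non-negative rational `num / den` like `%a`, with the
/// leading hex digit normalized to 0x8..=0xf (the x86(-64) convention).
/// Assumes `16 * den` and `16 * num` fit in a `u128`.
fn hex_from_ratio(num: u128, den: u128, precision: usize) -> String {
    assert!(den > 0, "denominator must be positive");
    let mut exp: i32 = 0;
    let mut digits: Vec<u8> = Vec::with_capacity(precision + 1);
    if num == 0 {
        digits.resize(precision + 1, 0); // zero: the leading digit is 0x0
    } else {
        let (mut num, mut den) = (num, den);
        // Scale by powers of two until 8 <= num / den < 16.
        while num >= 16 * den {
            den *= 2;
            exp += 1;
        }
        while num < 8 * den {
            num *= 2;
            exp -= 1;
        }
        // Long division in base 16, one digit per step. Periodic values such
        // as 0.1 = 0xc.ccc...p-7 never terminate, so stop at `precision`.
        let mut rem = num;
        for _ in 0..=precision {
            digits.push((rem / den) as u8);
            rem = (rem % den) * 16;
        }
        // The discarded tail is rem / (16 * den); compare it with 1/2.
        let half = 8 * den;
        if rem > half || (rem == half && digits[precision] % 2 == 1) {
            let mut i = precision;
            loop {
                digits[i] += 1;
                if digits[i] < 16 || i == 0 {
                    break;
                }
                digits[i] = 0;
                i -= 1;
            }
            if digits[0] == 16 {
                // 0xf.ff.. rounded up to 0x10.00..: renormalize to 0x1.00..
                digits[0] = 1;
                exp += 4;
            }
        }
    }
    let hex: String = digits
        .iter()
        .map(|&d| char::from_digit(u32::from(d), 16).unwrap())
        .collect();
    let (lead, frac) = hex.split_at(1);
    let point = if frac.is_empty() { "" } else { "." };
    format!("0x{lead}{point}{frac}p{exp:+}")
}

fn main() {
    println!("{}", hex_from_ratio(1, 10, DEFAULT_HEX_DIGITS)); // like `%a` on 0.1
    println!("{}", hex_from_ratio(1, 10, 32)); // like `%.32a` on 0.1
}
```

With the default of 15 digits, 1/10 comes out as `0xc.ccccccccccccccdp-7`, and requesting 32 digits only extends the run of `c` digits before the final rounded `d`.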

In practice, this means that the default hexadecimal floating-point output is
identical to x86(-64) GNU coreutils:
```
printf "%a\n" 0.1
0xc.ccccccccccccccdp-7 << uutils on all platforms
0xc.ccccccccccccccdp-7 << GNU coreutils on x86-64
0x1.999999999999999999999999999ap-4 << GNU coreutils on arm64
0x1.999999999999ap-4 << GNU coreutils on armv7 (32-bit)
```

We _can_ print an arbitrary number of digits if a larger precision is requested,
and the leading digit will still be in the `0x8`-`0xf` range:
```
printf "%.32a\n" 0.1
0xc.cccccccccccccccccccccccccccccccdp-7 << uutils on all platforms
0xc.ccccccccccccccd00000000000000000p-7 << GNU coreutils on x86-64
0x1.999999999999999999999999999a0000p-4 << GNU coreutils on arm64
0x1.999999999999a0000000000000000000p-4 << GNU coreutils on armv7 (32-bit)
```

***Note: The architecture-specific behavior on non-x86(-64) platforms may change in
the future.***

## `seq`

Unlike GNU coreutils, `seq` always uses arbitrary-precision decimal numbers, no
matter the parameters (integers, decimal numbers, positive or negative increments,
with or without a format, etc.), so its output is more accurate than that of GNU coreutils for
some inputs (e.g. small fractional increments, where GNU coreutils uses `long double`).
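
To illustrate why decimal arithmetic matters here, this sketch brings `first`, `increment` and `last` to a common number of fractional digits and computes each term as `first + i * increment` on integers, so no binary rounding error can appear. It is written in Rust with `f64` standing in for `long double` (which Rust lacks); `parse`, `format_decimal` and `decimal_seq` are hypothetical helpers, not uutils code, and inputs are assumed to be plain decimal literals small enough for `i128`. Like `long double`, `f64` is a binary format, so it cannot represent 0.1 exactly.

```rust
/// Parses a decimal literal such as "-0.25" into (mantissa, scale),
/// meaning mantissa / 10^scale.
fn parse(s: &str) -> (i128, u32) {
    let (int_part, frac_part) = s.split_once('.').unwrap_or((s, ""));
    let mantissa: i128 = format!("{int_part}{frac_part}").parse().expect("decimal literal");
    (mantissa, frac_part.len() as u32)
}

/// Prints mantissa / 10^scale with exactly `scale` fractional digits.
fn format_decimal(m: i128, scale: u32) -> String {
    let sign = if m < 0 { "-" } else { "" };
    let digits = format!("{:0>width$}", m.unsigned_abs(), width = scale as usize + 1);
    let (int_part, frac_part) = digits.split_at(digits.len() - scale as usize);
    if scale == 0 {
        format!("{sign}{int_part}")
    } else {
        format!("{sign}{int_part}.{frac_part}")
    }
}

/// Hypothetical `seq FIRST INCREMENT LAST` on exact decimals.
fn decimal_seq(first: &str, step: &str, last: &str) -> Vec<String> {
    let parsed = [parse(first), parse(step), parse(last)];
    // Use the largest scale so all three numbers share one integer unit.
    let scale = parsed.iter().map(|&(_, s)| s).max().unwrap();
    let [a, d, b] = parsed.map(|(m, s)| m * 10i128.pow(scale - s));
    assert!(d != 0, "zero increment");
    let mut out = Vec::new();
    let mut i: i128 = 0;
    loop {
        // Each term is computed from `first`, so errors cannot accumulate.
        let x = a + i * d;
        if (d > 0 && x > b) || (d < 0 && x < b) {
            break;
        }
        out.push(format_decimal(x, scale));
        i += 1;
    }
    out
}

fn main() {
    println!("{:?}", decimal_seq("0", "0.1", "1")); // every term is exact
    println!("{}", 3.0 * 0.1_f64); // 0.30000000000000004 in binary floating point
}
```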

The only limitation is that the position of the decimal point is stored in an `i64`,
so values smaller than `10^(-2^63)` underflow to 0, and some values larger
than `10^(2^63)` may overflow to infinity.
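
The limitation can be illustrated with a minimal sketch, assuming a value is stored as mantissa × 10^(-scale) with an `i64` scale; the `Value` enum and `mul` function are hypothetical. Multiplication adds scales, so checked addition detects when the decimal point no longer fits: a positive overflow means the value is too small and becomes 0, a negative one means it is too large and becomes infinity.

```rust
/// Hypothetical result of a decimal operation, where a finite value is
/// mantissa × 10^(-scale) and the scale is an `i64`. Signs of zero and
/// infinity are ignored to keep the sketch short.
#[derive(Debug, PartialEq)]
enum Value {
    Finite { mantissa: i128, scale: i64 },
    Zero,
    Infinity,
}

/// Multiplies two finite decimals given as (mantissa, scale) pairs:
/// mantissas multiply and scales add.
fn mul((m1, s1): (i128, i64), (m2, s2): (i128, i64)) -> Value {
    let mantissa = m1
        .checked_mul(m2)
        .expect("a real implementation would use a big integer here");
    if mantissa == 0 {
        return Value::Zero;
    }
    match s1.checked_add(s2) {
        Some(scale) => Value::Finite { mantissa, scale },
        // Both scales large and positive: the value is below ~10^(-2^63).
        None if s1 > 0 => Value::Zero,
        // Both scales large and negative: the value is above ~10^(2^63).
        None => Value::Infinity,
    }
}

fn main() {
    println!("{:?}", mul((15, 1), (2, 1))); // 1.5 × 0.2 = 0.30
    println!("{:?}", mul((1, i64::MAX), (1, 1))); // tiny × 0.1 underflows
    println!("{:?}", mul((1, i64::MIN), (1, -1))); // huge × 10 overflows
}
```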

See also the notes under `printf` for formatting precision and platform differences.

`seq` provides `-t`/`--terminator` to set the terminator character.

## `ls`