NumericalCompliance
The floating-point operations of the VideoCore IV QPU processors round using the IEEE 754 round-to-zero rounding mode.
This rounding mode corresponds to the OpenCL CL_FP_ROUND_TO_ZERO
mode, which is allowed as the only supported rounding mode for the OpenCL 1.2 embedded profile.
[Source]
NOTE: The CPU uses the IEEE 754 round-to-nearest-even rounding mode, so the results of the same floating-point operation on the CPU and on the GPU might not match! More precisely, the result of a single GPU operation is either the same as the result of the corresponding CPU operation or 1 ULP closer to zero.
As an example, consider the calculation 358.6662292480469 - 3.0502657890319824:
Calculation | Rounding mode | Result | Bit-cast integer |
---|---|---|---|
"Exact" | double precision | 355.6159634590149 | |
CPU | round-to-nearest-even | 355.615966796875 | 0x43b1ced8 |
GPU | round-to-zero | 355.6159362792969 | 0x43b1ced7 |
Neither of those results matches the "exact" one, but both are the closest value representable in single precision under their respective rounding mode. The bit-cast integer representations make the difference easy to see: they differ by exactly 1, which corresponds to exactly 1 ULP.
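The table can be reproduced in software. The following is a minimal Python sketch, not VC4CL code: instead of switching the hardware rounding mode, it computes the exact difference with `fractions.Fraction` and then rounds it to float32 either to nearest-even (CPU) or toward zero (QPU). The helpers `to_f32`, `f32_bits` and `round_to_f32` are hypothetical names, and the sketch only handles normal, finite float32 results.

```python
import struct
from fractions import Fraction

def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_bits(x: float) -> int:
    """Bit-cast a float32 value to its unsigned 32-bit integer pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def round_to_f32(x: Fraction, mode: str) -> float:
    """Round an exact rational value to float32 (software emulation).

    mode: "nearest_even" (CPU default) or "toward_zero" (VideoCore IV QPU).
    Only normal, finite results are handled (no subnormals, no overflow).
    """
    if mode not in ("nearest_even", "toward_zero"):
        raise ValueError(f"unknown rounding mode: {mode}")
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # Find the binade: 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - 23)                # float32 has a 24-bit significand
    scaled = mag / ulp                           # lies in [2**23, 2**24)
    n = scaled.numerator // scaled.denominator   # truncation = round toward zero
    if mode == "nearest_even":
        rem = scaled - n
        if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
            n += 1                               # round up, ties to even
    return sign * float(n * ulp)                 # exact: every float32 fits in a double

# Inputs rounded to float32, as the GPU would receive them.
a = to_f32(358.6662292480469)
b = to_f32(3.0502657890319824)
exact = Fraction(a) - Fraction(b)                # exact difference, no rounding yet

cpu = round_to_f32(exact, "nearest_even")
gpu = round_to_f32(exact, "toward_zero")
print(cpu, hex(f32_bits(cpu)))                   # 355.615966796875 0x43b1ced8
print(gpu, hex(f32_bits(gpu)))                   # 355.6159362792969 0x43b1ced7
```

Because rounding toward zero only ever truncates the significand, the emulated GPU result is never larger in magnitude than the round-to-nearest-even result, and the two differ by at most one step of `n`, i.e. 1 ULP.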
- Inf is supported (at least by the SFU)
- NaN is not supported
If x is a real number that lies between two finite consecutive floating-point numbers a and b, without being equal to one of them, then ulp(x) = |b - a|, otherwise ulp(x) is the distance between the two non-equal finite floating-point numbers nearest x. Moreover, ulp(NaN) is NaN.
The relative ULP for single-precision floating-point values is 2^-23 ≈ 1.19e-07 [1], i.e. the ULP of 1.0. For example, nextafter(1, 2) returns 1 + 2^-23 ≈ 1.000000119 [2].
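Both the nextafter example and the "differ by exactly 1" observation from the bit-cast table can be reproduced by reinterpreting float32 values as integers. A minimal Python sketch, assuming finite inputs; `nextafter_f32` and `ulp_distance` are hypothetical helpers (Python's own `math.nextafter` and `math.ulp` operate on doubles, not float32). Note that this counts float32 steps, which matches the ULP definition above everywhere except exactly at powers of two.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_bits(x: float) -> int:
    """Bit-cast to uint32 (the value is rounded to float32 first)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f32_from_bits(bits: int) -> float:
    """Reinterpret a uint32 bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def nextafter_f32(x: float, y: float) -> float:
    """float32 analogue of C's nextafterf for finite inputs (no NaN/Inf handling)."""
    x, y = to_f32(x), to_f32(y)
    if x == y:
        return y
    if x == 0.0:
        # Smallest subnormal, carrying the sign of the direction we move in.
        return f32_from_bits(1 if y > 0 else 0x80000001)
    bits = f32_bits(x)
    # Moving away from zero increments the magnitude bits, toward zero decrements them.
    bits += 1 if (y > x) == (x > 0) else -1
    return f32_from_bits(bits)

def ulp_distance(a: float, b: float) -> int:
    """Number of float32 steps between a and b (0 for +0.0 vs -0.0)."""
    def ordered(v: float) -> int:
        bits = f32_bits(v)
        # Map sign-magnitude patterns onto a monotonically ordered integer line.
        return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits
    return abs(ordered(a) - ordered(b))

print(nextafter_f32(1.0, 2.0) - 1.0 == 2**-23)               # True
print(ulp_distance(355.615966796875, 355.615936279296875))    # 1
```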
Function | Allowed error (in ULP) | Maximal measured error (in ULP) |
---|---|---|
x + y | correctly rounded (round-to-zero) | 0 |
x - y | correctly rounded (round-to-zero) | 0 |
x * y | correctly rounded (round-to-zero) | 0 |
1.0 / x | 3 | |
x / y | 3 | |
acos | 4 | |
acospi | 5 | |
asin | 4 | |
asinpi | 5 | |
atan | 5 | 1 |
atan2 | 6 | |
atanpi | 5 | |
atan2pi | 6 | |
acosh | 4 | |
asinh | 4 | |
atanh | 5 | |
cbrt | 4 | |
ceil | correctly rounded | 0 |
clamp | 0 | 0 |
copysign | 0 | 0 |
cos | 4 | 2 |
cosh | 4 | 2 |
cospi | 4 | |
cross | 3 | 0 |
degrees | 2 | 2 |
distance | 5.5 + 2 * len(vector) | 4 |
dot | 2 * len(vector) - 1 | 0 |
erfc | 16 | |
erf | 16 | 1 |
exp | 4 | 1 |
exp2 | 4 | |
exp10 | 4 | |
expm1 | 4 | |
fabs | 0 | 0 |
fdim | correctly rounded | |
floor | correctly rounded | 0 |
fma | correctly rounded | 0 |
fmax | 0 | 0 |
fmin | 0 | 0 |
fmod | 0 | |
fract | correctly rounded | |
frexp | 0 | |
hypot | 4 | |
ilogb | 0 | |
length | 5.5 + len(vector) | 4 |
ldexp | correctly rounded | |
log | 4 | 4 |
log2 | 4 | |
log10 | 4 | |
log1p | 4 | |
logb | 0 | |
mad | infinite | |
max | 0 | 0 |
maxmag | 0 | |
min | 0 | 0 |
minmag | 0 | |
mix | absolute error of 1e-3 | 0 |
modf | 0 | |
nan | 0 | |
nextafter | 0 | 0 |
normalize | 4.5 + len(vector) | 7 |
pow | 16 | |
pown | 16 | |
powr | 16 | |
radians | 2 | 2 |
remainder | 0 | |
remquo | 0 | |
rint | correctly rounded | 0 |
rootn | 16 | |
round | correctly rounded | 0 |
rsqrt | 4 | 1 |
sign | 0 | 0 |
sin | 4 | 1 |
sincos | 4 (both) | 2 |
sinh | 4 | 2 |
sinpi | 4 | |
smoothstep | absolute error of 1e-5 | |
sqrt | 4 | 1 |
step | 0 | 0 |
tan | 5 | |
tanh | 5 | |
tanpi | 6 | |
tgamma | 16 | |
trunc | correctly rounded | 0 |
half_cos | 8192 | |
half_divide | 8192 | 8192 |
half_exp | 8192 | 8192 |
half_exp2 | 8192 | 8192 |
half_exp10 | 8192 | 8192 |
half_log | 8192 | 8192 |
half_log2 | 8192 | 8192 |
half_log10 | 8192 | 8192 |
half_powr | 8192 | 8192 |
half_recip | 8192 | 8192 |
half_rsqrt | 8192 | 8192 |
half_sin | 8192 | |
half_sqrt | 8192 | |
half_tan | 8192 | |
fast_distance | 8192 + 2 * len(vector) | |
fast_length | 8192 + len(vector) | |
fast_normalize | 8192 + len(vector) | |
native_cos | impl.-defined | |
native_divide | impl.-defined | 8192 |
native_exp | impl.-defined | 8192 |
native_exp2 | impl.-defined | 8192 |
native_exp10 | impl.-defined | 8192 |
native_log | impl.-defined | 8192 |
native_log2 | impl.-defined | 8192 |
native_log10 | impl.-defined | 8192 |
native_powr | impl.-defined | 8192 |
native_recip | impl.-defined | 8192 |
native_rsqrt | impl.-defined | 8192 |
native_sin | impl.-defined | |
native_sqrt | impl.-defined | 8192 |
native_tan | impl.-defined | |
Sources: OpenCL 1.2 Full Profile, OpenCL 1.2 Embedded Profile
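To put the 8192 ULP bound of the half_ and fast_ functions into perspective, here is a short worked bound, assuming normal float32 values and the common convention that for $2^e \le |x| < 2^{e+1}$ the ULP is $\operatorname{ulp}(x) = 2^{e-23}$:

$$
\frac{\operatorname{ulp}(x)}{|x|} \in \left(2^{-24},\, 2^{-23}\right]
\quad\Longrightarrow\quad
\frac{8192 \cdot \operatorname{ulp}(x)}{|x|} = \frac{2^{13}\,\operatorname{ulp}(x)}{|x|} \in \left(2^{-11},\, 2^{-10}\right] \approx \left(4.9\cdot 10^{-4},\, 9.8\cdot 10^{-4}\right]
$$

So an 8192 ULP error corresponds to a relative error of roughly $10^{-3}$, about the precision of a half-precision value with its 11-bit significand.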
ULP errors are calculated using one of the following methods:
- Plotting the difference between the original function and the approximation with kmplot
- Calculating the result of each function with both the native C implementation and the custom approximation and checking the difference (on host only)
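The second, host-side method can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual VC4CL test code: the double-precision `math.sin` serves as the reference, each float32 result is compared against it in float32 ULPs, and `taylor_sin` is a hypothetical stand-in approximation (a degree-7 Taylor polynomial), not the implementation VC4CL uses.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest float32 (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ulp_f32(ref: float) -> float:
    """float32 ULP at the magnitude of ref: 2**(e - 23) for 2**e <= |ref| < 2**(e + 1)."""
    if ref == 0.0:
        return 2.0 ** -149                       # smallest float32 subnormal
    _, ex = math.frexp(ref)                      # |ref| = m * 2**ex with 0.5 <= |m| < 1
    return math.ldexp(1.0, max(ex - 24, -149))

def ulp_error(approx: float, ref: float) -> float:
    """Error of a float32 result, measured in float32 ULPs of the reference."""
    return abs(approx - ref) / ulp_f32(ref)

def max_ulp_error(approx_fn, ref_fn, inputs) -> float:
    """Largest ULP error of approx_fn against ref_fn over the given inputs."""
    return max(ulp_error(approx_fn(x), ref_fn(x)) for x in inputs)

def taylor_sin(x: float) -> float:
    """Hypothetical approximation under test: x - x^3/3! + x^5/5! - x^7/7!, rounded to float32."""
    x2 = x * x
    return to_f32(x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42))))

# float32 sample points in [-pi/4, pi/4]
inputs = [to_f32(i * (math.pi / 4) / 1000) for i in range(-1000, 1001)]

# Rounding the double reference to float32 alone costs at most 0.5 ULP.
print(max_ulp_error(lambda x: to_f32(math.sin(x)), math.sin, inputs))
print(max_ulp_error(taylor_sin, math.sin, inputs))
```

The resulting maximum can then be compared against the "Allowed error" column; for this stand-in polynomial the truncation error near ±pi/4 alone is several ULP, so it would not meet the 4 ULP bound for sin.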
Currently not supported