
NumericalCompliance

doe300 edited this page Mar 16, 2020 · 5 revisions

Rounding modes

The floating-point operations of the VideoCore IV QPU processors round using the IEEE 754 round-to-zero rounding mode. This mode corresponds to the OpenCL CL_FP_ROUND_TO_ZERO mode, and the OpenCL 1.2 embedded profile allows an implementation to support it as the only rounding mode. [Source]

NOTE: The CPU uses the IEEE 754 round-to-nearest-even rounding mode, so the results of the same floating-point operation on the CPU and the GPU might not match! More precisely, the GPU result is either identical to the corresponding CPU result or 1 ULP closer to zero.

As an example, consider the calculation 358.6662292480469 - 3.0502657890319824:

| Calculation | Rounding mode | Result | Bit-cast integer |
|---|---|---|---|
| "Exact" | double precision | 355.6159634590149 | |
| CPU | round-to-nearest-even | 355.615966796875 | 0x43b1ced8 |
| GPU | round-to-zero | 355.6159362792969 | 0x43b1ced7 |

Neither of these results matches the "exact" one, but each is the single-precision value selected by its rounding mode: the nearest representable value for the CPU, and the nearest representable value in the direction of zero for the GPU. The bit-cast integer representations differ by exactly 1 in the last place, which corresponds to 1 ULP in floating point.
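The values above can be reproduced on the host. The following Python sketch is not part of VC4CL: the helper names are made up for illustration, and it assumes that `struct.pack("<f", ...)` converts a double to single precision with round-to-nearest-even (the default on common platforms). Round-to-zero is emulated by stepping the bit pattern one place back toward zero whenever rounding to nearest increased the magnitude.

```python
import struct


def float_to_bits(value: float) -> int:
    """Bit-cast a value (rounded to single precision) to its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits: int) -> float:
    """Bit-cast a 32-bit pattern back to a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_to_nearest_even(exact: float) -> float:
    # Assumption: struct's double-to-single conversion rounds to nearest-even.
    return bits_to_float(float_to_bits(exact))


def round_to_zero(exact: float) -> float:
    """Emulate round-to-zero for finite values within single-precision range."""
    nearest = round_to_nearest_even(exact)
    if abs(nearest) > abs(exact):
        # Rounding to nearest moved away from zero: go back by one ULP.
        # Decrementing the bit pattern shrinks the magnitude for either sign.
        return bits_to_float(float_to_bits(nearest) - 1)
    return nearest


# Both operands are single-precision values, so the double result is exact.
exact = 358.6662292480469 - 3.0502657890319824
cpu = round_to_nearest_even(exact)
gpu = round_to_zero(exact)

print("CPU", cpu, hex(float_to_bits(cpu)))
print("GPU", gpu, hex(float_to_bits(gpu)))
print("difference in bit patterns:", float_to_bits(cpu) - float_to_bits(gpu))
```

When the exact result is already representable in single precision, both helpers return the same value, which matches the "either the same or 1 ULP closer to zero" statement.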

Inf, NaN, Denormals

  • Inf is supported (at least by SFU)
  • NaN is not supported

Relative Error

If x is a real number that lies between two finite consecutive floating-point numbers a and b, without being equal to one of them, then ulp(x) = |b - a|, otherwise ulp(x) is the distance between the two non-equal finite floating-point numbers nearest x. Moreover, ulp(NaN) is NaN.

[Source]

The relative ULP is 2^-23 ≈ 1.19e-07 [1] for single-precision floating-point values. For example, nextafter(1, 2) returns 1 + 2^-23 ≈ 1.000000119 [2].
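As a small host-side illustration of these numbers, the following Python sketch derives the single-precision ULP from bit patterns. The function names are hypothetical, and the helpers only handle positive, finite, normal values.

```python
import struct


def float_to_bits(value: float) -> int:
    """Bit-cast a single-precision value to its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits: int) -> float:
    """Bit-cast a 32-bit pattern back to a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def next_up32(x: float) -> float:
    """Next larger single-precision value (x must be positive and finite)."""
    return bits_to_float(float_to_bits(x) + 1)


def ulp32(x: float) -> float:
    """Distance from x to the next larger single-precision value."""
    return next_up32(x) - x


# The ULP at 1.0 is 2^-23, so the next value after 1.0 is 1 + 2^-23.
print(ulp32(1.0) == 2.0 ** -23)
print(next_up32(1.0) == 1.0 + 2.0 ** -23)

# The absolute ULP grows with the magnitude, the relative ULP at a power of
# two stays at 2^-23.
print(ulp32(256.0) == 2.0 ** -15)
print(ulp32(256.0) / 256.0 == 2.0 ** -23)
```

The ULP of 2^-15 in the range [256, 512) is the step between the CPU and GPU results in the rounding-mode example.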

Built-in Functions

| Function | Allowed (in ULP) | Maximal error |
|---|---|---|
| x + y | correctly rounded (round-to-zero) | 0 |
| x - y | correctly rounded (round-to-zero) | 0 |
| x * y | correctly rounded (round-to-zero) | 0 |
| 1.0 / x | 3 | |
| x / y | 3 | |
| acos | 4 | |
| acospi | 5 | |
| asin | 4 | |
| asinpi | 5 | |
| atan | 5 | 1 |
| atan2 | 6 | |
| atanpi | 5 | |
| atan2pi | 6 | |
| acosh | 4 | |
| asinh | 4 | |
| atanh | 5 | |
| cbrt | 4 | |
| ceil | correctly rounded | 0 |
| clamp | 0 | 0 |
| copysign | 0 | 0 |
| cos | 4 | 2 |
| cosh | 4 | 2 |
| cospi | 4 | |
| cross | 3 | 0 |
| degrees | 2 | 2 |
| distance | 5.5 + 2 * len(vector) | 4 |
| dot | 2 * len(vector) - 1 | 0 |
| erfc | 16 | |
| erf | 16 | 1 |
| exp | 4 | 1 |
| exp2 | 4 | |
| exp10 | 4 | |
| expm1 | 4 | |
| fabs | 0 | 0 |
| fdim | correctly rounded | |
| floor | correctly rounded | 0 |
| fma | correctly rounded | 0 |
| fmax | 0 | 0 |
| fmin | 0 | 0 |
| fmod | 0 | |
| fract | correctly rounded | |
| frexp | 0 | |
| hypot | 4 | |
| ilogb | 0 | |
| length | 5.5 + len(vector) | 4 |
| ldexp | correctly rounded | |
| log | 4 | 4 |
| log2 | 4 | |
| log10 | 4 | |
| log1p | 4 | |
| logb | 0 | |
| mad | infinite | |
| max | 0 | 0 |
| maxmag | 0 | |
| min | 0 | 0 |
| minmag | 0 | |
| mix | absolute error of 1e-3 | 0 |
| modf | 0 | |
| nan | 0 | |
| nextafter | 0 | 0 |
| normalize | 4.5 + len(vector) | 7 |
| pow | 16 | |
| pown | 16 | |
| powr | 16 | |
| radians | 2 | 2 |
| remainder | 0 | |
| remquo | 0 | |
| rint | correctly rounded | 0 |
| rootn | 16 | |
| round | correctly rounded | 0 |
| rsqrt | 4 | 1 |
| sign | 0 | 0 |
| sin | 4 | 1 |
| sincos | 4 (both) | 2 |
| sinh | 4 | 2 |
| sinpi | 4 | |
| smoothstep | absolute error of 1e-5 | |
| sqrt | 4 | 1 |
| step | 0 | 0 |
| tan | 5 | |
| tanh | 5 | |
| tanpi | 6 | |
| tgamma | 16 | |
| trunc | correctly rounded | 0 |
| half_cos | 8192 | |
| half_divide | 8192 | 8192 |
| half_exp | 8192 | 8192 |
| half_exp2 | 8192 | 8192 |
| half_exp10 | 8192 | 8192 |
| half_log | 8192 | 8192 |
| half_log2 | 8192 | 8192 |
| half_log10 | 8192 | 8192 |
| half_powr | 8192 | 8192 |
| half_recip | 8192 | 8192 |
| half_rsqrt | 8192 | 8192 |
| half_sin | 8192 | |
| half_sqrt | 8192 | |
| half_tan | 8192 | |
| fast_distance | 8192 + 2 * len(vector) | |
| fast_length | 8192 + len(vector) | |
| fast_normalize | 8192 + len(vector) | |
| native_cos | impl.-defined | |
| native_divide | impl.-defined | 8192 |
| native_exp | impl.-defined | 8192 |
| native_exp2 | impl.-defined | 8192 |
| native_exp10 | impl.-defined | 8192 |
| native_log | impl.-defined | 8192 |
| native_log2 | impl.-defined | 8192 |
| native_log10 | impl.-defined | 8192 |
| native_powr | impl.-defined | 8192 |
| native_recip | impl.-defined | 8192 |
| native_rsqrt | impl.-defined | 8192 |
| native_sin | impl.-defined | |
| native_sqrt | impl.-defined | 8192 |
| native_tan | impl.-defined | |

Sources: OpenCL 1.2 FULL PROFILE, OpenCL 1.2 EMBEDDED PROFILE

The ULP errors are determined with one of the following methods:

  • Plotting the difference between the original function and the approximation with kmplot
  • Calculating the results with the native C implementation and with the custom approximation, then checking the difference (on host only)
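The second method can be sketched on the host roughly as follows. This is a minimal Python illustration, not the code used by the project: `ulp32`, `ulp_error`, `max_ulp_error` and `taylor_sin` are hypothetical names, `taylor_sin` is only a stand-in for a custom approximation, and the double-precision `math.sin` serves as the reference. The ULP is taken from the exponent of the reference value, following the definition quoted in the Relative Error section.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round a double to single precision (nearest-even) and widen it again."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def ulp32(x: float) -> float:
    """Single-precision ULP at the finite reference value x."""
    if x == 0.0:
        return math.ldexp(1.0, -149)  # smallest subnormal single
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, max(exponent - 24, -149))


def ulp_error(result: float, reference: float) -> float:
    """Error of result measured in ULPs of the reference value."""
    return abs(result - reference) / ulp32(reference)


def max_ulp_error(approx, reference, inputs) -> float:
    """Largest ULP error of approx against reference over all inputs."""
    return max(ulp_error(approx(x), reference(x)) for x in inputs)


def taylor_sin(x: float) -> float:
    # Hypothetical approximation: truncated Taylor series, rounded to single.
    return to_float32(x - x**3 / 6 + x**5 / 120 - x**7 / 5040)


inputs = [i / 64 for i in range(1, 65)]  # sample points in (0, 1]
print("max ULP error of taylor_sin:", max_ulp_error(taylor_sin, math.sin, inputs))

# Rounding-mode example from above: both results are within 1 ULP of "exact".
exact = 358.6662292480469 - 3.0502657890319824
print("CPU:", ulp_error(355.615966796875, exact))
print("GPU:", ulp_error(355.615936279296875, exact))
```

Sampling a fixed grid only gives a lower bound on the true maximal error, so a denser or exhaustive sweep over the input range would be needed before quoting a value in the table.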

Edge case behavior

Currently not supported
