NumericalCompliance

Rounding modes

The floating-point operations of the VideoCore IV QPU processors round using the IEEE 754 round-to-zero rounding mode. This rounding mode corresponds to the OpenCL CL_FP_ROUND_TO_ZERO mode and is allowed as the only supported rounding mode for OpenCL 1.2 embedded profiles. [Source]

NOTE: The CPU uses the IEEE 754 round-to-nearest-even rounding mode, which means that the results of the same floating-point operation on the CPU and on the GPU might not match! To be exact, the result of the GPU operation is either the same as, or 1 ULP closer to zero than, the result of the corresponding CPU operation.

As an example, consider the calculation 358.6662292480469 - 3.0502657890319824:

Calculation	Rounding mode	Result	Bit-cast integer
"Exact"	double precision	355.6159634590149
CPU	round-to-nearest-even	355.615966796875	0x43b1ced8
GPU	round-to-zero	355.6159362792969	0x43b1ced7

Neither of those results matches the "exact" one, but each is the closest value that can be represented as a single-precision floating-point value under the corresponding rounding mode. In the bit-cast integer representation, the difference of exactly 1 in the last digit (corresponding to 1 ULP in floating-point) can be seen clearly.
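The example above can be reproduced on the host with a small sketch. It assumes that the double-precision difference is exact for these two operands and emulates round-to-zero by stepping the round-to-nearest result one bit pattern toward zero whenever rounding increased the magnitude. The helper names are hypothetical and not part of any VideoCore IV or OpenCL API.

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x after conversion to single precision (round-to-nearest-even)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(bits: int) -> float:
    """Single-precision value for a 32-bit pattern, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def round_to_nearest(x: float) -> float:
    """Round a finite, in-range double to single precision the way the CPU does."""
    return bits_to_f32(f32_bits(x))

def round_toward_zero(x: float) -> float:
    """Emulated round-to-zero: never return a larger magnitude than x."""
    nearest = round_to_nearest(x)
    if abs(nearest) > abs(x):
        # Rounding moved away from zero. Decrementing the sign-magnitude
        # bit pattern steps exactly one ULP back toward zero.
        return bits_to_f32(f32_bits(nearest) - 1)
    return nearest

exact = 358.6662292480469 - 3.0502657890319824
cpu = round_to_nearest(exact)
gpu = round_toward_zero(exact)
print(hex(f32_bits(cpu)))             # 0x43b1ced8
print(hex(f32_bits(gpu)))             # 0x43b1ced7
print(f32_bits(cpu) - f32_bits(gpu))  # 1
```

The sketch only handles finite values inside the single-precision range, since `struct.pack` raises `OverflowError` for larger magnitudes.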

Inf, NaN, Denormals

Inf is supported (at least by SFU)
NaN is not supported
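For reference, the IEEE 754 single-precision encodings of these special values can be told apart from the bit pattern alone. The following is a minimal sketch with a hypothetical helper name. It only describes the encoding and makes no claim about how the QPU treats each class.

```python
def classify_f32_bits(bits: int) -> str:
    """Classify a 32-bit IEEE 754 single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    if exponent == 0xFF:
        # All exponent bits set: Inf if the mantissa is zero, NaN otherwise
        return "inf" if mantissa == 0 else "nan"
    if exponent == 0:
        # All exponent bits clear: zero or a denormal value
        return "zero" if mantissa == 0 else "denormal"
    return "normal"

for pattern in (0x7F800000, 0xFF800000, 0x7FC00000, 0x00000001, 0x3F800000):
    print(hex(pattern), classify_f32_bits(pattern))
```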

Relative Error

If x is a real number that lies between two finite consecutive floating-point numbers a and b, without being equal to one of them, then ulp(x) = |b - a|, otherwise ulp(x) is the distance between the two non-equal finite floating-point numbers nearest x. Moreover, ulp(NaN) is NaN.

[Source]

The relative ULP is 2^-23 ≈ 1.19e-07 [1] for single-precision floating-point values. For example, nextafter(1, 2) will return 1 + 2^-23 ≈ 1.000000119 [2].
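The two numbers above can be checked with a short sketch. It emulates a single-precision nextafter toward larger values by incrementing the bit pattern, which is valid only for non-negative finite inputs. The helper names are hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x after conversion to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(bits: int) -> float:
    """Single-precision value for a 32-bit pattern, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def next_up_f32(x: float) -> float:
    """Next single-precision value above a non-negative finite x."""
    return bits_to_f32(f32_bits(x) + 1)

relative_ulp = 2.0 ** -23
successor_of_one = next_up_f32(1.0)

print(f"{relative_ulp:.3e}")                   # 1.192e-07
print(successor_of_one == 1.0 + relative_ulp)  # True
```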

Built-in Functions

Function	Allowed error (in ULP)	Maximal error (in ULP)
x + y	correctly rounded (round-to-zero)	0
x - y	correctly rounded (round-to-zero)	0
x * y	correctly rounded (round-to-zero)	0
1.0 / x	3
x / y	3
acos	4
acospi	5
asin	4
asinpi	5
atan	5	1
atan2	6
atanpi	5
atan2pi	6
acosh	4
asinh	4
atanh	5
cbrt	4
ceil	correctly rounded	0
clamp	0	0
copysign	0	0
cos	4	2
cosh	4	2
cospi	4
cross	3	0
degrees	2	2
distance	5.5 + 2 * len(vector)	4
dot	2 * len(vector) - 1	0
erfc	16
erf	16	1
exp	4	1
exp2	4
exp10	4
expm1	4
fabs	0	0
fdim	correctly rounded
floor	correctly rounded	0
fma	correctly rounded	0
fmax	0	0
fmin	0	0
fmod	0
fract	correctly rounded
frexp	0
hypot	4
ilogb	0
length	5.5 + len(vector)	4
ldexp	correctly rounded
log	4	4
log2	4
log10	4
log1p	4
logb	0
mad	infinite
max	0	0
maxmag	0
min	0	0
minmag	0
mix	absolute error of 1e-3	0
modf	0
nan	0
nextafter	0	0
normalize	4.5 + len(vector)	7
pow	16
pown	16
powr	16
radians	2	2
remainder	0
remquo	0
rint	correctly rounded	0
rootn	16
round	correctly rounded	0
rsqrt	4	1
sign	0	0
sin	4	1
sincos	4 (both)	2
sinh	4	2
sinpi	4
smoothstep	absolute error of 1e-5
sqrt	4	1
step	0	0
tan	5
tanh	5
tanpi	6
tgamma	16
trunc	correctly rounded	0
half_cos	8192
half_divide	8192	8192
half_exp	8192	8192
half_exp2	8192	8192
half_exp10	8192	8192
half_log	8192	8192
half_log2	8192	8192
half_log10	8192	8192
half_powr	8192	8192
half_recip	8192	8192
half_rsqrt	8192	8192
half_sin	8192
half_sqrt	8192
half_tan	8192
fast_distance	8192 + 2 * len(vector)
fast_length	8192 + len(vector)
fast_normalize	8192 + len(vector)
native_cos	impl.-defined
native_divide	impl.-defined	8192
native_exp	impl.-defined	8192
native_exp2	impl.-defined	8192
native_exp10	impl.-defined	8192
native_log	impl.-defined	8192
native_log2	impl.-defined	8192
native_log10	impl.-defined	8192
native_powr	impl.-defined	8192
native_recip	impl.-defined	8192
native_rsqrt	impl.-defined	8192
native_sin	impl.-defined
native_sqrt	impl.-defined	8192
native_tan	impl.-defined

Sources: OpenCL 1.2 FULL PROFILE, OpenCL 1.2 EMBEDDED PROFILE
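Several bounds in the table depend on the vector length. The sketch below evaluates those formulas exactly as they are listed in the table, assuming that len(vector) means the number of vector components (for example, 4 for a float4). The dictionary and its name are hypothetical.

```python
# Allowed error in ULP as a function of the number of vector components n,
# copied from the table entries for dot, distance, length and normalize.
ALLOWED_ULP = {
    "dot": lambda n: 2 * n - 1,
    "distance": lambda n: 5.5 + 2 * n,
    "length": lambda n: 5.5 + n,
    "normalize": lambda n: 4.5 + n,
}

for components in (2, 3, 4):
    bounds = {name: bound(components) for name, bound in ALLOWED_ULP.items()}
    print(components, bounds)
```

For a float4 this gives 7 ULP for dot, 13.5 for distance, 9.5 for length and 8.5 for normalize.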

The ULP errors are calculated via one of the following methods:

Plotting the difference between the original function and the approximation with kmplot
Calculating the result for the functions with the native C implementation and the custom approximation and checking the difference (on host only)
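The second method can be sketched on the host as follows. This is a simplified illustration, not the project's actual test code: it counts the number of single-precision steps between the approximation and the reference after rounding the reference to single precision, which is coarser than a fractional ULP error. The function `approx_sin` is a hypothetical approximation under test, and NaN inputs are not handled.

```python
import math
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x after conversion to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_f32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ordered_key(bits: int) -> int:
    """Map sign-magnitude float bits to an integer that is ordered like the values."""
    return bits if bits < 0x80000000 else -(bits & 0x7FFFFFFF)

def ulp_distance(result: float, reference: float) -> int:
    """Number of single-precision steps between result and reference."""
    return abs(ordered_key(f32_bits(result)) - ordered_key(f32_bits(reference)))

def approx_sin(x: float) -> float:
    """Hypothetical approximation under test: truncated Taylor series."""
    return to_f32(x - x**3 / 6.0 + x**5 / 120.0)

samples = [i / 64.0 for i in range(33)]  # inputs from 0.0 to 0.5
worst = max(ulp_distance(approx_sin(x), math.sin(x)) for x in samples)
print("maximal error in ULP:", worst)
```

The maximum over the sampled inputs is what would be compared against the allowed error for the function.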

Edge case behavior

Currently not supported
