StandardLibrary

Status of the support of the OpenCL Standard Library

Source: OpenCL 1.2 Reference Pages, section OpenCL Compiler -> Built-in Functions.

Async Copy and Prefetch Functions

Function	Implementation	Remarks
async_work_group_copy	read/write of DMA block	only executed by the first work-item
async_work_group_strided_copy	per-item copy	only executed by the first work-item
wait_group_events	barrier	other work-items wait for the first one to finish
prefetch	no-op

Common Functions

Function	Implementation	Remarks
clamp	fmin(fmax(a,min), max)
degrees	fmul(a, 180/PI)
max	fmax(a, b)
min	fmin(a, b)
mix	fadd(x, fmul(fsub(y, x), a))
radians	fmul(a, PI/180)
sign	a > 0 ? 1.0 : a <= 0 ? -1.0 : 0.0
smoothstep	see reference	check for improvement
step	a < edge ? 0.0 : 1.0

Explicit Memory Fence Functions

Function	Implementation	Remarks
mem_fence	no-op	memory-access will already be committed immediately
read_mem_fence	no-op	memory-access will already be committed immediately
write_mem_fence	no-op	memory-access will already be committed immediately

Geometric Functions

Function	Implementation	Remarks
cross	a x b
dot	a · b
distance	length(fsub(a,b))
length	sqrt(dot(a,a))
normalize	fdiv(a, sqrt(dot(a,a)))
fast_distance	fast_length(fsub(a,b))
fast_length	half_sqrt(dot(a,a))
fast_normalize	fmul(a, half_rsqrt(dot(a,a)))

Image Functions

Function	Implementation	Remarks
read_imagef	not supported	all versions, int and float coordinates, 1D, 2D and 3D, single and array, with and without sampler
read_imagei	not supported	all versions, int and float coordinates, 1D, 2D and 3D, single and array, with and without sampler
read_imageui	not supported	all versions, int and float coordinates, 1D, 2D and 3D, single and array, with and without sampler
write_imagef	not supported	all versions, int and float coordinates, 1D, 2D and 3D, single and array
write_imagei	not supported	all versions, int and float coordinates, 1D, 2D and 3D, single and array
write_imageui	not supported	all versions, int and float coordinates, 1D, 2D and 3D, single and array
get_image_width	not supported	all versions, int and float coordinates, 1D, 2D and 3D, single and array
get_image_height	not supported	all versions, int and float coordinates, 1D, 2D and 3D, single and array
get_image_depth	not supported
get_image_channel_data_type	not supported	all versions
get_image_channel_order	not supported	all versions
get_image_dim	not supported	all versions
get_image_array_size	not supported	all versions

Integer Functions

Function	Implementation	Remarks
abs	max(a, -a)
abs_diff	see pocl
add_sat	clamp(add(a, b), type_min, type_max)
clamp	min(max(a, min), max)
clz	clz(a)
hadd	add(shr(add(a,b),1), carry-bit)
mad24	add(mul24(a, b), c)
mad_hi	add(mul_hi(a, b), c)
mad_sat	clamp(add(mul(a, b), c), t_min, t_max)
max	max(a, b)	only signed
min	min(a, b)	only signed
mul24	mul24(a, b)
mul_hi	via mul24
popcount	Brian Kernighan's way	see source
rhadd	add(shr(add(a,b,1),1), carry-bit)
rotate	ror(a, sub(32, b))
sub_sat	clamp(sub(a, b), type_min, type_max)
upsample	or(shl(hi, width), lo)

Math Functions

Function	Implementation	Remarks
acos	fsub(M_PI/2.0f, asin(a))
acosh	via logarithm
acospi	acos(fmul(a, M_PI))
asin	Taylor-series
asinh	Taylor-series
asinpi	asin(fmul(a, M_PI))
atan	Taylor-series
atan2	atan(fdiv(a,b))
atanh	Taylor-series
atanpi	atan(fmul(a, M_PI))
atan2pi	atanpi(fdiv(a,b))
cbrt	pow(a, 1/3f)
ceil
copysign	and(a, or(b, NAN))
cos	Taylor-series	only for -PI to PI
cosh	Taylor-series
cospi	cos(fmul(a, M_PI))
half_cos	cos(a)
native_cos	cos(a)
erf	numeric approximation	see souce
erfc	fsub(1, erf(a))
exp	exp2(fmul(a, M_LOG2E))
half_exp	native_exp(a)	SFU seems to be exact enough
native_exp	native_exp2(fmul(a, M_LOG2E))
exp2	exp10(fmul(a, M_LN2F))
half_exp2	native_exp2(a)	SFU seems to be exact enough
native_exp2	SFU_EXP2(a)
exp10	exp2(fmul(a, M_LOG210))
half_exp10	native_exp10(a)	SFU seems to be exact enough
native_exp10	native_exp2(fmul(a, M_LOG210))
expm1	fsub(exp(a), 1)
fabs	fmaxabs(a, a)
fdim	a>b ? fsub(a,b) : 0
floor
fma	fadd(fmul(a, b), c)	does not heed rounding
fmax	fmax(a, b)
fmin	fmin(a, b)
fmod	see reference	fsub(x, fmul(y, trunc(fdiv(x,y))))
fract	see reference	fmin(a - floor(a), 0x1.fffffep - 1f ), b = floor(a)*
frexp	via ilogb and ldexp
hypot	sqrt(fadd(fmul(a, a), fmul(b, b)))	no special over-/underflow handling
ilogb	and(shr(a, 23), 0xFF)
ldexp	fmul(a, i2f(shl(1, and(b, 31))))	a 2 ^ b*
lgamma		see Numerical Recipes in C, chapter 6.1
lgamma_r	via lgamma
log	fmul(log2(a), 1.0/M_LOG2E)
half_log	native_log(a)	SFU seems to be exact enough
native_log	fmul(native_log2(a), 1/M_LOG2E)
log2	iterative approximation
half_log2	native_log2(a)	SFU seems to be exact enough
native_log2	SFU_LOG2(a)
log10	fmul(log2(a), 1/M_LOG210)
half_log10	native_log10(a)	SFU seems to be exact enough
native_log10	fmul(native_log2(a), 1/M_LOG210)
log1p	log(fadd(1,a))
logb	itof(ilogb(a))
mad	fadd(fmul(a, b), c)
maxmag	see reference
minmag	see reference
modf	sub(a, trunc(a), trunc(a)
nan	and(NaN, a)
nextafter	bitcast_float(bitcast_int(a) + 1))
pow	via powr
pown	fast power	very inefficient for vector-types!
powr	exp(fmul(b, log(a)))
half_powr	powr(a,b)
native_powr	native_exp2(fmul(b, native_log2(a)))
half_recip	native_recip(a)	SFU seems to be exact enough
native_recip	SFU_RECIP(a)
remainder	sub(a, mul(rint(div(a, b)), b))
remquo	via rint and bit-twiddling
rint
round
rootn	pow(a, fdiv(1, itof(b)))	Newton-Verfahren
rsqrt	fdiv(1, sqrt(a))
half_rsqrt	native_rsqrt(a)	SFU seems to be exact enough
native_rsqrt	SFU_RSQRT(a)
sin	Taylor-series	only for -PI to PI
half_sin	sin(a)
native_sin	sin(a)
sincos	(*b = cos(a), sin(a))
sinh	Taylor-series
sinpi	sin(fmul(a, M_PI))
sqrt	Taylor-series
half_sqrt	sqrt(a)
native_sqrt	SFU_RECIP(SFU_RSQRT(a))
tan	Taylor-series
half_tan	tan(a)
native_tan	tan(a)
tanh	Taylor-series
tanpi	tan(fmul(a, M_PI))
tgamma	exp(lgamma(a))
trunc

Misc. Vector Functions

Function	Implementation	Remarks
shuffle	shuffle2(a, a, b)
shuffle2	via intrinsics	uses vector-rotations
vec_step	compiler-intrinsic

Relational Functions

Function	Implementation	Remarks
isequal	xor(a, b) == 0	true = 1 for scalar and -1 for vectors
isnotequal	xor(a, b) != 0	true = 1 for scalar and -1 for vectors
isgreater	xor(fmin(a, b), a) != 0	true = 1 for scalar and -1 for vectors
isgreaterequal	xor(fmax(a, b), a) == 0	true = 1 for scalar and -1 for vectors
isless	xor(fmax(a, b), a) != 0	true = 1 for scalar and -1 for vectors
islessequal	xor(fmin(a, b), a) == 0	true = 1 for scalar and -1 for vectors
islessgreater	or((x < y), (x > y))	true = 1 for scalar and -1 for vectors
isfinite	xor(and(a, INF), INF) != 0	true = 1 for scalar and -1 for vectors
isinf	xor(and(a, INF), INF) == 0	true = 1 for scalar and -1 for vectors
isnan	xor(and(a, NAN), NAN) == 0	true = 1 for scalar and -1 for vectors
isnormal	check for denormal value
isordered	isequal(a, a) && isequal(b, b)
isunordered	isnan(a) \|\| isnan(b)
signbit	shr(a, 31) / asr(a, 31)	arithmetic shift for vector-type
any	see reference
all	see reference
bitselect	or(and(not(c), a), and(c, b))
select	msb(c) ? b : a

Synchronization Function

Function	Implementation	Remarks
barrier		via semaphores per work-item

Vector Data Load and Store Functions

Function	Implementation	Remarks
vloadn	load(add(b, mul(a, vector-size))	(a * vector-count + b)*
vload_half	not supported
vload_halfn	not supported
vloada_halfn	not supported
vstoren	store(add(c, mul(b, vector-size), a)	(b vector-count + c) = a
vstore_half	not supported
vstore_halfn	not supported
vstorea_halfn	not supported

Work-Item Functions

Function	Implementation	Remarks
get_global_id	register-read	passed via UNIFORM
get_global_size	register-read	passed via UNIFORM
get_global_offset	register-read	passed via UNIFORM
get_local_id	register-read	passed via UNIFORM
get_local_size	register-read	passed via UNIFORM
get_num_groups	register-read	passed via UNIFORM
get_group_id	register-read	passed via UNIFORM
get_work_dim	register-read	passed via UNIFORM

Atomic Functions

Function	Implementation	Remarks
atomic_add	old = p, p = add(old, val)	enclosed in global mutex-lock
atomic_sub	old = p, p = sub(old, val)	enclosed in global mutex-lock
atomic_xchg	old = p, p = val	enclosed in global mutex-lock
atomic_inc	old = p, p = add(old, 1)	enclosed in global mutex-lock
atomic_dec	old = p, p = sub(old, 1)	enclosed in global mutex-lock
atomic_cmpxchg	old = p, tmp = xor(old,cmp), p = val (if 0)	enclosed in global mutex-lock
atomic_min	old = p, p = min(old, val)	enclosed in global mutex-lock
atomic_max	old = p, p = max(old, val)	enclosed in global mutex-lock
atomic_and	old = p, p = and(old, val)	enclosed in global mutex-lock
atomic_or	old = p, p = or(old, val)	enclosed in global mutex-lock
atomic_xor	old = p, p = xor(old, val)	enclosed in global mutex-lock

printF

TODO OpenCL 1.2 pages 286+

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

StandardLibrary

Status of the support of the OpenCL Standard Library

Async Copy and Prefetch Functions

Common Functions

Explicit Memory Fence Functions

Geometric Functions

Image Functions

Integer Functions

Math Functions

Misc. Vector Functions

Relational Functions

Synchronization Function

Vector Data Load and Store Functions

Work-Item Functions

Atomic Functions

printF

Clone this wiki locally