-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Libm] Use sqrt intrinsic function in xsqrt if available #169
Conversation
This is very good. Thanks for implementing it. Please add a comment in the cmake file explaining why you needed to add the -fno-math-errno and -fno-trapping-math compiler options. |
src/libm/sleefsimdsp.c
Outdated
@@ -1818,6 +1818,9 @@ EXPORT CONST vfloat xfmaf(vfloat x, vfloat y, vfloat z) { | |||
static INLINE CONST vint2 vcast_vi2_i_i(int i0, int i1) { return vcast_vi2_vm(vcast_vm_i_i(i0, i1)); } | |||
|
|||
EXPORT CONST vfloat xsqrtf_u05(vfloat d) { | |||
#ifdef ACCURATE_SQRT | |||
return vsqrt_vf_vf(d); | |||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would terminate the #ifdef here. the rest of the function is being removed by the compiler anyway if the macro is on.
src/libm/sleefsimdsp.c
Outdated
@@ -1846,6 +1849,7 @@ EXPORT CONST vfloat xsqrtf_u05(vfloat d) { | |||
x = vsel_vf_vo_vf_vf(veq_vo_vf_vf(d, vcast_vf_f(0)), d, x); | |||
|
|||
return x; | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#endif /* ACCURATE_SQRT */
src/libm/sleefsimddp.c
Outdated
@@ -2407,6 +2407,9 @@ EXPORT CONST vdouble xfma(vdouble x, vdouble y, vdouble z) { | |||
} | |||
|
|||
EXPORT CONST vdouble xsqrt_u05(vdouble d) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as for the single precision version
src/libm/sleefsimddp.c
Outdated
@@ -2407,6 +2407,9 @@ EXPORT CONST vdouble xfma(vdouble x, vdouble y, vdouble z) { | |||
} | |||
|
|||
EXPORT CONST vdouble xsqrt_u05(vdouble d) { | |||
#ifdef ACCURATE_SQRT | |||
return vsqrt_vd_vd(d); | |||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an helper file that does not define ACCURATE_SQRT? If not, we should remove the code in this branch, as it is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep the code since I want to claim that all the functions in the C99 standard are implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep the code since I want to claim that all the functions in the C99 standard are implemented
OK, but we need to make sure we are testing the polynomial approximation. I think you should define a new function then.
xsqrt: uses the instruction
xsqrt_05: uses the polynomial approximation.
Both functions need to be tested with the same tests.
The xsqrt one is the one that needs to be exported as _ZGV*_sqrt when ACCURATE_SQRT
is defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then how should those functions be implemented if sqrt instruction is not available?
You know, aarch32 does not have a vector sqrt instruction. Availability of vector sqrt operation is not assumed in helpervecext.h or helperpurec.h.
The style in sleef is basically to implement everything with software, but if some unique feature is available in an extension, use it. In this case, sqrt instruction is viewed as a feature of vector extension.
src/libm/sleefsimddp.c
Outdated
@@ -2407,6 +2407,10 @@ EXPORT CONST vdouble xfma(vdouble x, vdouble y, vdouble z) { | |||
} | |||
|
|||
EXPORT CONST vdouble xsqrt_u05(vdouble d) { | |||
#ifdef ACCURATE_SQRT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to make sure that the polynomial approximation gets tested.
How about this:
vdouble xsqrt(vdouble x) {
#ifdef ACCURATE_SQRT
return vsqrt_vd_vd(x);
#endif
// fall back to polynomial approximation if ACCURATE_SQRT is undefined
return xsqrt_05(x);
}
vdouble xsqrt_05(vdouble x) {
// polynomial implementation
}
// test both xsqrt and xsqrt_05 in tester
// export only xsqrt in sleef.h
This should suppress timeout at travis.
src/libm/sleefsimddp.c
Outdated
@@ -2751,6 +2761,33 @@ EXPORT CONST void *xgetPtr(int name) { | |||
#include ALIAS_NO_EXT_SUFFIX | |||
#endif | |||
|
|||
#ifdef ENABLE_GNUABI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please move this back to the original position? It is really hard to tell what you have changed here.
With this patch, sqrt functions utilizes an instruction for computing sqrt if available.
This patch also adds -fno-math-errno and -fno-trapping-math compiler options when compiled by gcc or clang.