Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Libm] Use sqrt intrinsic function in xsqrt if available #169

Merged
merged 5 commits into from
Feb 9, 2018

Conversation

shibatch
Copy link
Owner

@shibatch shibatch commented Feb 7, 2018

With this patch, sqrt functions utilizes an instruction for computing sqrt if available.
This patch also adds -fno-math-errno and -fno-trapping-math compiler options when compiled by gcc or clang.

@fpetrogalli
Copy link

This is very good. Thanks for implementing it. Please add a comment in the cmake file explaining why you needed to add the -fno-math-errno and -fno-trapping-math compiler options.

@@ -1818,6 +1818,9 @@ EXPORT CONST vfloat xfmaf(vfloat x, vfloat y, vfloat z) {
static INLINE CONST vint2 vcast_vi2_i_i(int i0, int i1) { return vcast_vi2_vm(vcast_vm_i_i(i0, i1)); }

EXPORT CONST vfloat xsqrtf_u05(vfloat d) {
#ifdef ACCURATE_SQRT
return vsqrt_vf_vf(d);
#else

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would terminate the #ifdef here. the rest of the function is being removed by the compiler anyway if the macro is on.

@@ -1846,6 +1849,7 @@ EXPORT CONST vfloat xsqrtf_u05(vfloat d) {
x = vsel_vf_vo_vf_vf(veq_vo_vf_vf(d, vcast_vf_f(0)), d, x);

return x;
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#endif /* ACCURATE_SQRT */

@@ -2407,6 +2407,9 @@ EXPORT CONST vdouble xfma(vdouble x, vdouble y, vdouble z) {
}

EXPORT CONST vdouble xsqrt_u05(vdouble d) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as for the single precision version

@@ -2407,6 +2407,9 @@ EXPORT CONST vdouble xfma(vdouble x, vdouble y, vdouble z) {
}

EXPORT CONST vdouble xsqrt_u05(vdouble d) {
#ifdef ACCURATE_SQRT
return vsqrt_vd_vd(d);
#else

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an helper file that does not define ACCURATE_SQRT? If not, we should remove the code in this branch, as it is not used.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep the code since I want to claim that all the functions in the C99 standard are implemented.

Copy link

@fpetrogalli fpetrogalli Feb 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep the code since I want to claim that all the functions in the C99 standard are implemented

OK, but we need to make sure we are testing the polynomial approximation. I think you should define a new function then.

xsqrt: uses the instruction
xsqrt_05: uses the polynomial approximation.

Both functions need to be tested with the same tests.

The xsqrt one is the one that needs to be exported as _ZGV*_sqrt when ACCURATE_SQRT is defined.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then how should those functions be implemented if sqrt instruction is not available?
You know, aarch32 does not have a vector sqrt instruction. Availability of vector sqrt operation is not assumed in helpervecext.h or helperpurec.h.

The style in sleef is basically to implement everything with software, but if some unique feature is available in an extension, use it. In this case, sqrt instruction is viewed as a feature of vector extension.

@@ -2407,6 +2407,10 @@ EXPORT CONST vdouble xfma(vdouble x, vdouble y, vdouble z) {
}

EXPORT CONST vdouble xsqrt_u05(vdouble d) {
#ifdef ACCURATE_SQRT

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to make sure that the polynomial approximation gets tested.

How about this:

vdouble xsqrt(vdouble x) {
#ifdef ACCURATE_SQRT
return vsqrt_vd_vd(x);
#endif
// fall back to polynomial approximation if ACCURATE_SQRT is undefined
return xsqrt_05(x);
}

vdouble xsqrt_05(vdouble x) {
  // polynomial implementation
}

// test  both xsqrt and xsqrt_05 in tester
// export only xsqrt in sleef.h

This should suppress timeout at travis.
@@ -2751,6 +2761,33 @@ EXPORT CONST void *xgetPtr(int name) {
#include ALIAS_NO_EXT_SUFFIX
#endif

#ifdef ENABLE_GNUABI

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move this back to the original position? It is really hard to tell what you have changed here.

@shibatch shibatch merged commit 33ffaff into master Feb 9, 2018
@shibatch shibatch deleted the Use_sqrt_intrinsics_for_xsqrt_if_available branch February 9, 2018 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants