Why gcem is much slower than cmath? #45

Open
Yikai-Liao opened this issue Jun 22, 2024 · 3 comments

Yikai-Liao commented Jun 22, 2024

Some functions in my library need to be callable at both compile time and runtime. Because constexpr support in cmath varies across platforms, I chose to use gcem.
However, I found that many of gcem's functions are an order of magnitude slower than their cmath counterparts under O3 optimization. I know I could write two versions, one called at compile time and one at runtime, but I'm wondering why gcem is so much slower at runtime.

[Image: 1719030925962.png]

I've tested this on x86 Linux, Windows, and macOS, compiling with g++, MSVC, and Apple Clang respectively, and all give roughly the same results.
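For reference, the two-version approach mentioned above can be folded into a single function. This is only a minimal sketch: it assumes C++20 for `std::is_constant_evaluated` (with a fallback for older standards), and `exp_constexpr` and `fast_exp` are hypothetical names made up for illustration, not gcem functions.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical stand-in for a constexpr-friendly routine: a truncated
// Taylor series for exp(x), adequate for small |x| only.
constexpr double exp_constexpr(double x) noexcept
{
    double term = 1.0;
    double sum = 1.0;
    for (int n = 1; n < 30; ++n) {
        term *= x / n;   // term is now x^n / n!
        sum += term;
    }
    return sum;
}

// Hypothetical wrapper: constexpr path during constant evaluation,
// <cmath> at runtime.
constexpr double fast_exp(double x) noexcept
{
#if defined(__cpp_lib_is_constant_evaluated)
    if (std::is_constant_evaluated()) {
        return exp_constexpr(x);
    }
    return std::exp(x);
#else
    // Before C++20 there is no portable way to detect constant
    // evaluation, so always take the constexpr path.
    return exp_constexpr(x);
#endif
}

// Forces the compile-time path.
static_assert(fast_exp(0.0) == 1.0, "constant evaluation works");
```

A call such as `fast_exp(v)` with a runtime value `v` would then go through `std::exp` when compiled as C++20, while the `static_assert` still uses the constexpr implementation.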

@Yikai-Liao

[Image: Screenshot_20240622142443]
[Image: Screenshot_20240622142549]

I believe there is a lot of room to optimise gcem's runtime performance. I was able to reduce the running time by about 40% simply by replacing the recursion in the tan computation with a loop.

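// Evaluates the continued-fraction denominator
//   1 - xx/(3 - xx/(5 - ... /(2*max_depth - 1)))
// from the innermost term outwards, where xx = x*x.
template<int max_depth, typename T>
constexpr
T
tan_cf_loop(const T xx)
noexcept
{
    T ans = T(2*max_depth - 1); // innermost term
    for(int depth = max_depth - 1; depth > 0; --depth) {
        ans = T(2*depth - 1) - xx / ans; // wrap one more level around it
    }
    return ans;
}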

template<typename T>
constexpr
T
tan_cf_main(const T x)
noexcept
{
    return( (x > T(1.55) && x < T(1.60)) ? \
                tan_series_exp(x) : // deals with a singularity at tan(pi/2)
            //
            x > T(1.4) ? \
                x/tan_cf_loop<45>(x*x) :
            x > T(1)   ? \
                x/tan_cf_loop<35>(x*x) :
            // else
                x/tan_cf_loop<25>(x*x) );
}
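To check that the loop computes the same value as a recursive formulation, here is a small self-contained sketch. `tan_cf_recur`, `tan_recursive` and `tan_looped` are hypothetical helpers written just for this comparison, and the sketch leaves out the special case near pi/2. Note that a loop inside a constexpr function needs C++14 or later, which may matter if C++11 compatibility is required.

```cpp
#include <cassert>
#include <cmath>

// Recursive form in C++11 style (single return statement).
// Hypothetical helper written for this comparison.
template<typename T>
constexpr T tan_cf_recur(const T xx, const int depth, const int max_depth) noexcept
{
    return depth < max_depth
        ? T(2*depth - 1) - xx / tan_cf_recur(xx, depth + 1, max_depth)
        : T(2*depth - 1);
}

// Loop form, as posted above (requires C++14 relaxed constexpr).
template<int max_depth, typename T>
constexpr T tan_cf_loop(const T xx) noexcept
{
    T ans = T(2*max_depth - 1);
    for (int depth = max_depth - 1; depth > 0; --depth) {
        ans = T(2*depth - 1) - xx / ans;
    }
    return ans;
}

// tan(x) = x / (1 - x^2/(3 - x^2/(5 - ...))), truncated at 25 levels.
constexpr double tan_recursive(double x) noexcept
{
    return x / tan_cf_recur(x * x, 1, 25);
}

constexpr double tan_looped(double x) noexcept
{
    return x / tan_cf_loop<25>(x * x);
}

// Both forms remain usable at compile time.
static_assert(tan_cf_loop<25>(0.0) == 1.0, "loop form is constexpr");
static_assert(tan_cf_recur(0.0, 1, 25) == 1.0, "recursive form is constexpr");
```

Both forms perform the same divisions and subtractions in the same order, so any speed difference comes from how the compiler handles the recursion, not from the arithmetic.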

Yikai-Liao commented Jun 22, 2024

Also, I don't really understand why gcem computes sine and cosine via tan(x/2), which takes up to 45 iterations in the worst case. Approximating sine and cosine with Chebyshev polynomials should be a better choice.

See here: https://stackoverflow.com/a/394512/24175656
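To illustrate the idea, here is a rough sketch of a Chebyshev approximation of sine on [-pi/2, pi/2]. It is not the code from the linked answer or from any pull request. The names `kTerms`, `chebyshev_sin_coeffs` and `sin_chebyshev` are hypothetical, and the coefficients are computed numerically at startup with `<cmath>`, whereas a constexpr implementation would presumably hard-code them.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kTerms = 12;                 // hypothetical series length
constexpr double kPi = 3.14159265358979323846;

// Chebyshev coefficients of f(t) = sin(pi/2 * t) on [-1, 1], computed
// from the function values at the kTerms Chebyshev nodes.
std::array<double, kTerms> chebyshev_sin_coeffs()
{
    std::array<double, kTerms> c{};
    for (std::size_t k = 0; k < kTerms; ++k) {
        double sum = 0.0;
        for (std::size_t j = 0; j < kTerms; ++j) {
            const double theta = kPi * (j + 0.5) / kTerms;
            sum += std::sin(kPi / 2 * std::cos(theta)) * std::cos(k * theta);
        }
        c[k] = 2.0 * sum / kTerms;
    }
    return c;
}

// Evaluate the series for x in [-pi/2, pi/2] with the Clenshaw recurrence.
double sin_chebyshev(double x)
{
    static const std::array<double, kTerms> c = chebyshev_sin_coeffs();
    const double t = x / (kPi / 2);                // map onto [-1, 1]
    double b1 = 0.0;
    double b2 = 0.0;
    for (std::size_t k = kTerms - 1; k >= 1; --k) {
        const double b0 = 2.0 * t * b1 - b2 + c[k];
        b2 = b1;
        b1 = b0;
    }
    return t * b1 - b2 + 0.5 * c[0];
}
```

Evaluation costs a fixed number of multiply-adds (one per coefficient) and no divisions. Arguments outside [-pi/2, pi/2] would need range reduction first, which this sketch does not show.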

@Yikai-Liao

I have created a pull request, #46, that optimises the trigonometric calculations.
I'll try to optimize other functions as well.
