Floating point constants should be written as 0.0f (float) rather than 0.0 (double), so float ops are not promoted to double
Slow math ops like sin, cos and pow should be offloaded to lookup tables where possible
  a) one version with init code, to reduce binary size at the cost of startup time
  b) another version with static const lookup tables, for faster startup at the cost of size
  c) some areas just need the math simplified for easier calculation
Multiplying by a precalculated 1/float is faster than dividing by a float
Some things need ops rearranged so constants can be merged and separated from variables
Unroll some loops into return/initialization (fewer memcpy lookalikes)
Functions should take pointers instead of using globals and some_func(void)
static inline float Requantize_Pow_43(unsigned x) returns x^(4/3).
This could be simplified to 16*(x/8)^(4/3)
or 256*(x/64)^(4/3),
which means the lookup table could be reduced in size.
However, pow(x, 4.0f/3.0f) can be rewritten as cbrt((x*x)*(x*x)), which cuts the time roughly in half.
Beyond that, the two steps can be combined using a variation of the fast inverse square root trick:
    /* Description: returns x^(4/3)
     * same as cbrt((x*x)*(x*x)), but optimized for the limited cases we handle (integers 0-8209) */
    static inline float pow43opt2(float x) {
        if (x<2) return x;
        else x*=x, x*=x; //pow(x,4)
        float a3, x2=x+x;
        union {float f; unsigned i;} u = {x};
        u.i = u.i/3 + 0x2a517d3c; //~cbrt(x)
        int accuracy_iterations=2; //reduce for speed, increase for precision
        while (accuracy_iterations--){ //Lancaster iterations
            a3=u.f*u.f*u.f;
            u.f *= (a3+x2) / (a3+a3+x);
        }
        return u.f;
    }
Lots of room for improvement.