Do we need this level of precision in `Q_rsqrt`? #1457

illwieckz · 2024-12-09T17:14:45Z

Here is the Q_rsqrt() code:

	inline float Q_rsqrt( float number )
	{
		float x = 0.5f * number;
		float y;

		// compute approximate inverse square root
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
		// SSE rsqrt relative error bound: 3.7 * 10^-4
		_mm_store_ss( &y, _mm_rsqrt_ss( _mm_load_ss( &number ) ) );
#else
		y = Util::bit_cast<float>( 0x5f3759df - ( Util::bit_cast<uint32_t>( number ) >> 1 ) );
		y *= ( 1.5f - ( x * y * y ) ); // initial iteration
		// relative error bound after the initial iteration: 1.8 * 10^-3
#endif
		y *= ( 1.5f - ( x * y * y ) ); // second iteration for higher precision
		return y;
	}

If I comment out the second iteration, like this:

	inline float Q_rsqrt( float number )
	{
		float x = 0.5f * number;
		float y;

		// compute approximate inverse square root
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
		// SSE rsqrt relative error bound: 3.7 * 10^-4
		_mm_store_ss( &y, _mm_rsqrt_ss( _mm_load_ss( &number ) ) );
#else
		y = Util::bit_cast<float>( 0x5f3759df - ( Util::bit_cast<uint32_t>( number ) >> 1 ) );
		y *= ( 1.5f - ( x * y * y ) ); // initial iteration
		// relative error bound after the initial iteration: 1.8 * 10^-3
#endif
//		y *= ( 1.5f - ( x * y * y ) ); // second iteration for higher precision
		return y;
	}

I go from 8 fps to 10 fps (+25%) with `r_VBOmodel 0`, using the branch and the test layout (177 visible models) from:

Move some skeleton building to cgame, batch IPC #1386

and I see no visual difference.
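
For a rough idea of how much precision the second iteration buys on the non-SSE path, here is a minimal sketch that measures the worst relative error for a given number of iterations. The names `rsqrt_newton` and `max_relative_error` are made up for this example and `std::memcpy` stands in for `Util::bit_cast`, so this is not the engine code. It only samples [1, 4) because the bit trick's error pattern repeats every factor of 4.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: magic-constant initial guess followed by a chosen
// number of Newton-Raphson steps (same step formula as in Q_rsqrt).
static float rsqrt_newton( float number, int iterations )
{
	uint32_t bits;
	std::memcpy( &bits, &number, sizeof( bits ) );
	bits = 0x5f3759df - ( bits >> 1 );

	float y;
	std::memcpy( &y, &bits, sizeof( y ) );

	const float x = 0.5f * number;

	for ( int i = 0; i < iterations; i++ )
	{
		y *= ( 1.5f - ( x * y * y ) );
	}

	return y;
}

// Hypothetical helper: largest relative error against 1/sqrt computed in
// double precision, sampled over [1, 4) in steps of 1/4096.
static double max_relative_error( int iterations )
{
	double worst = 0.0;

	for ( float v = 1.0f; v < 4.0f; v += 1.0f / 4096.0f )
	{
		const double exact = 1.0 / std::sqrt( static_cast<double>( v ) );
		const double error = std::fabs( rsqrt_newton( v, iterations ) - exact ) / exact;

		if ( error > worst )
		{
			worst = error;
		}
	}

	return worst;
}
```

Calling `max_relative_error( 1 )` and `max_relative_error( 2 )` gives the two numbers to compare. I would expect the first to sit close to the 1.8 * 10^-3 bound from the code comment and the second to be a couple of orders of magnitude smaller, but this says nothing about whether the difference is visible in the renderer.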


illwieckz · 2024-12-09T17:21:43Z

Here is the code in ioq3:

float Q_rsqrt( float number )
{
	floatint_t t;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	t.f  = number;
	t.i  = 0x5f3759df - ( t.i >> 1 );               // what the fuck?
	y  = t.f;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

	return y;
}

The second iteration is commented out there, with a note saying it can be removed.

So I doubt we would break compatibility with Quake 3 by removing the second iteration.

Also, I wonder whether it's a mistake that this iteration still runs when the SSE code is used instead of the bit trick.
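
To put numbers on what that last iteration does in each path, here is a small sketch that applies one Newton-Raphson step to an estimate with a known relative error and reports the error that is left. The function names are hypothetical and the computation is done in double precision, so it only models the step itself, not the SSE instruction or float rounding. Writing the estimate as (1 + e) / sqrt(number), the step leaves a relative error of about 1.5 * e^2.

```cpp
#include <cmath>

// One Newton-Raphson step for 1/sqrt(number), same formula as in Q_rsqrt.
static double newton_step( double number, double y )
{
	return y * ( 1.5 - ( 0.5 * number * y * y ) );
}

// Hypothetical helper: relative error left after one step when the input
// estimate is off by `relative_error`.
static double error_after_step( double relative_error )
{
	const double number = 2.0; // arbitrary positive input
	const double exact = 1.0 / std::sqrt( number );
	const double refined = newton_step( number, exact * ( 1.0 + relative_error ) );

	return std::fabs( refined - exact ) / exact;
}
```

Feeding it the two bounds quoted in the code comments, `error_after_step( 3.7e-4 )` models the SSE path and `error_after_step( 1.8e-3 )` models the bit-trick path after its initial iteration. Under this model the SSE path ends up around 2 * 10^-7 and the bit-trick path around 5 * 10^-6, so the final step is not redundant with SSE; the open question is only whether either path needs that much precision.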

illwieckz added the T-Question label Dec 9, 2024

illwieckz mentioned this issue Dec 9, 2024

Some improvements around Q_rsqrt() #1458

Merged
