Do we need this level of precision in Q_rsqrt? #1457

Open
illwieckz opened this issue Dec 9, 2024 · 1 comment

Comments

@illwieckz
Member

Here is the Q_rsqrt() code:

	inline float Q_rsqrt( float number )
	{
		float x = 0.5f * number;
		float y;

		// compute approximate inverse square root
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
		// SSE rsqrt relative error bound: 3.7 * 10^-4
		_mm_store_ss( &y, _mm_rsqrt_ss( _mm_load_ss( &number ) ) );
#else
		y = Util::bit_cast<float>( 0x5f3759df - ( Util::bit_cast<uint32_t>( number ) >> 1 ) );
		y *= ( 1.5f - ( x * y * y ) ); // initial iteration
		// relative error bound after the initial iteration: 1.8 * 10^-3
#endif
		y *= ( 1.5f - ( x * y * y ) ); // second iteration for higher precision
		return y;
	}

If I comment out the second iteration, like this:

	inline float Q_rsqrt( float number )
	{
		float x = 0.5f * number;
		float y;

		// compute approximate inverse square root
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
		// SSE rsqrt relative error bound: 3.7 * 10^-4
		_mm_store_ss( &y, _mm_rsqrt_ss( _mm_load_ss( &number ) ) );
#else
		y = Util::bit_cast<float>( 0x5f3759df - ( Util::bit_cast<uint32_t>( number ) >> 1 ) );
		y *= ( 1.5f - ( x * y * y ) ); // initial iteration
		// relative error bound after the initial iteration: 1.8 * 10^-3
#endif
//		y *= ( 1.5f - ( x * y * y ) ); // second iteration for higher precision
		return y;
	}

the frame rate goes from 8 fps to 10 fps (+25%) with r_VBOmodel 0, using the branch and the test layout (177 visible models) from:

and I see no visual difference.
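
To put a number on the precision being traded away, here is a minimal sketch that measures the worst relative error of the bit-trick path with one and with two Newton-Raphson iterations. The names `bit_cast_sketch`, `rsqrt_sketch` and `max_relative_error` are hypothetical helpers for this example only; `bit_cast_sketch` stands in for `Util::bit_cast` on the assumption that it behaves like a `memcpy`-based cast. The SSE path is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in for the engine's Util::bit_cast (assumption: memcpy-like semantics).
template <typename To, typename From>
To bit_cast_sketch(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "size mismatch");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Bit-trick estimate followed by `iterations` Newton-Raphson steps.
float rsqrt_sketch(float number, int iterations) {
    const float x = 0.5f * number;
    float y = bit_cast_sketch<float>(
        0x5f3759dfu - (bit_cast_sketch<std::uint32_t>(number) >> 1));
    for (int i = 0; i < iterations; ++i) {
        y *= 1.5f - x * y * y;
    }
    return y;
}

// Largest relative error against 1/sqrt computed in double,
// sampled uniformly over [lo, hi).
double max_relative_error(int iterations, float lo, float hi, int samples) {
    assert(lo > 0.0f && lo < hi && samples > 0);
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const float v = lo + (hi - lo) * (static_cast<float>(i) / samples);
        const double exact = 1.0 / std::sqrt(static_cast<double>(v));
        const double err = std::fabs(rsqrt_sketch(v, iterations) - exact) / exact;
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling `max_relative_error(1, 1.0f, 4.0f, 100000)` and `max_relative_error(2, 1.0f, 4.0f, 100000)` compares the two variants over one full period of the error pattern. The one-iteration result should land close to the 1.8 * 10^-3 bound quoted in the code comment, and the two-iteration result should be several hundred times smaller.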

@illwieckz
Member Author

illwieckz commented Dec 9, 2024

Here is the code in ioq3:

float Q_rsqrt( float number )
{
	floatint_t t;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	t.f  = number;
	t.i  = 0x5f3759df - ( t.i >> 1 );               // what the fuck?
	y  = t.f;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

	return y;
}

The second iteration is commented out there, and the comment says it can be removed.

So, I doubt we would break compatibility with Quake 3 by removing the second iteration.

Also, I wonder whether it is a mistake that the second iteration still runs on the SSE path, where _mm_rsqrt_ss replaces both the bit trick and its initial iteration.
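
A rough way to reason about what that extra step buys on each path, assuming exact arithmetic and writing the current estimate as $y = (1 + \varepsilon)/\sqrt{n}$ (the symbols $n$ for the input and $\varepsilon$ for the relative error are notation introduced here, not taken from the code). One Newton-Raphson step $y' = y\,(1.5 - 0.5\,n\,y^2)$ then gives:

$$
y' = \frac{(1+\varepsilon)\left(1.5 - 0.5\,(1+\varepsilon)^2\right)}{\sqrt{n}}
   = \frac{1 - \tfrac{3}{2}\varepsilon^2 - \tfrac{1}{2}\varepsilon^3}{\sqrt{n}}
$$

So each step maps a relative error $\varepsilon$ to roughly $1.5\,\varepsilon^2$. Taking the bounds quoted in the code comments at face value, the SSE estimate at $3.7 \times 10^{-4}$ would end up near $2.1 \times 10^{-7}$ after the step, which is about the limit of single precision, while the bit trick at $1.8 \times 10^{-3}$ after its initial iteration would end up near $4.9 \times 10^{-6}$. By the same bounds, the SSE estimate without any iteration is already tighter than the bit trick with one.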
