src_float_to_short_array: Use division instead of shift #85
No, it won't fix the issue, and it is slower because of the integer division (and harder to understand than the solution proposed in #84 (comment)). The PowerPC problem is a broken An example:
On modern CPUs with deep caches, branches are usually considered more expensive than division. If you can prove to me that my solution using division is slower than your solution, I will take your solution.
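For readers following the shift-versus-division argument, here is a minimal sketch of the two ways to reduce a 32-bit intermediate to 16 bits. The helper names `shift_to_short` and `div_to_short` are hypothetical and are not taken from the patch. The sketch only illustrates the semantic difference and says nothing about speed, which would need a benchmark.

```c
#include <stdint.h>

/* Hypothetical helper: scale down with a right shift.
 * Right-shifting a negative signed value is implementation-defined in C;
 * compilers that sign-extend round the result toward negative infinity. */
static short shift_to_short(int32_t v)
{
    return (short)(v >> 16);
}

/* Hypothetical helper: scale down with a division.
 * Since C99, integer division truncates toward zero, so the result is
 * fully defined for negative inputs as well. */
static short div_to_short(int32_t v)
{
    return (short)(v / 65536);
}
```

The two helpers agree for non-negative inputs. For a small negative input such as -1, the division yields 0, while a sign-extending shift yields -1.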
If PowerPC has a broken Meanwhile, I am still interested in whether this fixes @janstary's issue.
Haven't we already proved that the output of
Speculative execution mitigates some of that. AND:
The proof is simple: long on most 64-bit systems is 64 bits wide: https://godbolt.org/z/HMv8iu
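The width of long can be checked directly, and a conversion can avoid depending on it by clamping explicitly. The sketch below is an assumption-laden illustration, not the library's code: `long_bits` and `clamp_to_short` are hypothetical names, the input is assumed to be a sample already scaled by 32768, and manual rounding stands in for `lrint` so the example needs no math library.

```c
#include <limits.h>

/* Hypothetical helper: number of bits in long on this platform.
 * The C standard guarantees at least 32; LP64 systems report 64. */
static int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Hypothetical helper: convert a sample already scaled by 32768 to a
 * 16-bit value, clamping explicitly instead of relying on how the
 * float-to-integer conversion behaves out of range. */
static short clamp_to_short(double scaled)
{
    if (scaled >= 32767.0)
        return 32767;
    if (scaled <= -32768.0)
        return -32768;
    /* Round half away from zero; a stand-in for lrint(). */
    return (short)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

Because the range checks run before the conversion, the result does not depend on whether long is 32 or 64 bits wide. The cost is two comparisons per sample.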
I also tested on an arm machine (current OpenBSD/armv7
With the patch, it fails as
That's not proof, that's opinion. Proof requires a proper benchmark.
You do realize that for CPUs where it works (e.g. at least x86 and x86_64) the "clipping optimization" results in zero branches, don't you?
No, you are missing my point. For Now examine what the clipping optimization is and when it is enabled:
My claim: On most architectures the clipping optimization is disabled. Quick check: Do you have a "common" 64-bit Linux on an x86_64 architecture? Then please run Proof for "most architectures":
Summary:
1b3d5b1 to c10c819
@janstary I wonder if this fixes your issue on PowerPC.