Use -march=native -mtune=native on x86 (Also Enables AVX512 on macOS) #609
That's a good question. I cannot see any disadvantages on my system, and it significantly simplifies the Makefile.
@cmdrf would you be able to do a little performance comparison with and without these changes on your system? I'd be very curious to see what kind of benefit you see. We've seen rather small improvements (~10% or so at best) on desktop systems with AMD and Intel CPUs, and I'd be interested to see if those benefits translate through to a laptop.
Sure, here you go! Although it's also a desktop system (Xeon W-3223):
Without AVX512:
With AVX512:
Which translates to an improvement of 3.7%, taking the total time into account. Btw: I passed in the same RNG seed in both runs and was expecting to get the same output, but it was different.
That is great to see! I've heard stories about AVX512 on certain Intel CPUs hurting performance due to throttling or similar, and I'm happy to see that's not the case for this particular application. Tyvm for benchmarking
I did the same benchmark with One interesting thing I noticed: The output was now exactly the same as with my other AVX512 run. This implies that the output is not only dependent on the prompt and the RNG seed, but also on AVX512 being enabled or not. Could this potentially be a bug in the hand-crafted routines for AVX512? Or is this to be expected?
This is expected: not every code path uses exactly the same floating-point operations, or applies them in the same order, and that may lead to slightly different results. As long as the generation quality (as measured by perplexity) isn't affected, this is not an issue.
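To illustrate the point about operation order, here is a minimal sketch in Python (not code from this project). It adds the same four numbers once in a single running total and once split across two "lanes", loosely mimicking how a wider SIMD path accumulates partial sums. The function names and the input values are made up for the illustration.

```python
def sequential_sum(values):
    """Add the values one after another into a single accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    """Add the values into `lanes` interleaved partial sums, then reduce them.

    This loosely imitates a vectorized loop, where each SIMD lane keeps
    its own accumulator and the lanes are combined at the end.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


# Hypothetical input chosen so that rounding differs between the two orders.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))  # 1.0: the first 1.0 is lost when added to 1e16
print(lane_sum(values, 2))     # 2.0: the large values cancel inside one lane
```

Both results are valid roundings of the same mathematical sum. A model evaluated through different instruction paths can diverge in the same way, and with sampling a tiny numerical difference can change which token is picked.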
There is no good reason for not using it. So yes, let's start using Regarding different results - correct.
On my 2019 Mac Pro I have these CPU features:
Although I was wondering: Why not use `-march=native`?
EDIT: Using `-march=native -mtune=native` on x86 now. Could potentially be extended to other architectures, although the meaning of `-march`, `-mtune` and `-mcpu` is a bit convoluted across different architectures.
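The selection described in the EDIT could be sketched roughly as follows. This is a Python illustration of the decision only, not the Makefile change itself: the function name `native_flags` and the list of x86 machine names are assumptions, and other architectures are deliberately left without flags because the meaning of the options differs there.

```python
import platform

# Assumed spellings of x86 machine names as reported by common platforms.
X86_MACHINES = {"x86_64", "amd64", "i386", "i686"}


def native_flags(machine):
    """Return extra compiler flags for the given machine name (hypothetical helper)."""
    if machine.lower() in X86_MACHINES:
        # Let the compiler target and tune for the CPU it is running on.
        return ["-march=native", "-mtune=native"]
    # Other architectures: add nothing rather than guess at -march/-mcpu semantics.
    return []


if __name__ == "__main__":
    print(platform.machine(), native_flags(platform.machine()))
```

Note that a binary built with these flags may use instructions that are unavailable on other CPUs, so it is best suited to builds that run on the machine that compiled them.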