Extremely slow performance on Ryzen 7950X3D #7
Comments
See my tests here #6 |
Yes, I'm trying to modify GGML to make it run faster. Could you add the -v parameter to print out your System Info and Options so I can take a look? |
How did you build sd? Some of the features listed here should be enabled on any platform (AVX2 is available on almost all x86 CPUs out there). |
Installed cmake and ran the commands from the readme. |
Also, your number of threads seems excessive. Try reducing it to match the physical core count. |
The default only gave me around 60% utilization. But yeah I think 32 is too much. Didn't impact performance either way though. |
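To illustrate the thread-count suggestion above, here is a minimal sketch of deriving a thread count from the logical thread count reported by the C++ standard library. The helper names `physical_core_estimate` and `suggested_thread_count` are hypothetical and not part of this project, and the 2-way SMT factor is an assumption that happens to match a 7950X3D (32 logical threads, 16 physical cores) but will be wrong on CPUs without SMT.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: estimate physical cores from the logical thread count.
// ASSUMPTION: every core exposes `smt_ways` hardware threads (2 on most SMT CPUs).
inline unsigned physical_core_estimate(unsigned logical_threads, unsigned smt_ways = 2) {
    assert(smt_ways >= 1);
    // hardware_concurrency() may report 0 when the count is unknown,
    // so never return fewer than one thread.
    return std::max(1u, logical_threads / smt_ways);
}

// Hypothetical helper: a starting value to pass as the thread count.
inline unsigned suggested_thread_count() {
    // std::thread::hardware_concurrency() returns the number of logical threads (a hint).
    return physical_core_estimate(std::thread::hardware_concurrency());
}
```

With 32 logical threads this yields 16. Treat the result as a starting point and benchmark a few values around it rather than relying on the estimate.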
My compile log:
It's at 30 seconds per sampling step now.
Does that imply it failed to enable AVX/AVX2 stuff? |
No, I think that's the CUDA compiler. |
Or maybe not? Hm, what is your platform, and what platform are you building for? |
Ran it with my build and adjusted threads to 10 (I have 12 physical cores).
Edit: also, I used q8_0 instead of q4_1. |
Well this part is definitely off:
I assume the lack of AVX is a compiler issue. But no idea how to fix that, it seems to be up to date. |
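One way to check whether the compiler actually enabled AVX/AVX2 for a build is to test the predefined macros `__AVX__` and `__AVX2__`. GCC and Clang define them under `-mavx`/`-mavx2` (or `-march=native` on a capable CPU), and MSVC defines them under `/arch:AVX` and `/arch:AVX2`. The sketch below is a standalone illustration, not this project's own system-info code; `SimdBuildInfo`, `simd_build_info` and `describe_simd_build` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical struct: which SIMD extensions this translation unit was compiled with.
struct SimdBuildInfo {
    bool avx;
    bool avx2;
};

// Reports compile-time flags only; it does not query the CPU at run time.
inline SimdBuildInfo simd_build_info() {
    SimdBuildInfo info{false, false};
#if defined(__AVX__)
    info.avx = true;
#endif
#if defined(__AVX2__)
    info.avx2 = true;
#endif
    // Compilers that define __AVX2__ also define __AVX__.
    assert(!info.avx2 || info.avx);
    return info;
}

// Formats the flags as a short one-line summary.
inline std::string describe_simd_build() {
    const SimdBuildInfo info = simd_build_info();
    return std::string("AVX = ") + (info.avx ? "1" : "0") +
           " | AVX2 = " + (info.avx2 ? "1" : "0");
}
```

If both values come back as 0 on a CPU that supports AVX2, the relevant architecture flag most likely never reached the compiler, which points at the build configuration rather than the hardware.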
Oh, tell me more about your build environment/process. |
If you poke around in your build directory, you should find a |
Windows 10 22H2, VS 2022 with Build Tools installed, CUDA Toolkit 11.8 installed, cmake installed using their setup.
That didn't seem to change anything. I ran |
Very funky. @leejet, I will probably make a PR later with improved CMake (by copying from llama.cpp). |
The latest GGML code has already fixed this issue. I will rebase my code onto the latest GGML code. |
@n00mkrad the issue has been fixed. You can pull the latest code and give it a try. Don't forget to update the submodule as well.
Works. Still very slow, but I guess that's expected. |
Running the line from the readme, I get this:
step 1 sampling completed, taking 50.97s
Compiled with cmake on Windows. Shouldn't it be a little bit faster?