New tuning results #1
Here are some tuning results from an NVIDIA Titan Black, an AMD Radeon HD 7970, and an ARM Mali T-628. Just to let you know about JSON files, GitHub says: "Unfortunately, we don't support that file type. Try again with a PNG, GIF, JPG, DOCX, PPTX, XLSX, TXT, PDF, or ZIP."
Thanks for the tuning results! However, they seem to have been run with non-default settings (using specific values for […]). By the way, the latest version already includes results for Tahiti (the HD 7970) and the ARM Mali T-628, so perhaps those are superfluous. (I've updated the post regarding JSON files and GitHub.)
Here are the results for AMD's Pitcairn (R9 270X). I'll also upload the results for Hawaii (R9 290X), but I am getting an error during Xgemm. I'll open another issue for that.
Thanks! The results for Pitcairn are added to the […]
Hawaii (AMD R9 290X):
And i7 4790k:
The results for Hawaii will be added. As for the i7 results: the zip archive seems to include only a Makefile?
Sorry, I messed up that zip. As I do not have those files any more, I'll send them when I manage to do that tuning.
See details #61
@fonghou Thanks! The tuning results are added to the database. They are currently in the […]
Here are the results for the Intel i5-4210U iGPU:
@OursDesCavernes Added, thanks!
GTX 670, GTX 750 (non-Ti), and GTX 1070 tunings attached. One of the GEMV tunings took ages (or hung) on the latter two, but curiously enough not on the (older) first card. Luckily, it looks like GEMV is the last one to be tuned, so these are fairly complete anyway.
@gcp Thanks for running all the tuners on those devices! The results are added to CLBlast, currently in the […]
Intel HD530 (desktop Skylake iGPU) |
@gcp Thanks, they are added.
Issue #83 caused a complete re-write of the third GEMV kernel ([…])
Intel(R) HD Graphics 5500 BroadWell U-Processor GT2:
@OursDesCavernes Thanks, HD5500 is added and HD4400 is updated.
Intel(R) HD Graphics 4000 |
@yingted Thanks! The tuning results for the IvyBridge GPU are added.
Radeon R9 380 (Tonga) tuning results:
Of course, the device is called Tonga; it's just a spelling mistake in the zip-file name.
@MigMuc The results for Tonga are added, thanks!
Here are the results for the GTX Titan Black. Unfortunately, I had the same problem as @gcp on the last run. But again, it should be fairly complete.
@matze Thanks a lot for your contribution. The tuning results are added.
AMD 7840U Radeon 780M.zip |
Nvidia Quadro M200M |
Anecdotal benchmark for the previous result: running CLBlast-enabled whisper.cpp with the medium model (…), w/o tuning: 14 s […]
FP16 only, for the Helio G99 (ARM Mali-G57 GPU)
I have an Intel A750 and an i5-13400F (not sure if the processor matters, but running the tuners certainly occupies one of my cores). I know someone provided results for an A770 already, but hopefully these are still worthwhile. At the very least, the Intel Arc cards have more mature drivers at this point, which might make a difference.

I also wanted to ask if you would consider pushing out another release; I don't know how many changes have been made to the rest of the code, but there are a bunch of new tuning results since the last one (including the A770, so the current release has no Intel Arc results in it). I ask because I use some software that utilizes your release versions rather than compiling their own, and I'm sure others are in the same boat.

Edit: Thank you so much for the new release! I compiled 1.6.1 and 1.6.2 with -DCLIENTS=ON and ran a few random benchmarks (I'm not sure which ones are the most important and/or most used by the software I use), and saw huge performance improvements: roughly 3x the GFLOPS for xgemm, for example, if I'm understanding correctly. Looking forward to 1.6.2 being incorporated into more stuff so my A750 is less terrible 🤣
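For anyone wanting to reproduce that kind of before/after comparison, a minimal sketch of a client benchmark run is below. It assumes a typical out-of-source CMake build and the `clblast_client_<routine>` binary naming; exact flags may differ between versions, so check the `--help` output of your build first.

```sh
# Hedged sketch: build the CLBlast benchmark clients and time GEMM.
# Binary names and flags may differ per version -- verify locally.
mkdir build && cd build
cmake -DCLIENTS=ON ..
make
# Benchmark single-precision GEMM on 1024x1024 matrices:
./clblast_client_xgemm -m 1024 -n 1024 -k 1024
```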
Update: Sorry, I accidentally forgot to lock the GPU and memory clocks. New results will come shortly.

Here's the updated NVIDIA RTX A6000 GA102GL tuning (which also includes a broader set of floating-point widths than the last one). I'm not sure the GPU clock was set correctly; nvtop reported a lower value than what I set it to.
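As a side note on locking clocks before tuning: on NVIDIA cards this can be done with nvidia-smi, roughly as sketched below. The clock values shown are placeholders, not A6000 settings; locking requires root, a reasonably recent driver, and (for memory clocks) a newer-generation GPU.

```sh
# Sketch of fixing clocks for reproducible tuning runs (requires root).
# List the clock combinations the device actually supports:
nvidia-smi -q -d SUPPORTED_CLOCKS
# Lock GPU and memory clocks to fixed values (placeholder numbers):
sudo nvidia-smi --lock-gpu-clocks=1410,1410
sudo nvidia-smi --lock-memory-clocks=7601,7601
# ... run the tuners here ...
# Restore the default clock behaviour afterwards:
sudo nvidia-smi --reset-gpu-clocks
sudo nvidia-smi --reset-memory-clocks
```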
A770.zip |
More results are coming from my students in Artificial Intelligence.
Does it make sense to add JSON files here which are based on a pocl device? https://github.com/pocl/pocl
Yes, why not?
Ideally yes. But since the tuner tests many combinations, and often several are close to optimal, it doesn't harm the end result if once in a while something else happens in between.
Most likely most of the time is taken by the […]
Because I guess an emulated OpenCL device (pocl on the CPU) can't be faster than e.g. OpenBLAS, as that also uses the CPU.
I'll try that, thanks!
Makes a _lot_ of sense, IMHO. Even though pocl might be slower than OpenBLAS on the CPU, it has the advantage that you only need one codebase (with CLBlast) instead of two. Our application requires OpenCL and will not run without it. For the few customers without OpenCL, pocl is a way to run our application even with no GPU device installed. So I'd say: definitely yes.
Here it is:
New results from my students in Electronic Information Engineering at Guizhou University. I've kept their names to acknowledge their contributions.
This one is quite interesting: I acquired a brand-new one for 800 RMB (approx. 100 EUR) from an HPC provider and found it easily beats a 4060 Ti 16G in many cases, except for tensor-core-based AI tasks.
(See the README for details.)
This is the place to post new tuning results. If you compiled with `-DTUNERS=ON`, ran one of the tuners on your device (or perhaps all of them?), and feel that these results should be included in the next release of CLBlast, please post them here. You can do this by attaching the JSON files to this issue (archived in a .ZIP file).
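For reference, the workflow might look roughly like this (a sketch assuming a typical out-of-source CMake build; tuner binary names and the exact JSON file names can vary between versions):

```sh
# Hedged sketch of producing tuning results to attach here.
git clone https://github.com/CNugteren/CLBlast.git
cd CLBlast && mkdir build && cd build
cmake -DTUNERS=ON ..
make
# Run a tuner (one binary per routine); each writes its JSON output
# into the current directory:
./clblast_tuner_xgemm
# Collect the JSON files into a single archive for posting:
zip tuning_results.zip *.json
```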