New tuning results #1

Open
CNugteren opened this issue May 30, 2015 · 143 comments

Comments

@CNugteren
Owner

CNugteren commented May 30, 2015

(See the README for details)

This is the place to post new tuning results. If you compiled with -DTUNERS=ON, ran one of the tuners on your device (or all perhaps?), and feel that these results should be included in the next release of CLBlast, please post them here.

You can do this by attaching the JSON files to this issue (archived in a .ZIP file).
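A minimal sketch of the archiving step, assuming the tuners wrote their output as `clblast_*.json` files into a single directory. The function name `archive_tuning_results` and the default glob pattern are illustrative assumptions, not part of CLBlast:

```python
import zipfile
from pathlib import Path


def archive_tuning_results(result_dir, zip_path, pattern="clblast_*.json"):
    """Bundle tuner JSON output into one ZIP and return the archived file names.

    The default pattern is an assumption about how the tuner output is named.
    """
    files = sorted(Path(result_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {result_dir}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store only the file name so the archive carries no local directory layout.
            archive.write(path, arcname=path.name)
    return [path.name for path in files]
```

Calling `archive_tuning_results("build", "my_device.zip")` would then produce a single file that can be attached to a comment here.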

@tremmelg

tremmelg commented Apr 8, 2016

Here are some tuning results from an NVIDIA Titan Black, AMD Radeon HD 7970 and an ARM Mali T-628.

Just to let you know about JSON files, GitHub says "Unfortunately, we don’t support that file type. Choose Files Try again with a PNG, GIF, JPG, DOCX, PPTX, XLSX, TXT, PDF, or ZIP."
Archive.zip

@CNugteren
Owner Author

Thanks for the tuning results! However, they seem to have been run with non-default settings (using specific values for alpha and beta). Could you perhaps run them again with the default settings?

By the way, the latest version already includes results for Tahiti (the HD 7970) and the ARM Mali T-628, so perhaps those are superfluous.

(I've updated the post regarding JSON-files and GitHub)

@blueberry

Here are the results for AMD's Pitcairn (R9 270X). I'll also upload the results for Hawaii (R9 290X), but I am getting an error during Xgemm. I'll open another issue for that.
pitcairn.zip

@CNugteren
Owner Author

Thanks! The results for Pitcairn are added to the development branch.

@blueberry

Hawaii (AMD R9 290X):
hawaii.zip

@blueberry

And i7 4790k:
i7-4790k.zip

@CNugteren
Owner Author

The results for Hawaii will be added. As for the i7 results: the zip archive seems to include only a Makefile?

@blueberry

blueberry commented May 2, 2016

Sorry, I messed up that zip. As I do not have those files any more, I'll send them when I manage to do that tuning.

@fonghou

fonghou commented May 31, 2016

nvidia-grid-k520-aws-g2.zip

See details in #61

@CNugteren
Owner Author

@fonghou Thanks! The tuning results are added to the database. They are currently in the development branch but will be automatically included in the next release.

@OursDesCavernes

Here are the results for the Intel i5-4210U iGPU:
Device name: 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (OpenCL 1.2 beignet 1.2 (git-1b076ec))
i5-4210U_GPU.zip

@CNugteren
Owner Author

@OursDesCavernes Added, thanks!

@gcp
Contributor

gcp commented Jul 1, 2016

GTX 670, GTX 750 (non-Ti), and GTX 1070 tunings attached. One of the GEMV tunings took ages (or hung) on the latter two, but curiously enough not on the (older) first card. Luckily, it looks like GEMV is the last one to be tuned so these are fairly complete anyway.

gtx670.tar.gz
gtx1070.tar.gz
gtx750.tar.gz

@CNugteren
Owner Author

@gcp Thanks for running all the tuners on those devices! The results are added to CLBlast, currently in the development branch but they will be automatically included in the next release. Indeed, I saw long compilation times for GEMV kernels on NVIDIA as well - it is the last one to be tuned for exactly this reason. NVIDIA promises to reduce compilation times significantly with CUDA 8.0, so hopefully that also fixes these kernels.

@gcp
Contributor

gcp commented Jul 5, 2016

Intel HD530 (desktop Skylake iGPU)
IntelHD530.zip

@CNugteren
Owner Author

@gcp Thanks, they are added.

@CNugteren
Owner Author

Issue #83 caused a complete re-write of the third GEMV kernel (XgemvFastRot), so I had to throw away the corresponding tuning results. If it's not too much effort, I welcome updated clblast_xgemv_fast_rot_*.json tuning results based on the development branch. The other GEMV tuning results are still valid and included in CLBlast. Thanks!

@OursDesCavernes

Intel(R) HD Graphics 5500 BroadWell U-Processor GT2:
hd5500.zip
Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile:
hd4400.zip

@CNugteren
Owner Author

@OursDesCavernes Thanks, HD5500 is added and HD4400 is updated.

@yingted

yingted commented Oct 11, 2016

Intel(R) HD Graphics 4000
intel-hd4000.zip

@CNugteren
Owner Author

@yingted Thanks! The tuning results for the IvyBridge GPU are added.

@MigMuc

MigMuc commented Oct 22, 2016

Radeon R9 380 (Tonga) tuning results:
Tobago_TuningResults.zip

@MigMuc

MigMuc commented Oct 22, 2016

Of course, the device is called Tonga; the zip file name is just a spelling mistake.

@CNugteren
Owner Author

@MigMuc The results for Tonga are added, thanks!

@matze
Contributor

matze commented Oct 24, 2016

Here are the results for the GTX Titan Black. Unfortunately, I had the same problem as @gcp on the last run. But again, should be fairly complete.

gtx-titan-black.tar.gz

@CNugteren
Owner Author

@matze Thanks a lot for your contribution. The tuning results are added.

@tangjinchuan

7800XT.zip

@infinit-luffy

4060-Daoyuan Zhu@GZU.zip

@RAN1027

RAN1027 commented Nov 7, 2023

AMD 5600G.zip

@WaToI

WaToI commented Dec 13, 2023

AMD 7840U Radeon 780M.zip
GPD WIN Max 2 2023

@pjuhasz

pjuhasz commented Dec 23, 2023

Nvidia Quadro M2000M
clblast_tuning_nvidia_quadro_m2000m.tar.gz

@pjuhasz

pjuhasz commented Dec 23, 2023

Anecdotal benchmark for the previous result: running clblast-enabled whisper.cpp with the medium model (./main -m models/ggml-medium.bin ./samples/jfk.wav)

w/o tuning: 14 s
w tuning: 8.8 s
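To put those two timings in relative terms, here is a small worked calculation. The helper names are purely illustrative, and the inputs are just the two numbers quoted above from a single anecdotal run:

```python
def speedup(baseline_s, tuned_s):
    """How many times faster the tuned run is than the untuned one."""
    return baseline_s / tuned_s


def time_saved_fraction(baseline_s, tuned_s):
    """Fraction of the untuned wall-clock time that tuning removed."""
    return (baseline_s - tuned_s) / baseline_s


baseline_s, tuned_s = 14.0, 8.8  # seconds, as reported above
print(f"speedup: {speedup(baseline_s, tuned_s):.2f}x")                      # speedup: 1.59x
print(f"time saved: {time_saved_fraction(baseline_s, tuned_s):.0%}")        # time saved: 37%
```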

@gpokat

gpokat commented Feb 5, 2024

Fp16 only for Helio G99 (ARM MALI G57 GPU)
helioG99_fp16_only.tar.gz
Stage 4 of the clblast_tuner_xgemm tuner gets stuck in an infinite loop every time, so I can't provide the corresponding output.

@TomTheHand

TomTheHand commented Feb 7, 2024

I have an Intel A750 and an i5-13400F (not sure if the processor matters, but running the tuners certainly occupies one of my cores). I know someone provided results for an A770 already, but hopefully these are still worthwhile. At the very least, the Intel Arc cards have more mature drivers at this point, which might make a difference.

I also wanted to ask if you would consider pushing out another release; I don't know how many changes have been made to the rest of the code, but there are a bunch of new tuning results since the last one (including the A770, so the current release has no Intel Arc results in it). I ask because I use some software that utilizes your release versions rather than compiling their own, and I'm sure others are in the same boat.

IntelA750+13400F.zip

Edit: Thank you so much for the new release! I compiled 1.6.1 and 1.6.2 with -DCLIENTS=ON and ran a few random benchmarks (I'm not sure which ones are the most important and/or most used by the software I use), and saw huge performance improvements: roughly 3x the GFLOPS for xgemm, for example, if I'm understanding correctly. Looking forward to 1.6.2 being incorporated into more stuff so my A750 is less terrible 🤣

@gspr
Contributor

gspr commented Feb 7, 2024

I believe these tuning results haven't yet been submitted: NVIDIA RTX A6000 (GA102GL).
A6000-tuning.tar.gz

Update: Sorry, I accidentally forgot to lock the GPU and memory clocks. New results will come shortly.

Here's the updated NVIDIA RTX A6000 GA102GL tuning (which also includes a broader set of floating point widths than the last one):
A6000-tuning-2.tar.gz

I'm not sure the GPU clock was set correctly: nvtop reported a lower value than what I set it to.

@tangjinchuan

A770.zip
The latest Intel Arc A770 tuning results based on 31.0.101.5330 (version 2024/2/14).

@tangjinchuan

More results are coming from my students in Artificial Intelligence.
易婉婷-2100170332-4050 Laptop 2.zip

刘杨杨-2100170317-4050 Laptop 1.zip

@SomePerson1111

intel_i7_12700H.zip

@gitbearflying

Does it make sense to add JSON files here which are based on a pocl device? https://github.com/pocl/pocl
I have an old "Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz" where clinfo tells me that I have no OpenCL device.
After installing pocl I have one ;-)
While running "make alltuners", should the machine have no load from other running applications to get accurate JSON files?
make alltuners has now been running for 9 hours and is not yet finished.

@CNugteren
Owner Author

> Does it make sense to add JSON files here which are based on a pocl device?

Yes, why not?

> While running "make alltuners", should the machine have no load from other running applications to get accurate JSON files?

Ideally yes. But the tuner tests many combinations and several of them are often close to optimal, so it does not harm the end result if something else occasionally runs in between.

> make alltuners has now been running for 9 hours and is not yet finished.

Most likely most of the time is taken by the xgemm tuner? It consists of 4 parts, you could skip parts 2 and 4 and only run parts 1 and 3. You can do this by commenting out the lines that start with StartVariation<2> and StartVariation<12> in src/tuning/kernels/xgemm.cpp.
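As a rough sketch of that edit, the following helper comments out the selected calls in a piece of C++ source text. The helper name is hypothetical, and the `sample` string is only a stand-in for the relevant lines of src/tuning/kernels/xgemm.cpp, whose exact contents are assumed here:

```python
import re

# Matches a line that starts (after optional indentation) with StartVariation<N>.
_VARIATION_CALL = re.compile(r"^\s*StartVariation<(\d+)>")


def comment_out_variations(source, skip=(2, 12)):
    """Return source with the selected StartVariation<N> lines commented out."""
    result = []
    for line in source.splitlines(keepends=True):
        match = _VARIATION_CALL.match(line)
        if match and int(match.group(1)) in skip:
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}// {line.lstrip()}"
        result.append(line)
    return "".join(result)


# Hypothetical stand-in for the relevant lines of xgemm.cpp (assumed, not copied).
sample = (
    "  StartVariation<1>(argc, argv);\n"
    "  StartVariation<2>(argc, argv);\n"
    "  StartVariation<11>(argc, argv);\n"
    "  StartVariation<12>(argc, argv);\n"
)
print(comment_out_variations(sample), end="")
```

The variation number is parsed as an integer before comparing, so skipping 2 does not accidentally match 12 or vice versa. After editing the real file, the tuners need to be rebuilt before running them again.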

@gitbearflying

> > Does it make sense to add JSON files here which are based on a pocl device?

> Yes, why not?

Because I guess an emulated OpenCL device (pocl on the CPU) can't be faster than, e.g., OpenBLAS, as that also uses the CPU.
Or can CLBlast make use of the built-in "Mesa DRI Mobile Intel® GM45 Express Chipset"?

> > make alltuners has now been running for 9 hours and is not yet finished.

> Most likely most of the time is taken by the xgemm tuner? It consists of 4 parts, you could skip parts 2 and 4 and only run parts 1 and 3. You can do this by commenting out the lines that start with StartVariation<2> and StartVariation<12> in src/tuning/kernels/xgemm.cpp.

I'll try that, thanks!

@hajokirchhoff

hajokirchhoff commented Jun 13, 2024 via email

@gitbearflying

> > Does it make sense to add JSON files here which are based on a pocl device?

> Yes, why not?

Here it is:
Device Name cpu-penryn-Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz

intel-core-2-duo-T6670-pocl.zip

@tangjinchuan

New results from my students in Electronic Information Engineering at Guizhou University. I keep their names to acknowledge their contributions.
石军4060 Laptop GPU.zip
2200860109 陈雍 Iris(R) Xe.zip
付子毅-2200860253 4060 Laptop GPU.zip

@tangjinchuan

tangjinchuan commented Oct 21, 2024

@tangjinchuan

AMD instinct MI50.tar.xz.zip

This one is quite interesting: I acquired a brand-new card for 800 RMB (approx. 100 EUR) from an HPC provider and found that it easily beats a 4060 Ti 16G in many cases, except for tensor-core-based AI tasks.
