Testing against other BLAS versions #19
Comments
OK, I found it: I have to change the linking path in the test_problems directory and in the Makefile, and I also have to set BLASFEO_TESTING = 1. But then I get this kind of error: [ 84%] Built target blasfeo |
This build error should be fixed with bf6f17d . Could you check again please? |
OK, thanks, I am going to check it. |
Afterwards I get this kind of error, and [ 96%] Linking C executable d_blas
I guess -lopenblas is missing in link.txt, because it should be:
/usr/bin/cc -O2 -fPIC -DLA=HIGH_PERFORMANCE -DTARGET=ARMV8A_ARM_CORTEX_A57 -DLA_HIGH_PERFORMANCE -DEXT_DEP -DOS_LINUX -DREF_BLAS_OPENBLAS -I/opt/openblas/include -DTARGET_ARMV8A_ARM_CORTEX_A57 -march=armv8-a+crc+crypto+fp+simd CMakeFiles/d_blas.dir/test_d_blas.c.o -o d_blas -rdynamic ../libblasfeo.a -lm -lopenblas
With that, it is possible to compile. Now if I run the test, I get this kind of output:
BLAS performance test - float precision
Frequency used to compute theoretical peak: 3.3 GHz (edit test_param.h to modify this value).
Testing BLAS version for VFPv4 instruction set, 32 bit (optimized for ARM Cortex A15): theoretical peak 26.4 Gflops
n     sgemm_blasfeo       sgemm_blas
n     Gflops    %         Gflops    %
4     0.22      0.83      inf       inf
Best regards. |
I guess it should be possible to change lines like these in the Makefile in test_problems:
ifeq ($(REF_BLAS), OPENBLAS)
ifeq ($(REF_BLAS), BLIS)
ifeq ($(REF_BLAS), NETLIB)
ifeq ($(REF_BLAS), MKL)
ifeq ($(REF_BLAS), ATLAS)
Best regards. |
I know that right now the distinction is very blurry but In any case you are right the If you clone that branch then you can run i.e. It would be great if you could test this on your system. |
Not really; the best thing would be to control most of the variables from cmake or cmake-gui. The last bugs are fixed, but if I do so then I get this kind of output: sudo cmake -DBLASFEO_BENCHMARKS=ON -DREF_BLAS=OPENBLAS
Best regards. |
Hi, but did you pull my branch?
I also tested on an ARM core (A53) against OpenBLAS and it is working. |
OK, here we go:
BLAS performance test - double precision
Frequency used to compute theoretical peak: 3.3 GHz (edit test_param.h to modify this value).
Testing BLAS version for NEONv2 instruction set, 64 bit (optimized for ARM Cortex A57): theoretical peak 13.2 Gflops
n     dgemm_blasfeo       dgemm_blas
n     Gflops    %         Gflops    %
4     0.07      0.54      0.02      0.17
128   2.34      17.72     0.98      7.39
I guess there is still something wrong: this test was done on a Jetson TX2, which has about 1.5 Flops for single precision, so it should be only about half. https://www.aetina.com/products-detail.php?i=210 |
Second test for Nvidia Jetson TX2:
BLAS performance test - float precision
Frequency used to compute theoretical peak: 3.3 GHz (edit test_param.h to modify this value).
Testing BLAS version for VFPv4 instruction set, 32 bit (optimized for ARM Cortex A15): theoretical peak 26.4 Gflops
n     sgemm_blasfeo       sgemm_blas
n     Gflops    %         Gflops    %
4     0.22      0.83      0.05      0.19 |
Best regards and thank you. |
Hey, first of all, which cores of the TX2 are you running on? ARM Cortex A57 or Denver? If Denver, the code is not optimized for that, I have no clue what the architecture is. Then, you need to set by hand the frequency of the processor, to get meaningful percentages w.r.t. theoretical maximum (e.g. it should be 2.0 GHz for the A57), this is done in the file test_param.h as reported in your print out above. Also, you need to choose by hand the routine you want to benchmark and the relative number of flops. Last point, the A57 @2.0 GHz has 8 (16) Gflops in double (single) precision respectively. |
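To make the arithmetic behind those percentages concrete, here is a minimal sketch in C. It is not BLASFEO code: the function names are hypothetical, and the flops-per-cycle values (4 in double, 8 in single precision) are only inferred from the printed peaks above (13.2 / 3.3 and 26.4 / 3.3), which also match the 8 (16) Gflops quoted for the A57 at 2.0 GHz.

```c
#include <assert.h>

/* Theoretical peak in Gflops: clock frequency in GHz times flops per cycle.
   flops_per_cycle is an assumption supplied by the caller (here inferred
   from the benchmark printout, not read from any BLASFEO header). */
double peak_gflops(double freq_ghz, double flops_per_cycle)
{
    assert(freq_ghz > 0.0 && flops_per_cycle > 0.0);
    return freq_ghz * flops_per_cycle;
}

/* Flop count of one n x n x n matrix-matrix product (gemm): 2*n^3. */
double gemm_flops(int n)
{
    assert(n > 0);
    return 2.0 * n * n * n;
}

/* Achieved Gflops from a flop count and a measured time in seconds. */
double achieved_gflops(double flops, double seconds)
{
    assert(seconds > 0.0);
    return 1e-9 * flops / seconds;
}

/* Percentage of the theoretical peak reached by a measured Gflops value. */
double percent_of_peak(double gflops, double peak)
{
    assert(peak > 0.0);
    return 100.0 * gflops / peak;
}
```

For instance, peak_gflops(2.0, 4.0) gives 8 Gflops, so the 2.34 Gflops measured for n = 128 would be about 29.25% of peak instead of the 17.72% printed against the 3.3 GHz default.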
Please also note that, in case of the ARM Cortex A57 target in BLASFEO, not all routines have already been optimized. E.g. dgemm_nt is fully optimized, but dgemm_nn is not, and it is simply a fallback to the GENERIC target. You can check out the source code in the folder kernels/armv8a to see which kernels have already been optimized in assembly for the target architecture. |
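For readers unfamiliar with the nn/nt naming, the following is a plain textbook sketch in C of the two variants, not BLASFEO's API or kernels. The names ref_dgemm_nn and ref_dgemm_nt are hypothetical, and the sketch assumes square n x n matrices in column-major storage with no alpha/beta scaling.

```c
#include <assert.h>

/* "nn": neither operand transposed, D = A * B.
   Column-major storage: element (i,j) lives at index i + j*n. */
void ref_dgemm_nn(int n, const double *A, const double *B, double *D)
{
    assert(n > 0 && A && B && D);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            double acc = 0.0;
            for (int k = 0; k < n; k++)
                acc += A[i + k * n] * B[k + j * n];
            D[i + j * n] = acc;
        }
    }
}

/* "nt": A not transposed, B transposed, D = A * B^T.
   Only the indexing of B changes: B^T(k,j) is B(j,k). */
void ref_dgemm_nt(int n, const double *A, const double *B, double *D)
{
    assert(n > 0 && A && B && D);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            double acc = 0.0;
            for (int k = 0; k < n; k++)
                acc += A[i + k * n] * B[j + k * n];
            D[i + j * n] = acc;
        }
    }
}
```

The two variants differ only in how B is traversed in memory, which is why a library can have an optimized kernel for one and a generic fallback for the other.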
Could you please specify the MKL version in your tests? |
In the make build system (which is the recommended one), you can specify the path to the installation folder of your chosen MKL version here https://github.com/giaf/blasfeo/blob/master/Makefile.external_blas#L56 When you choose MKL as external BLAS, the |
I was talking about the performance graphs on the project website. By the way, it is amazing to see how good the performance is. Bravo! |
MKL is version 2019.1.144. The other BLAS implementations are from about the same time. @tmmsartor we should add all BLAS versions in there. |
Could the performance of |
Hello,
I don't see how to test against other libraries like OpenBLAS: where should I add the corresponding references, for example in CMake?
Best regards