Plan for 0.3.0 #1245
Comments
I propose to remove the following non-SSE targets:
Avoid all 3DNow!, replacing it with MMX, due to the lack of current CPUs for testing?
I'm finding so far on my Kaby Lake machine that 0.2.20 seems a bit slower and certainly no faster than 0.2.19 compiled for the Haswell architecture. Optimizing for the newer processor types should be high on the list.
@brianborchers I am not aware of any change in 0.2.20 that would make it markedly slower on Haswell et al., nor anything that would make it faster there (except perhaps the reintroduction of -O2 to the LAPACK build). Most optimization work in the release was for other architectures (ARM, PPC, Z).
In my opinion we should try to follow what Intel is adding to the BLAS interface (like the matcopy operations and so on), so that OpenBLAS can be used as a replacement when MKL is not available, either for license reasons or because one uses other architectures.
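To make the matcopy suggestion concrete, here is a minimal plain-C sketch of what the out-of-place ?omatcopy extension computes, namely B := alpha * op(A). The function name and the row-major layout are assumptions made for this illustration; the real mkl_domatcopy / cblas_domatcopy entry points have their own calling conventions.

```c
#include <stddef.h>

/* Sketch of out-of-place "omatcopy" semantics: B := alpha * op(A),
 * where op is either the identity or a transpose.  A is rows x cols,
 * stored row-major with leading dimension lda; B uses ldb. */
void domatcopy_sketch(char trans, size_t rows, size_t cols, double alpha,
                      const double *A, size_t lda, double *B, size_t ldb)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            if (trans == 'T' || trans == 't')
                B[j * ldb + i] = alpha * A[i * lda + j]; /* scaled transpose */
            else
                B[i * ldb + j] = alpha * A[i * lda + j]; /* scaled copy */
        }
    }
}
```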
@xianyi Why not support KNL first to get warmed up with AVX-512? It's easier to get KNL access right now.
It's easy to get a new Xeon Platinum... unlike MIC, which is not yet identified correctly...
I did some tests with several scientific software packages on a Skylake Silver series system (2x 4110), and there I noticed that using the AVX-512 units instead of AVX2 slows HPL down from 630 to 400 GFlops with MKL. On the "better" Xeons like the Gold and Platinum series this might change, because they have more AVX-512 execution units, so AVX-512 may actually accelerate the code there. So to optimize for Skylake (and the other upcoming *Lakes), OpenBLAS has to check whether it is a consumer CPU (like the Core i7-6/7/8xxxx), an entry-level server CPU, or an enterprise server CPU. On the first ones the optimization should be done at the AVX2 level; on the latter ones one has to check whether it is worth moving to AVX-512 or not.
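As a hedged illustration of that check, the sketch below uses GCC's __builtin_cpu_supports to decide at runtime whether an AVX-512 or AVX2 code path is even available. The kernel functions are placeholders, not OpenBLAS symbols, and whether AVX-512 actually wins on a given SKU still has to be decided separately (see the SKU-based sketch further down).

```c
#include <stdio.h>

/* Placeholder kernels standing in for architecture-specific BLAS code. */
static void dgemm_kernel_avx2(void)   { puts("dispatching to AVX2 kernel"); }
static void dgemm_kernel_avx512(void) { puts("dispatching to AVX-512 kernel"); }

int main(void)
{
    __builtin_cpu_init();
    /* Availability check only: AVX-512F being present does not imply it
     * is faster, as the HPL numbers above show for the Silver parts. */
    if (__builtin_cpu_supports("avx512f"))
        dgemm_kernel_avx512();
    else if (__builtin_cpu_supports("avx2"))
        dgemm_kernel_avx2();
    else
        puts("falling back to a generic kernel");
    return 0;
}
```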
@brada4 I am glad that you have no problem with that, but perhaps our needs are different. I made that comment having tried last week to arrange remote access to Xeon 61xx or 81xx processors for a collaborator of mine. No US supercomputing center has them in a production state yet, unlike Xeon Phi 72xx processors, to which I can provide collaborators access via NERSC within a day. As for your "is not yet identified correctly", please provide more details. The CPUID detection of KNL is no different or more difficult than for any other recent Intel processor.
@grisuthedragon I will report that issue to the MKL team. I am not aware of any internal discussion of such an issue, but I've only used Xeon Platinum processors. As for determining whether a Xeon x1xx processor has one or two AVX-512 units, the easiest way is to read the SKU: the 5122, 61xx and 81xx parts have two AVX-512 units, while the other 51xx and lower parts have one. You can associate SKUs with CPUID using something like https://github.com/tycho/cpuid/blob/master/handlers.c#L919. One can also determine it empirically during library initialization, based on the pipeline behavior, but I don't recommend this method since it takes longer than the former. Source: I work for Intel.
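As an illustration of the SKU-based approach, here is a hedged C sketch that reads the CPUID brand string (leaves 0x80000002–0x80000004 via GCC's <cpuid.h>) and applies the rule of thumb from the comment above. The helper names are invented for this example, and real code would need a much more complete model table.

```c
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

/* Read the 48-byte processor brand string from CPUID leaves
 * 0x80000002..0x80000004. */
static void get_brand_string(char brand[49])
{
    unsigned int regs[12] = {0};
    for (unsigned int i = 0; i < 3; i++)
        __get_cpuid(0x80000002 + i, &regs[4 * i], &regs[4 * i + 1],
                    &regs[4 * i + 2], &regs[4 * i + 3]);
    memcpy(brand, regs, 48);
    brand[48] = '\0';
}

/* Rule of thumb from the discussion: Platinum 81xx, Gold 61xx and the
 * Gold 5122 have two AVX-512 units; other Gold 51xx, Silver and Bronze
 * parts have one.  Anything else is reported as unknown. */
static int guess_avx512_units(const char *brand)
{
    if (strstr(brand, "Platinum 8") || strstr(brand, "Gold 6") ||
        strstr(brand, "Gold 5122"))
        return 2;
    if (strstr(brand, "Gold 5") || strstr(brand, "Silver") ||
        strstr(brand, "Bronze"))
        return 1;
    return -1;
}

int main(void)
{
    char brand[49];
    get_brand_string(brand);
    printf("%s -> %d AVX-512 unit(s)\n", brand, guess_avx512_units(brand));
    return 0;
}
```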
@martin-frbg do you think it should be a copy of haswell or an alias of haswell now?
KNL support is currently implemented by aliasing to HASWELL as that is the closest supported type. Once someone starts writing KNL-specific code it may make sense to copy the haswell code for everything not yet re-implemented (but I understand the cpu cache layout is quite different).
Did you use Netlib HPL linked against MKL or Intel HPL that's included with MKL?
I was just thinking of OCRing CPU-Z screenshots.
Bump. We really need a new release. @xianyi If you are busy, may I ask you to move the project to the OpenBLAS organization and provide admin access to a few other contributors? In Julia we are already cherry-picking commits and effectively duplicating some of the release work on OpenBLAS. It would be nice to share the workload with others.
I have now compiled a preliminary 0.3.0 release in my fork:
@andreasnoack @Sacha0 Pinging you to see if we can test this with the Julia test suite, on x86 and ARM.
There are some issues on x86, see #1563
So I have finally created the "official" 0.3.0 release after fixing the Nehalem TRMM regression from #1563. Hopefully all went well; I expect it may take xianyi a few days to update the openblas.net page. I have created an informal "Project" to collect potential candidates for future versions at https://github.com/xianyi/OpenBLAS/projects/1, as this recent GitHub feature looks more appropriate for such tasks than an issue ticket. While there is no set date for the next release, I certainly hope it will not take another 9 months.
Thank you!
@martin-frbg could you add Windows binaries? (May I also ask for an extra set without LAPACK and, importantly, without Fortran library dependencies?) Quite a few people struggle to build it natively, and something directly consumable would help a lot.
We are building binaries for Julia here: https://github.com/staticfloat/OpenBLASBuilder/releases. The 0.3.0 version should appear soon.
Thank you. Is it OK to refer to this download location on the OpenBLAS wiki at https://github.com/xianyi/OpenBLAS/wiki/Precompiled-installation-packages ?
Thanks for the new release. We will also build macOS, Linux, and Windows binaries, which others are free to use. Details can easily be found in this repo.
Close this issue?
Closing after adding the wiki links.
Our binaries are here: https://github.com/staticfloat/OpenBLASBuilder/releases as @andreasnoack mentioned, and the 0.3 binaries are now there as well. With 0.3.1, once the PPC tests are fixed, we will have those too.
The next release date is about Dec 2017.
Development plans:
- Optimize for Intel Skylake
- Keep integrating and testing ReLAPACK and LAPACK
Could you provide any other ideas?