
Plan for 0.3.0 #1245

Closed
1 of 2 tasks
xianyi opened this issue Jul 24, 2017 · 26 comments

@xianyi
Collaborator

xianyi commented Jul 24, 2017

The next release date is around Dec 2017.

Development plans:

  • Optimize Intel Skylake

  • Continue integrating and testing ReLAPACK and LAPACK

Could you provide any other ideas?

@carlkl

carlkl commented Jul 24, 2017

I propose to remove the following non-SSE targets:

  • ATHLON
  • KATMAI
  • COPPERMINE

@brada4
Contributor

brada4 commented Jul 24, 2017

Should we also drop all 3DNow! code, replacing it with MMX, given the lack of current CPUs to test on?

@brianborchers

So far I'm finding on my Kaby Lake machine that 0.2.20 seems a bit slower, and certainly no faster, than 0.2.19 compiled for the Haswell architecture. Optimizing for the newer processor types should be high on the list.

@martin-frbg
Collaborator

@brianborchers I am not aware of any change in 0.2.20 that would make it markedly slower on Haswell et al., nor anything that would make it faster there (except perhaps the reintroduction of -O2 to the LAPACK build). Most optimization work in the release was for other architectures (ARM, PPC, Z). Both versions are likely to misdetect the L1 cache size on Kaby Lake (#1232, fixed in develop now), but the impact will probably depend on the type of calculations you do.

@grisuthedragon
Contributor

In my opinion we should try to follow the extensions Intel keeps adding to the BLAS interface (like the matcopy operations and so on), so that OpenBLAS can be used as a drop-in replacement when MKL is not available, either for license reasons or because one uses other architectures. A sketch of such a call follows below.
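
For reference, a minimal sketch of what such an extension call looks like, assuming the cblas_domatcopy out-of-place transpose that OpenBLAS declares in its cblas.h (MKL's counterpart is mkl_domatcopy); the exact prototype should be verified against the installed header:

```c
/* Sketch: scaled out-of-place transpose B = alpha * A^T using the
 * ?omatcopy extension. Prototype assumed to match OpenBLAS's cblas.h. */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    double A[2 * 3] = {1, 2, 3,
                       4, 5, 6};   /* 2x3 source, row-major */
    double B[3 * 2] = {0};         /* 3x2 destination */

    /* rows/cols describe the source matrix A; B receives 1.0 * A^T. */
    cblas_domatcopy(CblasRowMajor, CblasTrans, 2, 3, 1.0, A, 3, B, 2);

    for (int i = 0; i < 3; i++)
        printf("%g %g\n", B[2 * i], B[2 * i + 1]);
    return 0;
}
```

(Built with something like gcc omatcopy.c -lopenblas; the file name is just a placeholder.)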

@jeffhammond

@xianyi Why not support KNL first to get warmed up with AVX-512? It's easier to get KNL access right now.

@brada4
Contributor

brada4 commented Sep 6, 2017

It's easy to get a new Xeon Platinum... unlike MIC, which is not yet identified correctly...

@grisuthedragon
Contributor

I ran some tests with several scientific codes on a Skylake Silver system (2x 4110), and there I found that using the AVX-512 units instead of AVX2 slows HPL down from 630 to 400 GFlops with MKL. On the "better" Xeons, such as the Gold and Platinum series, this may change because they have more AVX-512 execution units, so there AVX-512 may actually accelerate the code. So when optimizing for Skylake (and the other upcoming *Lakes), OpenBLAS has to check whether it is a consumer CPU (like the Core i7 6xxx/7xxx/8xxx), an entry-level server CPU, or an enterprise server CPU (see the sketch below). On the former, the optimization should stay at the AVX2 level; on the latter, one has to check whether it is worth moving to AVX-512 or not.
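
As a rough illustration (hypothetical code, not taken from OpenBLAS), the consumer parts can at least be told apart from the Skylake-SP server parts by the AVX512F feature bit; distinguishing Silver from Gold/Platinum would still need the SKU, as discussed further down:

```c
/* Sketch, assuming GCC/Clang's <cpuid.h>: consumer Skylake/Kaby Lake
 * cores do not report AVX512F (CPUID leaf 7, sub-leaf 0, EBX bit 16),
 * while Skylake-SP server parts do. A real dispatcher would also have
 * to verify OS state-saving support via XGETBV. */
#include <cpuid.h>
#include <stdio.h>

static int has_avx512f(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;               /* CPUID leaf 7 not available */
    return (ebx >> 16) & 1;     /* AVX512F feature flag */
}

int main(void)
{
    printf("AVX512F supported: %s\n", has_avx512f() ? "yes" : "no");
    return 0;
}
```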

@jeffhammond

@brada4 I am glad that you have no problem with that, but perhaps our needs are different. I made that comment having tried last week to arrange remote access to Xeon 61xx or 81xx processors for a collaborator of mine. No US supercomputing center has them in a production state yet, unlike Xeon Phi 72xx processors, to which I can provide collaborators access via NERSC in a day.

As for your "is not yet identified correctly", please provide more details. The CPUID detection of KNL is no different from or more difficult than that of any other recent Intel processor.

@grisuthedragon I will report that issue to the MKL team. I am not aware of any internal discussion of such an issue, but I've only used Xeon Platinum processors.

As for determining whether a Xeon x1xx processor has one or two AVX-512 units, the easiest way is to read the SKU: the 5122, 61xx, and 81xx parts have two AVX-512 units, while the other 51xx and lower parts have one.

You can associate SKUs with CPUID using something like https://github.com/tycho/cpuid/blob/master/handlers.c#L919, which emits "Processor Name: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz" on my system.
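
A minimal standalone sketch of that brand-string approach (hypothetical code, not taken from the linked tool), with the SKU-to-unit-count mapping hard-coded from the rule of thumb above:

```c
/* Sketch: read the 48-character processor brand string from CPUID
 * leaves 0x80000002-0x80000004 and guess the number of AVX-512 FMA
 * units from the SKU, using the rule of thumb stated above. */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int regs[12] = {0};
    char brand[49];

    for (unsigned int i = 0; i < 3; i++)
        __get_cpuid(0x80000002 + i, &regs[4 * i], &regs[4 * i + 1],
                    &regs[4 * i + 2], &regs[4 * i + 3]);
    memcpy(brand, regs, 48);
    brand[48] = '\0';
    printf("Processor Name: %s\n", brand);

    /* Assumption from the comment above: 81xx, 61xx and the 5122 have
     * two AVX-512 units; other Skylake-SP SKUs have one. */
    int two_units = strstr(brand, "Platinum 81") != NULL
                 || strstr(brand, "Gold 61") != NULL
                 || strstr(brand, "Gold 5122") != NULL;
    printf("AVX-512 FMA units (guess): %d\n", two_units ? 2 : 1);
    return 0;
}
```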

One can also determine it empirically during library initialization, based upon the pipeline behavior, but I don't recommend this method since it takes longer than the former.

Source: I work for Intel.

@brada4
Contributor

brada4 commented Sep 6, 2017

@martin-frbg do you think it should be a copy of the Haswell code or just an alias of HASWELL for now?

@martin-frbg
Collaborator

KNL support is currently implemented by aliasing to HASWELL as that is the closest supported type. Once someone starts writing KNL-specific code it may make sense to copy the haswell code for everything not yet re-implemented (but I understand the cpu cache layout is quite different).

@jeffhammond

@grisuthedragon

I ran some tests with several scientific codes on a Skylake Silver system (2x 4110), and there I found that using the AVX-512 units instead of AVX2 slows HPL down from 630 to 400 GFlops with MKL.

Did you use Netlib HPL linked against MKL or Intel HPL that's included with MKL?

@brada4
Contributor

brada4 commented Sep 6, 2017

I was just thinking of OCRing CPU-Z screenshots.

@ViralBShah
Contributor

Bump. We really need a new release. @xianyi, if you are busy, may I request that you move the project to the OpenBLAS organization and give admin access to a few other contributors?

In Julia we are already cherry-picking commits and effectively duplicating some of the OpenBLAS release work. It would be nice to share the workload with others.

@martin-frbg
Collaborator

I have now compiled a preliminary 0.3.0 release in my fork:
https://github.com/martin-frbg/OpenBLAS/releases/tag/v0.3.0
(by merging the current "develop" into the release-0.3.0 branch that xianyi had created). As this is the first time I am doing this, I would appreciate some feedback (also on the release notes) before I try to repeat this on the official repository...

@ViralBShah
Contributor

@andreasnoack @Sacha0 Pinging you to see if we can test this with the Julia test suite, on x86 and ARM.

@andreasnoack
Contributor

There are some issues on x86, see #1563

@martin-frbg
Collaborator

So I have finally created the "official" 0.3.0 release after fixing the Nehalem TRMM regression from #1563. Hopefully all went well; I expect it may take xianyi a few days to update the openblas.net page.

I have created an informal "Project" to collect potential candidates for future versions at https://github.com/xianyi/OpenBLAS/projects/1, as this recent GitHub feature looks more appropriate for such tasks than an issue ticket. While there is no set date for a next release, I certainly hope it will not take another 9 months.

@ViralBShah
Contributor

Thank you!

@brada4
Contributor

brada4 commented May 25, 2018

@martin-frbg could you add Windows binaries (and may I ask for an extra set without LAPACK and, importantly, without Fortran library dependencies)? People tend to struggle with building it natively, and something directly consumable would help a lot.

@andreasnoack
Contributor

We are building binaries for Julia here: https://github.com/staticfloat/OpenBLASBuilder/releases. The 0.3.0 version should appear soon.

@martin-frbg
Collaborator

Thank you. Is it OK to refer to this download location on the OpenBLAS wiki at https://github.com/xianyi/OpenBLAS/wiki/Precompiled-installation-packages?

@jakirkham
Contributor

Thanks for the new release. We will also build macOS, Linux, and Windows binaries, which others are free to use. Details can easily be found in this repo.

@ViralBShah
Contributor

Close this issue?

@martin-frbg
Collaborator

Closing after adding the wiki links.

@ViralBShah
Contributor

ViralBShah commented May 31, 2018

Our binaries are here: https://github.com/staticfloat/OpenBLASBuilder/releases, as @andreasnoack mentioned, and the 0.3 binaries are now there as well. With 0.3.1, once the PPC tests are fixed, we will have those too.
