Plan for 0.3.0 #1245
Comments
I propose to remove the following non-SSE targets:
Avoid all 3DNow!, replacing it with MMX, due to the lack of current CPUs for testing?
I'm finding so far on my Kaby Lake machine that 0.2.20 seems a bit slower and certainly no faster than 0.2.19 compiled for the Haswell architecture. Optimizing for the newer processor types should be high on the list.
@brianborchers I am not aware of any change in 0.2.20 that would make it markedly slower on Haswell et al., nor anything that would make it faster there (except perhaps the reintroduction of -O2 to the LAPACK build). Most optimization work in the release was for other architectures (ARM, PPC, Z).
In my opinion we should try to follow what Intel is adding to the BLAS interface (like the matcopy operations and so on), so that OpenBLAS can be used as a replacement when MKL is not available, either for license reasons or because one uses other architectures.
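To make the matcopy suggestion concrete, here is a minimal plain-C sketch of what the out-of-place ?omatcopy extension computes, namely B := alpha * op(A). The function name and the row-major layout are assumptions made for this illustration; the real mkl_domatcopy / cblas_domatcopy entry points have their own calling conventions.

```c
#include <stddef.h>

/* Sketch of out-of-place "omatcopy" semantics: B := alpha * op(A),
 * where op is either the identity or a transpose.  A is rows x cols,
 * stored row-major with leading dimension lda; B uses ldb. */
void domatcopy_sketch(char trans, size_t rows, size_t cols, double alpha,
                      const double *A, size_t lda, double *B, size_t ldb)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            if (trans == 'T' || trans == 't')
                B[j * ldb + i] = alpha * A[i * lda + j]; /* scaled transpose */
            else
                B[i * ldb + j] = alpha * A[i * lda + j]; /* scaled copy */
        }
    }
}
```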
@xianyi Why not support KNL first to get warmed up with AVX-512? It's easier to get KNL access right now.
It's easy to get a new Xeon Platinum... unlike MIC, which is not yet identified correctly...
I did some tests with several scientific software packages on a Skylake Silver series system (2x 4110), and there I noticed that using the AVX-512 units instead of AVX2 slows HPL down from 630 to 400 GFlops with MKL. On the "better" Xeons like the Gold and Platinum series this might change, because they have more AVX-512 execution units, so AVX-512 may actually accelerate the code there. So to optimize for Skylake (and the other upcoming *Lakes), OpenBLAS has to check whether it is a consumer CPU (like the Core i7-6/7/8xxxx), an entry-level server CPU, or an enterprise server CPU. On the first ones the optimization should be done at the AVX2 level; on the latter ones one has to check whether it is worth moving to AVX-512 or not.
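As a hedged illustration of that check, the sketch below uses GCC's __builtin_cpu_supports to decide at runtime whether an AVX-512 or AVX2 code path is even available. The kernel functions are placeholders, not OpenBLAS symbols, and whether AVX-512 actually wins on a given SKU still has to be decided separately (see the SKU-based sketch further down).

```c
#include <stdio.h>

/* Placeholder kernels standing in for architecture-specific BLAS code. */
static void dgemm_kernel_avx2(void)   { puts("dispatching to AVX2 kernel"); }
static void dgemm_kernel_avx512(void) { puts("dispatching to AVX-512 kernel"); }

int main(void)
{
    __builtin_cpu_init();
    /* Availability check only: AVX-512F being present does not imply it
     * is faster, as the HPL numbers above show for the Silver parts. */
    if (__builtin_cpu_supports("avx512f"))
        dgemm_kernel_avx512();
    else if (__builtin_cpu_supports("avx2"))
        dgemm_kernel_avx2();
    else
        puts("falling back to a generic kernel");
    return 0;
}
```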
@brada4 I am glad that you have no problem with that, but perhaps our needs are different. I made that comment having tried last week to arrange remote access to Xeon 61xx or 81xx processors for a collaborator of mine. No US supercomputing center has them in a production state yet, unlike Xeon Phi 72xx processors, to which I can provide collaborators access via NERSC within a day. As for your "is not yet identified correctly", please provide more details. The CPUID detection of KNL is no different or more difficult than for any other recent Intel processor.
@grisuthedragon I will report that issue to the MKL team. I am not aware of any internal discussion of such an issue, but I've only used Xeon Platinum processors. As for determining whether a Xeon x1xx processor has one or two AVX-512 units, the easiest way is to read the SKU: the 5122, 61xx and 81xx parts have two AVX-512 units, while the other 51xx and lower parts have one. You can associate SKUs with CPUID using something like https://github.com/tycho/cpuid/blob/master/handlers.c#L919. One can also determine it empirically during library initialization, based on the pipeline behavior, but I don't recommend this method since it takes longer than the former. Source: I work for Intel.
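As an illustration of the SKU-based approach, here is a hedged C sketch that reads the CPUID brand string (leaves 0x80000002–0x80000004 via GCC's <cpuid.h>) and applies the rule of thumb from the comment above. The helper names are invented for this example, and real code would need a much more complete model table.

```c
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

/* Read the 48-byte processor brand string from CPUID leaves
 * 0x80000002..0x80000004. */
static void get_brand_string(char brand[49])
{
    unsigned int regs[12] = {0};
    for (unsigned int i = 0; i < 3; i++)
        __get_cpuid(0x80000002 + i, &regs[4 * i], &regs[4 * i + 1],
                    &regs[4 * i + 2], &regs[4 * i + 3]);
    memcpy(brand, regs, 48);
    brand[48] = '\0';
}

/* Rule of thumb from the discussion: Platinum 81xx, Gold 61xx and the
 * Gold 5122 have two AVX-512 units; other Gold 51xx, Silver and Bronze
 * parts have one.  Anything else is reported as unknown. */
static int guess_avx512_units(const char *brand)
{
    if (strstr(brand, "Platinum 8") || strstr(brand, "Gold 6") ||
        strstr(brand, "Gold 5122"))
        return 2;
    if (strstr(brand, "Gold 5") || strstr(brand, "Silver") ||
        strstr(brand, "Bronze"))
        return 1;
    return -1;
}

int main(void)
{
    char brand[49];
    get_brand_string(brand);
    printf("%s -> %d AVX-512 unit(s)\n", brand, guess_avx512_units(brand));
    return 0;
}
```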
@martin-frbg do you think it should be a copy of haswell or an alias of haswell now?
KNL support is currently implemented by aliasing to HASWELL as that is the closest supported type. Once someone starts writing KNL-specific code it may make sense to copy the haswell code for everything not yet re-implemented (but I understand the cpu cache layout is quite different).
Did you use Netlib HPL linked against MKL or Intel HPL that's included with MKL?
I was just thinking of OCRing CPU-Z screenshots.
Bump. We really need a new release. @xianyi If you are busy, may I ask you to move the project to the OpenBLAS organization and provide admin access to a few other contributors? In Julia we are already cherry-picking commits and effectively duplicating some of the release work on OpenBLAS. It would be nice to share the workload with others.
I have now compiled a preliminary 0.3.0 release in my fork:
@andreasnoack @Sacha0 Pinging you to see if we can test this with the Julia test suite, on x86 and ARM.
There are some issues on x86, see #1563
So I have finally created the "official" 0.3.0 release after fixing the Nehalem TRMM regression from #1563. Hopefully all went well; I expect it may take xianyi a few days to update the openblas.net page. I have created an informal "Project" to collect potential candidates for future versions at https://github.com/xianyi/OpenBLAS/projects/1, as this recent GitHub feature looks more appropriate for such tasks than an issue ticket. While there is no set date for the next release, I certainly hope it will not take another 9 months.
Thank you!
@martin-frbg could you add Windows binaries? (May I also ask for an extra set without LAPACK and, importantly, without Fortran library dependencies?) Quite a few people struggle to build it natively, and something directly consumable would help a lot.
We are building binaries for Julia here: https://github.com/staticfloat/OpenBLASBuilder/releases. The 0.3.0 version should appear soon.
Thank you. Is it OK to refer to this download location on the OpenBLAS wiki at https://github.com/xianyi/OpenBLAS/wiki/Precompiled-installation-packages ?
Thanks for the new release. We will also build macOS, Linux, and Windows binaries, which others are free to use. Details can easily be found in this repo.
Close this issue?
Closing after adding the wiki links.
Our binaries are here: https://github.com/staticfloat/OpenBLASBuilder/releases as @andreasnoack mentioned, and the 0.3 binaries are now there as well. With 0.3.1, once the PPC tests are fixed, we will have those too.
The next release date is about Dec 2017.
Development plans:
- Optimize for Intel Skylake
- Keep integrating and testing ReLAPACK and LAPACK
Could you provide any other ideas?