Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openblas 0.3.8 #50252

Closed
wants to merge 43 commits into from
Closed

openblas 0.3.8 #50252

wants to merge 43 commits into from

Conversation

zbeekman
Copy link
Contributor

Created with brew bump-formula-pr.

@zbeekman
Copy link
Contributor Author

Hopefully we don't need revision bumps on formulae that use openblas, I'll wait and see what testbot has to say about that. For reference, those formulae are:

  • arpack
  • clp
  • coinutils
  • dlib
  • dynare
  • ipopt
  • numpy
  • numpy@1.16
  • nwchem
  • octave
  • proteinortho
  • qrupdate
  • r
  • scalapack
  • scipy
  • shogun
  • suite-sparse
  • sundials
  • superlu

@zbeekman zbeekman added the build failure CI fails while building the software label Feb 15, 2020
@zbeekman
Copy link
Contributor Author

This is failing with invalid mnemonic dmb. The OpenBLAS wiki seems to think that this can't be built with older clangs but, I don't see a good reason why we need to be building the C portions using GCC-9 over native clang. So I'm going to test locally using apple clang and see if that resolves the issue. I also wonder if the explicit disabling of AVX512 is really needed. I suspect that the introspection done during the build should handle this for us. But I won't touch it if I can get the build to pass with clang.

@zbeekman
Copy link
Contributor Author

Switching back to clang resolves this build failure. (At least locally...) It looks like other people are having issues with OpenMP and the latest OpenBLAS: OpenMathLib/OpenBLAS#2416

I reported the issue in OpenMathLib/OpenBLAS#2421

@zbeekman zbeekman added ready to merge PR can be merged once CI is green and removed build failure CI fails while building the software labels Feb 15, 2020
@zbeekman zbeekman added revision bumps needed Reverse dependencies need to have their revision incremented in the same PR and removed ready to merge PR can be merged once CI is green labels Feb 17, 2020
@zbeekman
Copy link
Contributor Author

sigh, I'll work on rev-bumping everything. Some things may need a dependency on libomp now too. I'll start with rev-bumps and hope that any weirdness with libomp is caught during build and testing.

@Homebrew/core What do you think about this effort to migrate OpenBLAS to use native clang over GCC? The issue causing the build failure with the latest OpenBLAS 0.3.8 was resolved upstream, so we don't need to migrate to clang. But local testing indicates that things work fine with clang and this would be more consistent with Homebrew's philosophy of preferring native tooling.

@zbeekman
Copy link
Contributor Author

@zbeekman, can you post the full build log?

For the CI & binary build? Maybe... I think a bunch of the output gets eaten by Jenkins. I'm not sure which parts need permission to view but I can view the CI logs at https://jenkins.brew.sh/job/Homebrew%20Core%20Pull%20Requests/57076/

I can certainly share my local build logs but since I'm on Haswell that will not be informative.

@isuruf
Copy link

isuruf commented Feb 17, 2020

I can certainly share my local build logs but since I'm on Haswell that will not be informative.

That's fine. I checked the URL you gave me, but couldn't find the logs

@zbeekman
Copy link
Contributor Author

@fxcoudert I noticed your comment on a closed won't fix PR: #45300 (comment)

Are you OK moving to building OpenBLAS with native Apple Clang? It works around a bug in the latest release and is more inline with the philosophy of using native tooling.

However, moving to Apple Clang required introducing a libomp dependency.

Fingers crossed GFortran + Clang + libomp doesn't break OpenBLAS threading. I think it would be good to try to move to clang, but if we encounter breakage we can patch the 0.3.8 release with commits applied upstream and switch back to GCC for the C code.

@MikeMcQuaid
Copy link
Member

However, moving to Apple Clang required introducing a libomp dependency.

This seems fine 👍

@MikeMcQuaid
Copy link
Member

OpenBLAS builds AVX512 kernels if the compiler and assembler on the build machine supports them and enables them at runtime only if the host support them. (If the compiler is old, then a host with AVX512 support will not get the benefit)

Cool, thanks 👍

@zbeekman
Copy link
Contributor Author

The build dies when removing the NO_AVX512 environment variable anyway, so I rewound and force pushed to my fork.

@isuruf FYI, the errors look like this:

15:38:10 ==> make CC=clang FC=gfortran libs netlib shared
15:38:10 Last 150 lines from /Users/brew/Jenkins/workspace/core/logs/openblas/01.make:
15:38:10 ../kernel/x86_64/sdot_microk_skylakex-2.c:43:12: error: always_inline function '_mm256_setzero_ps' requires target feature 'avx', but would be inlined into function 'sdot_kernel_16' that is compiled without support for 'avx'
15:38:10         accum_2 = _mm256_setzero_ps();
15:38:10                   ^
15:38:10 ../kernel/x86_64/sdot_microk_skylakex-2.c:44:12: error: always_inline function '_mm256_setzero_ps' requires target feature 'avx', but would be inlined into function 'sdot_kernel_16' that is compiled without support for 'avx'
15:38:10         accum_3 = _mm256_setzero_ps();
15:38:10                   ^
15:38:10 ../kernel/x86_64/sdot_microk_skylakex-2.c:74:14: error: always_inline function '_mm256_loadu_ps' requires target feature 'avx', but would be inlined into function 'sdot_kernel_16' that is compiled without support for 'avx'
15:38:10                 accum_0 += _mm256_loadu_ps(&x[i+ 0]) * _mm256_loadu_ps(&y[i+ 0]);
15:38:10                            ^
15:38:10 ../kernel/x86_64/sdot_microk_skylakex-2.c:74:42: error: always_inline function '_mm256_loadu_ps' requires target feature 'avx', but would be inlined into function 'sdot_kernel_16' that is compiled without support for 'avx'
15:38:10                 accum_0 += _mm256_loadu_ps(&x[i+ 0]) * _mm256_loadu_ps(&y[i+ 0]);

@isuruf
Copy link

isuruf commented Feb 17, 2020

@zbeekman, can you open an issue upstream?

@isuruf
Copy link

isuruf commented Feb 17, 2020

Which clang is this? I see that -fopenmp is added, but it doesn't error, so this is not Apple clang then?

@zbeekman
Copy link
Contributor Author

Idk why qrupdate is failing against OpenBLAS now. There are some warnings about type mismatches but I’ll have to dig in to the details.

@zbeekman
Copy link
Contributor Author

Current failures are:

  • brew-test-bot.catalina.linkage --test root
    • /usr/local/opt/gl2ps/lib/libgl2ps.1.dylib (gl2ps)
    • /usr/local/opt/openblas/lib/libopenblasp-r0.3.7.dylib (openblas)
      • Bad SO version for openblas... not sure why the build is linking to specific versioned dylibs. I'm guessing the issue is the same for gl2ps
  • brew-test-bot.catalina.test qrupdate
    • A test seems to be passing data of the wrong type. I'll dig into this a little bit
  • brew-test-bot.catalina.install --build-bottle nwchem

@fxcoudert
Copy link
Member

The current stand on OpenMP is: use Apple’s clang + libomp (which is a hack) only for standalone formulas that strongly benefit from OpenMP and do not expose this to dependents (good example being imagemagick). This allows for OpenMP support without the full cost of a GCC dependency (which is heavy). This was the condition under which libomp was accepted into homebrew-core, and I think we should really have a full discussion if we want to change that state of things.

Here, GCC is needed anyway (for gfortran), so we're actually adding a dependency (libomp) for no gain. And we're creating potential for trouble by mixing compilers and libraries, because depending on GCC + libomp is creating mixed linkage to libomp and libgomp in some formulas: things like arpack and its dependents. It could also create a problem for people who would wish to use openblas with GCC’s libgomp.

I don't think it's worth it.

@zbeekman
Copy link
Contributor Author

@fxcoudert

This was the condition under which libomp was accepted into homebrew-core, and I think we should really have a full discussion if we want to change that state of things.
...
Here, GCC is needed anyway (for gfortran), so we're actually adding a dependency (libomp) for no gain. And we're creating potential for trouble by mixing compilers and libraries, because depending on GCC + libomp is creating mixed linkage to libomp and libgomp in some formulas: things like arpack and its dependents. It could also create a problem for people who would wish to use openblas with GCC’s libgomp.

Thank you so much for looking at this FX! This comment has given me a better understanding of both the original rationale for building with GCC and potential issues that may arise. Given my experience so far, and my improved understanding from your comment I would agree with you that migrating to clang and libomp is not worth it, and would be more fragile than keeping this as building with GCC.

In this case, I think we either need to backport patches to allow OpenBLAS 0.3.8 to build with GCC-9, or wait until the next OpenBLAS release. @isuruf do you know if a release is planned soon?

I would like to get OpenBLAS 0.3.8 deployed sooner rather than later, so I'm leaning towards backporting the patches already applied to the OpenBLAS development branch.

@sjackman
Copy link
Member

In this case, I think we either need to backport patches to allow OpenBLAS 0.3.8 to build with GCC-9,

If the patches are coming from upstream, and are easy to apply with a patch do block, then sure.

or wait until the next OpenBLAS release.

That'd be my inclination.

@isuruf
Copy link

isuruf commented Feb 19, 2020

@isuruf do you know if a release is planned soon?

I have no idea.

The current stand on OpenMP is: use Apple’s clang + libomp (which is a hack) only for standalone formulas that strongly benefit from OpenMP and do not expose this to dependents (good example being imagemagick). This allows for OpenMP support without the full cost of a GCC dependency (which is heavy). This was the condition under which libomp was accepted into homebrew-core, and I think we should really have a full discussion if we want to change that state of things.

Slightly off-topic: I maintain the compiler packages for a different package manager and we recently switched to libomp even with GCC. (I added a few GOMP compatibility functions to libomp, which makes it work fine now. Only missing symbols are target offloading stuff) Let me know if you need more info.

@zbeekman
Copy link
Contributor Author

Slightly off-topic: I maintain the compiler packages for a different package manager and we recently switched to libomp even with GCC. (I added a few GOMP compatibility functions to libomp, which makes it work fine now. Only missing symbols are target offloading stuff) Let me know if you need more info.

What was the motivation for the switch?

So gcc, g++, gfortran link with libomp by default instead of libgomp?

Intriguing idea, but makes me nervous about reliability and unintended side-effects. I'd love to hear more.

@isuruf
Copy link

isuruf commented Feb 20, 2020

What was the motivation for the switch?

There were 2 different OpenMP runtimes being linked in. (gfortran linked in libgomp and clang linked in libomp).
Also, libgomp is not fork safe while libomp is.

So gcc, g++, gfortran link with libomp by default instead of libgomp?

Yes, replaced libgomp.so by the symlink created by libomp.

@zbeekman
Copy link
Contributor Author

@sjackman: FYI, --HEAD builds fine using the "old" formula, so the patches have hit the main development branch. I'm inclined to wait for the next nwchem and then patch OpenBLAS using the upstream PRs implementing the fix(es) unless we find out another OpenBLAS release is around the corner.

@zbeekman
Copy link
Contributor Author

@isuruf

Yes, replaced libgomp.so by the symlink created by libomp.

Wait, it’s as simple as that?!?

You just build gcc and friends and then delete libgomp and make a symlink of the same name to libomp? There aren’t any GNU extensions in libgomp? And they’re ABI compatible?

What distro or package manager?

@isuruf
Copy link

isuruf commented Feb 21, 2020

You just build gcc and friends and then delete libgomp and make a symlink of the same name to libomp

Yes, but they have different compatibility_version on OSX, so you can't just replace. You have to either rebuild all your packages using libgomp, or change the compatibility_version of libomp.

There aren’t any GNU extensions in libgomp? And they’re ABI compatible?

libomp has 2 sets of symbols. GNU symbols (GOMP_) and Intel symbols (_kmpc). clang will generate code that calls Intel symbols. If you use GCC, it will generate code that calls GNU symbols. On libomp, the GNU symbols implementation call the Intel symbols, so they are compatible.
I have to warn that there are a few missing symbols in libomp than libgomp. These relate to target offloading in OpenMP 4.5

What distro or package manager?

conda package manager on OSX>=10.9

@zbeekman
Copy link
Contributor Author

@isuruf thank you so much for that information. I’ll try to take a look at how conda builds gcc and OpenBLAS. It might be worthwhile to consider migrating all Homebrew OpenMP packages (and gcc) to libomp. I doubt many people or formulae are actually using offloading thanks to Apples feud with NVIDIA. I haven’t been following OMP offload as closely as I should, I wonder what the support for ATI is like in each of them.

@zbeekman zbeekman mentioned this pull request Feb 21, 2020
5 tasks
@edoapra edoapra mentioned this pull request Feb 26, 2020
5 tasks
@zbeekman
Copy link
Contributor Author

NWChem has shipped, resolving the MPI failures. I pinged @isuruf and @martin-frbg to see if there are plans for another release of OpenBLAS anytime soon, or if I should backport upstream patches. Either way, I'll start with dropping all the revision bumps and rebasing on a more recent master.

@Bo98
Copy link
Member

Bo98 commented Mar 4, 2020

Superseded by #51116.

@Bo98 Bo98 closed this Mar 4, 2020
@Bo98 Bo98 added the superseded PR was replaced by another PR label Mar 4, 2020
@zbeekman
Copy link
Contributor Author

zbeekman commented Mar 4, 2020

haha, thanks I was about to do this...

@lock lock bot added the outdated PR was locked due to age label Apr 4, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Apr 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age superseded PR was replaced by another PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants