Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more source-build problems, with openblas now on CI #53172

Closed
vtjnash opened this issue Feb 4, 2024 · 9 comments · Fixed by #53221
Closed

more source-build problems, with openblas now on CI #53172

vtjnash opened this issue Feb 4, 2024 · 9 comments · Fixed by #53221
Labels
building Build system, or building Julia or its dependencies external dependencies Involves LLVM, OpenBLAS, or other linked libraries regression Regression in behavior compared to a previous version upstream The issue is with an upstream dependency, e.g. LLVM

Comments

@vtjnash
Copy link
Member

vtjnash commented Feb 4, 2024

In file included from ../kernel/x86_64/sbgemm_kernel_16x16_spr.c:31:
../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c: In function 'sbgemm_kernel_spr_alpha_one':
../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c:67:2: warning: implicit declaration of function '_tile_loadconfig' [-Wimplicit-function-declaration]
   67 |  _tile_loadconfig(&cfg);
      |  ^~~~~~~~~~~~~~~~
../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c:201:4: note: in expansion of macro 'TCONF'
  201 |    TCONF(cfg, 16, 16, 32);
      |    ^~~~~
../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c:137:22: warning: implicit declaration of function '_tile_loadd' [-Wimplicit-function-declaration]
  137 | #define LOAD_C(M, N) _tile_loadd(T_C##M##N, ptr_c##M##N, ldc * 4)
      |                      ^~~~~~~~~~~
../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c:212:5: note: in expansion of macro 'LOAD_C'
  212 |     LOAD_C(0, 0); LOAD_C(0, 1);
      |     ^~~~~~

https://buildkite.com/julialang/julia-master-scheduled/builds/643#018d6e03-2129-477c-9c41-22a940ea5492

@vtjnash vtjnash added building Build system, or building Julia or its dependencies regression Regression in behavior compared to a previous version external dependencies Involves LLVM, OpenBLAS, or other linked libraries labels Feb 4, 2024
@inkydragon inkydragon added the upstream The issue is with an upstream dependency, e.g. LLVM label Feb 6, 2024
@inkydragon
Copy link
Member

These two functions are Intel® AMX (Advanced Matrix Extensions) Intrinsics Functions

Looks like a legacy issue, most recent modifications were 3 years ago:
https://github.com/OpenMathLib/OpenBLAS/blame/e5d2725e5a29981cab4068233ab97b6daa25b540/kernel/x86_64/sbgemm_oncopy_16_spr.c#L63

Maybe it's not a regression?

@giordano
Copy link
Contributor

giordano commented Feb 6, 2024

Those reported above are warnings, not errors?

@inkydragon
Copy link
Member

errors in build openblas

../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c:116:16: error: incompatible types when initializing type '__m256i' using type 'int'
  116 |  __m256i ymm = _mm256_loadu_epi16(ptr_b##N); \
      |                ^~~~~~~~~~~~~~~~~~
../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c:359:21: note: in expansion of macro 'LOAD_B_TAIL'
  359 |       LOAD_B(x, 0); LOAD_B_TAIL(x, 1);
      |                     ^~~~~~~~~~~
../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c:116:16: error: incompatible types when initializing type '__m256i' using type 'int'
  116 |  __m256i ymm = _mm256_loadu_epi16(ptr_b##N); \
      |                ^~~~~~~~~~~~~~~~~~
../kernel/x86_64/sbgemm_kernel_16x16_spr_tmpl.c:435:7: note: in expansion of macro 'LOAD_B_TAIL'

@inkydragon
Copy link
Member

_mm256_loadu_epi16
Need CPU flag: AVX512BW + AVX512VL

Maybe our CPU doesn't support this?

@giordano
Copy link
Contributor

giordano commented Feb 6, 2024

We do a dynamic arch build

OPENBLAS_BUILD_OPTS += DYNAMIC_ARCH=1
the host CPU shouldn't matter. But it's quite possible a newer compiler is needed, in Yggdrasil we're now using gcc 11 for openblas, I don't know what's used in CI here for the source build.

@inkydragon
Copy link
Member

inkydragon commented Feb 6, 2024

Compare with old build (19th Dec 2023) which openblas build successfully.
(Search "OpenBLAS build complete" in full log)

A new flag has been added: BUILD_BFLOAT16=1.

image

@giordano
Copy link
Contributor

giordano commented Feb 6, 2024

But it's quite possible a newer compiler is needed, in Yggdrasil we're now using gcc 11 for openblas, I don't know what's used in CI here for the source build.

It's hard to read to raw log, I see a line containing

gcc version 10.2.1 20210110 (Debian 10.2.1-6)

if that's the compiler used for the build, according to JuliaPackaging/Yggdrasil#7202 (comment) gcc 10 should be sufficient to build the bfloat16 kernels.

KristofferC pushed a commit that referenced this issue Feb 7, 2024
…53221)

Not clear why OpenBLAS build fails, GCC 10 should be sufficient to
compile the Bfloat16 kernels and [from what I can
tell](#53172 (comment))
that's the compiler version used in CI, but I don't know how to verify
it since this is a nightly job. If someone who knows more about the
setup can chime in, that'd be great. In the meantime, disabling these
kernels should fix #53172.
@imciner2
Copy link
Contributor

imciner2 commented Feb 7, 2024

Well, based on the comment here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95483#c4, many of these intrinsics were missing and then added in GCC 11, e.g. the intrinisic _mm256_loadu_epi16 is referenced in that bug report as being added for GCC 11.

I'm not entirely sure what would make it seem like it worked on Yggdrasil when I built it before and not work here, but I may have just skipped to GCC 11 and not fully dived into anything for GCC 10 when I was doing the builds.

@imciner2
Copy link
Contributor

imciner2 commented Feb 7, 2024

(and as it stands, building from source on Windows also needs our own patched GCC for BFloat16 to get the optimized kernels, unless the user has GCC 12.3 or 13+ and is using that), so for now, BFloat16 support is probably not something to expose in pure from-source builds due to its somewhat complex compiler support history.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
building Build system, or building Julia or its dependencies external dependencies Involves LLVM, OpenBLAS, or other linked libraries regression Regression in behavior compared to a previous version upstream The issue is with an upstream dependency, e.g. LLVM
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants