Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround an ICE in clang 9.0.0 #2329

Merged
merged 1 commit into from
Dec 2, 2019
Merged

Conversation

isuruf
Copy link
Contributor

@isuruf isuruf commented Dec 1, 2019

This bug is not there in 8.x nor in the 9.0 daily snapshot.

I managed to reduce the reproducing failure to the following,

(base) isuru@isuru:~$ cat file.c
typedef double __m128d __attribute__((__vector_size__(16), __aligned__0));

static __m128d __attribute__(()) _mm_add_pd();
__m128d _mm_hadd_pd___a;
static __inline__ __m128d __attribute__(()) _mm_hadd_pd(__m128d __b) {
  return __builtin_ia32_haddpd(_mm_hadd_pd___a, __b);
}
static void dsymv_kernel_4x4(float *a, float *x, float *y, float *temp1,
                             float *temp2) {
  __m128d half_accum0, half_accum1, half_accum2, half_accum3;
  half_accum0 = _mm_add_pd();
  half_accum1 = _mm_add_pd();
  half_accum2 = _mm_add_pd();
  half_accum3 = _mm_hadd_pd(half_accum3);
  temp2[0] += half_accum0[0];
  temp2[1] += half_accum1[0];
  temp2[2] += half_accum2[0];
  temp2[3] += half_accum3[0];
}
long CNAME_m = 100;
int CNAME(float alpha, float *a, long lda, float *x, long inc_x,
          float *y, long inc_y, float *buffer) {
  long i;
  long j;
  float tmp1[4];
  float tmp2[4];
  float *ap;
  long offset1 = 100;
  for (j = 0; j < offset1; j += 4) {
    long from = j + 1;
    if (CNAME_m - from >= 12) {
      long m2 = (CNAME_m / 4) * 4;
      for (i = j + 1; i < j + 4; i++)
        if (m2 > j + 4)
          dsymv_kernel_4x4(ap, x, y, tmp1, tmp2);
      for (i = m2; i < CNAME_m; i++)
        ;
    }
    y[j] += alpha * tmp2[0];
    y[j + 1] += alpha * tmp2[1];
    y[j + 2] += alpha * tmp2[2];
    y[j + 3] += alpha * tmp2[3];
  }
}
(base) isuru@isuru:~$ clang-9 file.c -O2 -march=skylake-avx512 -c
file.c:1:60: warning: unknown attribute '__aligned__0' ignored [-Wunknown-attributes]
typedef double __m128d __attribute__((__vector_size__(16), __aligned__0));
                                                           ^
file.c:44:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
file.c:3:34: warning: function '_mm_add_pd' has internal linkage but is not defined [-Wundefined-internal]
static __m128d __attribute__(()) _mm_add_pd();
                                 ^
file.c:11:17: note: used here
  half_accum0 = _mm_add_pd();
                ^
fatal error: error in backend: Cannot emit physreg copy instruction
clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 9.0.0 (https://github.com/conda-forge/clangdev-feedstock 284a3d5d88509307bcfba64b055653ee347371db)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/isuru/miniconda3/bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to  and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-9: note: diagnostic msg: /tmp/file-066422.c
clang-9: note: diagnostic msg: /tmp/file-066422.sh
clang-9: note: diagnostic msg: 

@isuruf
Copy link
Contributor Author

isuruf commented Dec 1, 2019

While clang-9 file.c -O2 -march=skylake-avx512 -c fails clang-9 file.c -O1 -march=skylake-avx512 -c doesn't.

@martin-frbg
Copy link
Collaborator

If the build passes at O1, perhaps this is better handled with a (suitably ifdef'd) #pragma clang optimize off which would still allow it to use AVX512 ?

This bug is not there in 8.x nor in the 9.0 daily snapshot.
@martin-frbg martin-frbg merged commit 235599f into OpenMathLib:develop Dec 2, 2019
@isuruf isuruf deleted the patch-1 branch December 2, 2019 08:35
@ax3l
Copy link

ax3l commented May 4, 2020

This problem just re-appeared with AppleClang 11.0.3 on OSX 10.15. Should we make the guard broader to cover the Apple Clang fork, too?

@martin-frbg
Copy link
Collaborator

Guess so. Does the Apple Clang use any unique identifiers (apart from probably reporting clang_major 11) ?

@isuruf
Copy link
Contributor Author

isuruf commented May 4, 2020

I don't have AppleClang 11.0.3, but Apple LLVM version 9.1.0 (clang-902.0.39.2) has the following

#define __apple_build_version__ 9020039
#define __clang__ 1
#define __clang_major__ 9
#define __clang_minor__ 1
#define __clang_patchlevel__ 0
#define __clang_version__ "9.1.0 (clang-902.0.39.2)"

@ax3l
Copy link

ax3l commented May 5, 2020

We should be able to find the corresponding versions in this table:
https://en.wikipedia.org/wiki/Xcode#Toolchain_versions

If the problem was only occurring in Clang 9.0.0 this seams to agree with the observation that AppleClang 11.0.3 is failing, too. We probably have to find the corresponding __apple_build_version__ macro value to differentiate between a vanilla llvm/clang on macOS and the apple fork.

@martin-frbg
Copy link
Collaborator

martin-frbg commented May 5, 2020

Presumably __apple_build_version__ will only be defined by the apple fork ? Then perhaps this could be as simple as #if defined(__apple_build_version__) && __clang_major__ == 9 at least until AppleClang ports the fix from mainline Clang 9.0.x

@ax3l
Copy link

ax3l commented May 5, 2020 via email

@martin-frbg
Copy link
Collaborator

martin-frbg commented May 5, 2020

#if defined(__apple_build_version__) && __clang_major__ == 11 && __clang_minor__ == 0 && __clang_patchlevel__ == 3 ? Or (I am not sure if I read the wikipedia page correctly) the much broader range of any __apple_build_version__ with a __clang_major__ greater than 8 (to include isuruf's "Apple LLVM 9.1.0") ?

@isuruf
Copy link
Contributor Author

isuruf commented May 5, 2020

Sorry for the confusion. Apple LLVM 9.1.0 works fine. I was giving an example of how apple's macros worked. You need,
#if (defined(__apple_build_version__) && __clang_major__ == 11 && __clang_minor__ == 0 && __clang_patchlevel__ == 3) || (!defined(__apple_build_version__) && __clang_major__ == 9 && __clang_minor__ == 0 && __clang_patchlevel__ == 0)

@martin-frbg
Copy link
Collaborator

Thanks for the clarification. I have made this a separate #if block in #2597 as that seemed more readable. (Also there may be a need to refine the Apple-specific version matching if Apple ports the fix in a much later version)

@martin-frbg
Copy link
Collaborator

Fix merged now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants