Segmentation fault in Hermitian eigen/eigvals on nightly #1086
Comments
We just bumped OpenBLAS yesterday, so it is quite likely an upstream bug. On my Mac I'm not getting a segmentation fault, but it seems to run endlessly. We should report this upstream. |
Works fine for size |
For the record, I am able to call LAPACKE from C:
#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>

int main() {
    int n = 64;
    double *a = (double*) malloc(n*n*sizeof(double));
    // Check whether the memory was successfully allocated by malloc
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(EXIT_FAILURE);
    }
    // Fill the lower triangle (col <= row) of the row-major matrix with random values
    for (int row = 0; row < n; row++) {
        for (int col = 0; col <= row; col++) {
            a[row*n + col] = (double)rand() / (double)RAND_MAX;
        }
    }
    // Allocate space for eigenvalues
    double *w = (double*) malloc(n * sizeof(double));
    if (w == NULL) {
        printf("Memory not allocated.\n");
        free(a);
        exit(EXIT_FAILURE);
    }
    // Leading dimension of the matrix
    int lda = n;
    // Compute eigenvalues only (jobz = 'N'). uplo must be 'L' because only the
    // lower triangle was filled above; 'U' would read uninitialized memory.
    lapack_int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'N', 'L', n, a, lda, w);
    if (info != 0) {
        printf("LAPACKE_dsyev failed with info = %lld\n", (long long)info);
    } else {
        // Print eigenvalues
        printf("Eigenvalues:\n");
        for (int i = 0; i < n; i++) {
            printf("%f\n", w[i]);
        }
    }
    free(a);
    free(w);
    return 0;
}
Compiling with:
gcc -o hermeig hermeig.c -Iopenblasinstallation/include -Lopenblasinstallation/lib -lopenblas
This runs correctly and produces results. |
I've noticed that this error occurs in julia when I set rr trace attached: https://julialang-dumps.s3.amazonaws.com/reports/2024-08-13T11-47-58-jishnub.tar.zst |
It'd be useful to make sure you can reproduce the segfault when linking to Julia's libopenblas build: if not, that test may not be useful. |
Doesn't segfault when linking to our openblas with 1 or 4 threads. Used slightly modified driver below:
|
Then that doesn't seem to be a good reproducer 🙂 |
Or there could perhaps be a bug on the Julia side? The LAPACKE interface seems to hide the |
Perhaps, but I'd spend more time making sure the C reproducer is 100% faithful to what we're doing on the Julia side. Also, if you want to build OpenBLAS locally to reproduce the Yggdrasil build, I'd suggest compiling it with something like
|
Also, do we have a backtrace with gdb? |
@jishnub The default algorithm seems to call |
The segfault happens for all the algorithms |
Of course, svd failing may not be independent evidence, since it probably uses the same internal LAPACK routines. |
So it looks to me like the segmentation fault is in OpenBLAS. The issue could certainly be in what Julia passes to OpenBLAS, but this again calls for a more faithful C reproducer. |
x86_64-linux-gnu |
So, if I set BLAS num threads to 1, then no crash on the Hermitian 64x64.
|
I believe this is a more faithful reproducer: it uses the Fortran API directly and calls the workspace query, and it does not crash:
|
@jishnub can you try with https://github.com/giordano/OpenBLAS_jll.jl/releases/download/OpenBLAS-v0.3.28%2B1/OpenBLAS.v0.3.28.x86_64-linux-gnu-libgfortran5.tar.gz? I get no crashes:
julia> using LinearAlgebra
julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037
julia> H = Hermitian(ones(1024, 1024));
julia> eigen(H);
julia>
This was compiled with GCC 12. As I mentioned in OpenMathLib/OpenBLAS#4868 (comment), the segmentation fault appears to go away depending on the version of GCC used (but I can't tell whether it's a bug in OpenBLAS that some compiler versions hide or surface, or a genuine compiler bug). |
Oddly, I still seem to face the issue using this:
julia> using LinearAlgebra
julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037
julia> H = Hermitian(ones(1024, 1024));
julia> eigen(H);
julia> BLAS.get_num_threads()
1
julia> BLAS.set_num_threads(4)
julia> eigen(H);
[88907] signal 11 (1): Segmentation fault
[88907] signal 11 (1): Segmentation fault
in expression starting at REPL[7]:1
Allocations: 2069146 (Pool: 2069051;Allocations: 2069146 (Pool: 2069051; Big: 95); GC: 4
Allocations: 2069146 (Pool: 2069051; Big: 95); GC: 4
[1] 88907 segmentation fault (core dumped) julia +nightly |
Aaah, right: when forwarding the BLAS library, the number of threads is reset. |
I think we need two cases:
Any other combination of reproducers which doesn't involve the two above is probably not very useful to narrow down the issue, because it doesn't necessarily show much. |
@jishnub can you please test again https://github.com/giordano/OpenBLAS_jll.jl/releases/download/OpenBLAS-v0.3.28%2B1/OpenBLAS.v0.3.28.x86_64-linux-gnu-libgfortran5.tar.gz (same URL as before, but it's a new build, which includes OpenMathLib/OpenBLAS#4871)? It seems to work for me now:
julia> using LinearAlgebra
julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037
julia> BLAS.set_num_threads(4)
julia> H = Hermitian(ones(1024, 1024));
julia> eigen(H);
julia> |
This works for me as well, thanks! |
A more minimal Julia reproducer is:
using LinearAlgebra
LAPACK.syevd!('V', 'U', ones(1024, 1024));
We're entering this function: https://github.com/JuliaLang/julia/blob/e1aefebe1e3c62339be4b46043625170ec538137/stdlib/LinearAlgebra/src/lapack.jl#L5432
The LAPACK function being called is
#include <stdio.h>
#include <stdlib.h>
int main() {
long n = 1024;
double *a;
a = (double*) malloc(n*n*sizeof(double));
openblas_set_num_threads64_(64);
// Check if the memory has been successfully
// allocated by malloc or not
if (a == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
for(long row=0; row<n; row++){
for(long col=0; col<=row; col++){
a[row*n + col] = 1.0;
}
}
// Allocate space for eigenvalues
double *w;
w = (double*) malloc(n * sizeof(double));
if (w == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
// Leading dimension of the matrix
long lda = n;
// Allocate work
double *work = (double*) malloc(sizeof(double));
long lwork = -1;
long *iwork = (long*) malloc(sizeof(long));
long liwork = -1;
long info = -100;
char jobz = 'V';
char uplo = 'U';
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
// Print eigenvalues
printf("First 5 Eigenvalues:\n");
for (int i = 0; i < 5; i++) {
printf("%f\n", w[i]);
}
free(a);
free(w);
return 0;
}
But this doesn't crash for me:
$ gcc -o test test.c -L${HOME}/.julia/juliaup/julia-nightly/lib/julia -lopenblas64_ -Wl,-rpath,${HOME}/.julia/juliaup/julia-nightly/lib/julia
test.c: In function ‘main’:
test.c:9:5: warning: implicit declaration of function ‘openblas_set_num_threads64_’ [-Wimplicit-function-declaration]
openblas_set_num_threads64_(64);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
test.c:46:5: warning: implicit declaration of function ‘dsyevd_64_’ [-Wimplicit-function-declaration]
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
^~~~~~~~~~
$ ./test
First 5 Eigenvalues:
0.000000
0.000000
0.000000
0.000000
0.000000 |
You have to call
#include <stdio.h>
#include <stdlib.h>
int main() {
long n = 1024;
double *a;
a = (double*) malloc(n*n*sizeof(double));
openblas_set_num_threads64_(64);
// Check if the memory has been successfully
// allocated by malloc or not
if (a == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
for(long row=0; row<n; row++){
for(long col=0; col<=row; col++){
a[row*n + col] = 1.0;
}
}
// Allocate space for eigenvalues
double *w;
w = (double*) malloc(n * sizeof(double));
if (w == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
// Leading dimension of the matrix
long lda = n;
// workspace query
double *work = (double*) malloc(sizeof(double));
long lwork = -1;
long *iwork = (long*) malloc(sizeof(long));
long liwork = -1;
long info = -100;
char jobz = 'V';
char uplo = 'U';
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
// Workspace allocation
lwork = work[0];
work = (double *) malloc(lwork * sizeof(double));
liwork = iwork[0];
iwork = (long *) malloc(liwork * sizeof(long));
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
// Print eigenvalues
printf("First 5 Eigenvalues:\n");
for (int i = 0; i < 5; i++) {
printf("%f\n", w[i]);
}
return 0;
} |
Thanks. Sadly, adding
lwork = (long)work[0];
work = (double*)realloc(work, lwork*sizeof(double));
liwork = (long)iwork[0];
iwork = (long*)realloc(iwork, liwork*sizeof(long));
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1L, 1L);
after the first |
The bug is real, but it seems that Julia is just better at triggering it than C, whether because of our allocator, LBT, or something else. |
The C script does crash for me with the second call to dsyevd on Mac. |
Ah, interesting, I can confirm it segfaults for me as well on an M1 with OpenBLAS from Julia nightly, but not with OpenBLAS from Julia v1.10.4:
% clang -Wno-implicit-function-declaration -o test test.c -L${HOME}/repo/julia/usr/lib -Wl,-rpath,${HOME}/repo/julia/usr/lib -lopenblas64_
% ./test
zsh: segmentation fault ./test
% clang -Wno-implicit-function-declaration -o test test.c -L${HOME}/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/lib/julia -Wl,-rpath,${HOME}/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/lib/julia -lopenblas64_
% ./test
First 5 Eigenvalues:
-0.000000
-0.000000
-0.000000
-0.000000
-0.000000
However, it never segfaults for me on an x86_64-linux-gnu machine where I could reproduce the segfault in Julia. |
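When dsyevd actually runs (the second call), these printed values are what the test matrix predicts. Both the Julia reproducer (Hermitian(ones(1024, 1024))) and the C reproducer (1.0 on the referenced triangle) use the all-ones matrix, which has rank one, and LAPACK returns eigenvalues in ascending order, so the first five are zero up to rounding error (presumably tiny negative values, hence -0.000000). The zeros printed by the earlier query-only program are a different matter: there w is never written. A short derivation, with $\mathbf{1}$ the all-ones vector and $n = 1024$:

$$
J = \mathbf{1}\mathbf{1}^{\mathsf T}, \qquad
J\mathbf{1} = \mathbf{1}\,(\mathbf{1}^{\mathsf T}\mathbf{1}) = n\,\mathbf{1}, \qquad
J v = \mathbf{1}\,(\mathbf{1}^{\mathsf T} v) = 0 \quad \text{for all } v \perp \mathbf{1}
$$

So the spectrum is $0$ with multiplicity $n - 1$ plus a single eigenvalue $n$: 1023 zeros followed by 1024.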
Similarly,
Version info:
I suspect this is an OpenBLAS issue, as there is no error when using MKL. Some testing shows that this error happens starting from 64x64 matrices, and there is no error for smaller matrices. Probably JuliaLang/julia@9d222b8 is what caused this, as there is no error on the commit before it.