
Segmentation fault in Hermitian eigen/eigvals on nightly #1086

Closed
jishnub opened this issue Aug 12, 2024 · 30 comments · Fixed by JuliaLang/julia#55496
Labels
bug (Something isn't working) · regression 1.12 (Regression in the 1.12 release) · rr trace included · upstream (The issue is with an upstream dependency, e.g. LLVM)

Comments

@jishnub
Collaborator

jishnub commented Aug 12, 2024

julia> using LinearAlgebra

julia> H = Hermitian(rand(1024, 1024));

julia> eigen(H);
[1]    56499 segmentation fault (core dumped)  julia +nightly

Similarly,

julia> using LinearAlgebra

julia> H = Hermitian(rand(1024, 1024));

julia> eigvals(H);
[1]    56649 segmentation fault (core dumped)  julia +nightly

Version info:

julia> versioninfo()
Julia Version 1.12.0-DEV.1038
Commit 9d222b87d77 (2024-08-12 01:13 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = subl

I suspect this is an OpenBLAS issue, as there is no error when using MKL.

Some testing shows that this error happens for 64x64 and larger matrices; there is no error for smaller ones.
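A sketch of how this size threshold could be located automatically (hypothetical, not something that was run here; each size is tested in a fresh process, via the julia +nightly launcher used above, because a segfault would kill the loop):

for n in 60:70
    # Run each size in its own Julia process so that a crash at one size
    # does not take down the scanning loop itself.
    script = "using LinearAlgebra; eigvals(Hermitian(rand($n, $n)))"
    ok = success(`julia +nightly -e $script`)
    println(lpad(n, 3), "  ", ok ? "ok" : "crashed")
end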

Probably JuliaLang/julia@9d222b8 is what caused this, as there is no error on the commit before this.

jishnub added the bug and regression 1.12 labels on Aug 12, 2024
@ViralBShah
Member

ViralBShah commented Aug 12, 2024

We just bumped openblas yesterday, so it is quite likely an upstream bug.

On my mac I'm not getting a segmentation fault, but it seems to keep running endlessly. We should report this upstream.

@ViralBShah
Member

Works fine at size 63x63 for me and stops working from 64x64 onwards.

@jishnub
Collaborator Author

jishnub commented Aug 13, 2024

For the record, I am able to call dsyev in C using OpenBLAS's master branch when compiling with gcc directly.
E.g., using the following code:

#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>

int main() {
    int n = 64;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    // Check if the memory has been successfully
    // allocated by malloc or not
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(int row=0; row<n; row++){
        for(int col=0; col<=row; col++){
            a[row*n + col] = (double)rand()/(double)(RAND_MAX);
        }
    }

    // Allocate space for eigenvalues
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix
    int lda = n;

    // Compute eigenvalues and eigenvectors
    LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'N', 'U', n, a, lda, w);

    // Print eigenvalues
    printf("Eigenvalues:\n");
    for (int i = 0; i < n; i++) {
        printf("%f\n", w[i]);
    }

    free(a);
    free(w);

    return 0;
}

Compiling with

gcc -o hermeig hermeig.c -Iopenblasinstallation/include -Lopenblasinstallation/lib -lopenblas

This runs correctly and produces results.

@jishnub
Collaborator Author

jishnub commented Aug 13, 2024

I've noticed that this error occurs in Julia when I set BLAS.set_num_threads(4) or higher. There is no segfault for 3 or fewer threads.

rr trace attached: https://julialang-dumps.s3.amazonaws.com/reports/2024-08-13T11-47-58-jishnub.tar.zst
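Condensed, the thread-dependent reproduction described above amounts to the following (a sketch of the steps already reported, nothing new):

using LinearAlgebra
H = Hermitian(rand(1024, 1024))
BLAS.set_num_threads(3)   # 3 or fewer OpenBLAS threads: no crash
eigvals(H)
BLAS.set_num_threads(4)   # 4 or more threads: segfaults on the affected nightly
eigvals(H)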

@giordano
Contributor

For the record, I am able to call dsyev in C using OpenBLAS's master branch when compiling with gcc directly.

It'd be useful to make sure you can reproduce the segfault when linking to Julia's libopenblas build: if not, that test may not be useful.

@ViralBShah
Member

Doesn't segfault when linking to our openblas with 1 or 4 threads. Used slightly modified driver below:

➜  lib git:(master) gcc -o hermeig hermeig.c -Iopenblasinstallation/include -Lopenblasinstallation/lib -L. -lopenblas64_ -I../include -Wno-implicit-function-declaration
ld: warning: search path 'openblasinstallation/lib' not found
ld: warning: reexported library with install name '@rpath/libunwind.1.dylib' found at '/Users/viral/julia/usr/lib/libunwind.1.0.dylib' couldn't be matched with any parent library and will be linked directly
➜  lib git:(master) OPENBLAS_NUM_THREADS=4 DYLD_LIBRARY_PATH=. ./hermeig
Eigenvalues:
0.000008
0.022556
0.036368
0.050084
#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>

int main() {
    long n = 64;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(4);

    // Check if the memory has been successfully
    // allocated by malloc or not
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = (double)rand()/(double)(RAND_MAX);
        }
    }

    // Allocate space for eigenvalues
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix
    long lda = n;

    // Compute eigenvalues and eigenvectors
    LAPACKE_dsyev64_(LAPACK_ROW_MAJOR, 'N', 'U', n, a, lda, w);

    // Print eigenvalues
    printf("Eigenvalues:\n");
    for (int i = 0; i < n; i++) {
        printf("%f\n", w[i]);
    }

    free(a);
    free(w);

    return 0;
}

@giordano
Contributor

Then that doesn't seem to be a good reproducer 🙂

@ViralBShah
Member

ViralBShah commented Aug 13, 2024

Or there could perhaps be a bug on the Julia side?

The LAPACKE interface seems to hide the lwork discovery.
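For reference, here is a sketch of the two-call lwork discovery that LAPACKE hides, written as a raw ccall from Julia (the dsyev_64_ symbol name and the BLAS.libblastrampoline binding are assumptions about Julia's ILP64 OpenBLAS build):

using LinearAlgebra
n = 64
A = Matrix(Hermitian(rand(n, n)))
w = Vector{Float64}(undef, n)
work = Vector{Float64}(undef, 1)
lwork = Ref{Int64}(-1)              # lwork = -1 requests the optimal size in work[1]
info = Ref{Int64}(0)
for pass in 1:2                     # pass 1: workspace query, pass 2: actual computation
    ccall((:dsyev_64_, BLAS.libblastrampoline), Cvoid,
          (Ref{UInt8}, Ref{UInt8}, Ref{Int64}, Ptr{Float64}, Ref{Int64},
           Ptr{Float64}, Ptr{Float64}, Ref{Int64}, Ref{Int64}, Clong, Clong),
          UInt8('N'), UInt8('U'), n, A, n, w, work, lwork, info, 1, 1)
    info[] == 0 || error("dsyev failed with info = $(info[])")
    if pass == 1
        lwork[] = round(Int64, work[1])   # optimal lwork is returned in work[1]
        resize!(work, lwork[])
    end
end

The wrappers in LinearAlgebra.LAPACK do this query-then-resize dance internally before the real call.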

@giordano
Contributor

Or there could perhaps be a bug on the Julia side?

Perhaps, but I'd spend more time making sure the C reproducer is 100% faithful to what we're doing on the Julia side. Also, if you want to build OpenBLAS locally, to reproduce the Yggdrasil build I'd suggest compiling OpenBLAS with something like

make USE_THREAD=1 GEMM_MULTITHREADING_THRESHOLD=400 NO_AFFINITY=1 NO_STATIC=1 NUM_THREADS=512 INTERFACE64=1 SYMBOLSUFFIX=64_ DYNAMIC_ARCH=1 TARGET=GENERIC all

@giordano
Contributor

Also, do we have a backtrace with gdb?

@ViralBShah
Member

ViralBShah commented Aug 13, 2024

@jishnub The default algorithm seems to call syevd and not syev, based on the docs. But all three algorithms are failing. :-(
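For reference, the three symmetric-eigenvalue drivers can be exercised directly through the stdlib wrappers (a sketch; the signatures below are the documented LinearAlgebra.LAPACK ones, and which driver each high-level Julia entry point picks is not asserted here):

using LinearAlgebra
BLAS.set_num_threads(4)
A = Matrix(Hermitian(rand(64, 64)))
LAPACK.syev!('N', 'U', copy(A))                              # simple driver (QR iteration)
LAPACK.syevd!('N', 'U', copy(A))                             # divide-and-conquer driver
LAPACK.syevr!('N', 'A', 'U', copy(A), 0.0, 0.0, 0, 0, -1.0)  # RRR driver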

@jishnub
Collaborator Author

jishnub commented Aug 13, 2024

The segfault happens for all the algorithms

@ViralBShah
Member

ViralBShah commented Aug 13, 2024

svd also fails. The failures are only on Hermitian matrices 64x64 and larger. Other factorizations seem to be working fine on Hermitians.

Of course svd failing may not be additional data, in that it is probably using the same internal LAPACK routines.

@ViralBShah
Member

ViralBShah commented Aug 13, 2024

@giordano I could try to get a gdb trace, but on an M1 Mac it just goes into an infinite loop (should I ctrl-C?), not a segfault. @jishnub Which platform are you seeing the segfault on?

@giordano
Contributor

giordano commented Aug 13, 2024

Also, do we have a backtrace with gdb?

julia> eigen(H);

Thread 9 "julia" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd23f9640 (LWP 3380628)]
0x00007fffde86df6c in dgemm_itcopy_ZEN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
(gdb) where
#0  0x00007fffde86df6c in dgemm_itcopy_ZEN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
#1  0x00007fffdda5f70f in dsyr2k_UN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
#2  0x00007fffddb99cd8 in exec_threads () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
#3  0x00007fffddb99efc in blas_thread_server () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
#4  0x00007ffff7e08ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007ffff7e9a850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

So, it looks to me like the segmentation fault is in OpenBLAS. Certainly the issue could be in what Julia passes to OpenBLAS, but this again calls for a more faithful C reproducer.

@jishnub
Collaborator Author

jishnub commented Aug 13, 2024

x86_64-linux-gnu

@ViralBShah
Member

ViralBShah commented Aug 13, 2024

So, if I set BLAS num threads to 1, then no crash on the Hermitian 64x64.

julia> H =Hermitian(rand(64,64))

julia> BLAS.set_num_threads(4)

julia> eigen(H)
^C^C^C^C^CWARNING: Force throwing a SIGINT
^C^C^C^C^CERROR: ^C^C^C^C^C^CInterruptException:

julia> ^C

julia> BLAS.set_num_threads(1)

julia> eigen(H)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
64-element Vector{Float64}:
 -4.105758427499317
 -3.971213922931517

@ViralBShah
Member

ViralBShah commented Aug 13, 2024

I believe this is a more faithful reproducer, using the direct Fortran API with the workspace query; it does not crash:

➜  lib git:(vs/55471) ✗ gcc -g -o hermeig hermeig.c -L. -lopenblas64_ -I../include -Wno-implicit-function-declaration && DYLD_LIBRARY_PATH=. ./hermeig
ld: warning: reexported library with install name '@rpath/libunwind.1.dylib' found at '/Users/viral/julia/usr/lib/libunwind.1.0.dylib' couldn't be matched with any parent library and will be linked directly
First 5 Eigenvalues:
-4.396484
-4.053237
-3.783926
-3.699819
-3.197730
#include <stdio.h>
#include <stdlib.h>

int main() {
    long n = 64;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(4);

    // Check if the memory has been successfully                                                                                                     
    // allocated by malloc or not                                                                                                                    
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = (double)rand()/(double)(RAND_MAX);
        }
    }

    // Allocate space for eigenvalues                                                                                                                
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix                                                                                                               
    long lda = n;

    // Allocate work                                                                                                                                 
    long lwork = -1;
    double *work = (double*) malloc(sizeof(double));
    long info = -100;

    char nchar = 'N';
    char uchar = 'U';

    // Compute eigenvalues and eigenvectors                                                                                                          
    dsyev_64_(&nchar, &uchar, &n, a, &lda, w, work, &lwork, &info);
    lwork = (long)work[0];
    work = (double*)realloc(work, lwork*sizeof(double));
    dsyev_64_(&nchar, &uchar, &n, a, &lda, w, work, &lwork, &info);

    // Print eigenvalues                                                                                                                             
    printf("First 5 Eigenvalues:\n");
    for (int i = 0; i < 5; i++) {
        printf("%f\n", w[i]);
    }

    free(a);
    free(w);

    return 0;
}

@giordano
Contributor

@jishnub can you try with https://github.com/giordano/OpenBLAS_jll.jl/releases/download/OpenBLAS-v0.3.28%2B1/OpenBLAS.v0.3.28.x86_64-linux-gnu-libgfortran5.tar.gz?

I get no crashes:

julia> using LinearAlgebra

julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037

julia> H = Hermitian(ones(1024, 1024));

julia> eigen(H);

julia>

This was compiled with GCC 12. As I mentioned in OpenMathLib/OpenBLAS#4868 (comment), the segmentation fault appears to go away depending on the version of GCC used (but I can't tell whether it's a bug in OpenBLAS which is hidden or surfaced by some compiler versions, or a genuine compiler error).

@jishnub
Collaborator Author

jishnub commented Aug 13, 2024

Oddly, I seem to still face the issue using this:

julia> using LinearAlgebra

julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037

julia> H = Hermitian(ones(1024, 1024));

julia> eigen(H);

julia> BLAS.get_num_threads()
1

julia> BLAS.set_num_threads(4)

julia> eigen(H);

[88907] signal 11 (1): Segmentation fault
in expression starting at REPL[7]:1
Allocations: 2069146 (Pool: 2069051; Big: 95); GC: 4
[1]    88907 segmentation fault (core dumped)  julia +nightly

@giordano
Contributor

Aaah, right, when forwarding the BLAS library the number of threads is reset.

@giordano
Contributor

which does not crash:

I think we need two cases:

  1. code which, when linked to Julia's build of OpenBLAS, crashes in the same way as it does within Julia
  2. the exact same code not crashing when linked to a different build of OpenBLAS, together with full information about the compiler used and the build configuration

Any other combination of reproducers is probably not very useful for narrowing down the issue, because it doesn't necessarily show much.

giordano added the upstream label on Aug 13, 2024
@giordano
Contributor

@jishnub can you please test again https://github.com/giordano/OpenBLAS_jll.jl/releases/download/OpenBLAS-v0.3.28%2B1/OpenBLAS.v0.3.28.x86_64-linux-gnu-libgfortran5.tar.gz (same URL as before, but it's a new build, which includes OpenMathLib/OpenBLAS#4871)? It seems to work for me now:

julia> using LinearAlgebra

julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037

julia> BLAS.set_num_threads(4)

julia> H = Hermitian(ones(1024, 1024));

julia> eigen(H);

julia>

@jishnub
Collaborator Author

jishnub commented Aug 15, 2024

This works for me as well, thanks!

@giordano
Contributor

giordano commented Aug 15, 2024

A more minimal Julia reproducer is

using LinearAlgebra
LAPACK.syevd!('V', 'U', ones(1024, 1024));

We're entering this function: https://github.com/JuliaLang/julia/blob/e1aefebe1e3c62339be4b46043625170ec538137/stdlib/LinearAlgebra/src/lapack.jl#L5432. The LAPACK function being called is DSYEVD. Can someone please confirm this is a sensible program (NOTE: I updated the code compared to a previous version of this message):

#include <stdio.h>
#include <stdlib.h>

int main() {
    long n = 1024;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(64);

    // Check if the memory has been successfully                                                                                                     
    // allocated by malloc or not                                                                                                                    
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = 1.0;
        }
    }

    // Allocate space for eigenvalues                                                                                                                
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix                                                                                                               
    long lda = n;

    // Allocate work                                                                                                                                 
    double *work = (double*) malloc(sizeof(double));
    long lwork = -1;
    long *iwork = (long*) malloc(sizeof(long));
    long liwork = -1;
    long info = -100;

    char jobz = 'V';
    char uplo = 'U';

    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);

    // Print eigenvalues                                                                                                                             
    printf("First 5 Eigenvalues:\n");
    for (int i = 0; i < 5; i++) {
        printf("%f\n", w[i]);
    }

    free(a);
    free(w);

    return 0;
}

But this doesn't crash for me

$ gcc -o test test.c -L${HOME}/.julia/juliaup/julia-nightly/lib/julia -lopenblas64_ -Wl,-rpath,${HOME}/.julia/juliaup/julia-nightly/lib/julia
test.c: In function ‘main’:
test.c:9:5: warning: implicit declaration of function ‘openblas_set_num_threads64_’ [-Wimplicit-function-declaration]
     openblas_set_num_threads64_(64);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
test.c:46:5: warning: implicit declaration of function ‘dsyevd_64_’ [-Wimplicit-function-declaration]
     dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
     ^~~~~~~~~~
$ ./test
First 5 Eigenvalues:
0.000000
0.000000
0.000000
0.000000
0.000000

@ViralBShah
Copy link
Member

ViralBShah commented Aug 15, 2024

You have to call dsyevd twice: the first call is the workspace query; then take that value, allocate the work array, and call it again. It should segfault in the second call, where the computation actually happens.

#include <stdio.h>
#include <stdlib.h>

int main() {
    long n = 1024;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(64);

    // Check if the memory has been successfully
    // allocated by malloc or not
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = 1.0;
        }
    }

    // Allocate space for eigenvalues
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix
    long lda = n;

    // workspace query
    double *work = (double*) malloc(sizeof(double));
    long lwork = -1;
    long *iwork = (long*) malloc(sizeof(long));
    long liwork = -1;
    long info = -100;

    char jobz = 'V';
    char uplo = 'U';

    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);

    // Workspace allocation
    lwork = work[0];
    work = (double *) malloc(lwork * sizeof(double));
    liwork = iwork[0];
    iwork = (long *) malloc(liwork * sizeof(long));

    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);

    // Print eigenvalues
    printf("First 5 Eigenvalues:\n");
    for (int i = 0; i < 5; i++) {
        printf("%f\n", w[i]);
    }

    return 0;
}

@giordano
Contributor

Thanks. Sadly, adding

    lwork = (long)work[0];
    work = (double*)realloc(work, lwork*sizeof(double));
    liwork = (long)iwork[0];
    iwork = (long*)realloc(iwork, liwork*sizeof(long));
    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1L, 1L);

after the first dsyevd_64_ call doesn't seem to change the outcome. 😞 However, it looks like OpenMathLib/OpenBLAS#4879 works for me on the machines where I was still observing a segmentation fault after the first patch. Hopefully we'll be good after that, but I'm still puzzled by the fact that we can't prepare a C reproducer. I also tried to use dlopen/dlsym, to even more closely match what we do in Julia, to no avail.

@ViralBShah
Member

The bug is real, but it seems that Julia is just better at triggering it than C, whether that is due to our allocator, LBT, or something else.
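One hypothetical way to separate the LBT question from the allocator question (a sketch, not something run in this thread; the library path and the dsyevd_64_ / openblas_set_num_threads64_ symbol names are assumptions about the Linux nightly layout) would be to dlopen Julia's libopenblas64_ directly and ccall the raw pointers, bypassing libblastrampoline:

using Libdl
libpath = joinpath(Sys.BINDIR, "..", "lib", "julia", "libopenblas64_.so")  # assumed location
lib = dlopen(libpath)
ccall(dlsym(lib, :openblas_set_num_threads64_), Cvoid, (Cint,), 4)  # 4 or more threads triggers the crash
dsyevd = dlsym(lib, :dsyevd_64_)
n = Int64(1024)
A = ones(n, n)
w = zeros(n)
work = [0.0];      lwork  = Ref(Int64(-1))
iwork = Int64[0];  liwork = Ref(Int64(-1))
info = Ref(Int64(0))
for pass in 1:2    # pass 1: workspace query, pass 2: actual computation
    ccall(dsyevd, Cvoid,
          (Ref{UInt8}, Ref{UInt8}, Ref{Int64}, Ptr{Float64}, Ref{Int64}, Ptr{Float64},
           Ptr{Float64}, Ref{Int64}, Ptr{Int64}, Ref{Int64}, Ref{Int64}, Clong, Clong),
          UInt8('V'), UInt8('U'), n, A, n, w, work, lwork, iwork, liwork, info, 1, 1)
    if pass == 1
        lwork[]  = round(Int64, work[1]);  resize!(work, lwork[])
        liwork[] = iwork[1];               resize!(iwork, liwork[])
    end
end

If this still crashes when called through the raw pointers, LBT forwarding is unlikely to be the trigger.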

@ViralBShah
Member

The C script does crash for me with the second call to dsyevd on mac.

@giordano
Contributor

The C script does crash for me with the second call to dsyevd on mac.

Ah, interesting, I can confirm it segfaults for me as well on an M1 with OpenBLAS from Julia nightly, but not with OpenBLAS from Julia v1.10.4:

% clang -Wno-implicit-function-declaration -o test test.c -L${HOME}/repo/julia/usr/lib -Wl,-rpath,${HOME}/repo/julia/usr/lib -lopenblas64_
% ./test
zsh: segmentation fault  ./test
% clang -Wno-implicit-function-declaration -o test test.c -L${HOME}/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/lib/julia -Wl,-rpath,${HOME}/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/lib/julia -lopenblas64_
% ./test
First 5 Eigenvalues:
-0.000000
-0.000000
-0.000000
-0.000000
-0.000000

However, it never segfaults for me on an x86_64-linux-gnu machine where I could reproduce the segfault in Julia.

KristofferC transferred this issue from JuliaLang/julia on Nov 26, 2024