Segfault in factorizations inside threaded loops #34500

Closed
iaravena opened this issue Jan 24, 2020 · 11 comments · Fixed by #34546
Labels
bug Indicates an unexpected problem or unintended behavior linear algebra Linear algebra multithreading Base.Threads and related functionality sparse Sparse arrays


@iaravena

iaravena commented Jan 24, 2020

As the title says, I am getting segfaults in Julia 1.3.1 (Linux) when trying to factorize matrices inside threaded loops, i.e.

using LinearAlgebra, SparseArrays

function test(n::Integer)::Float64
    A = sprand(n, n, .2)
    x = qr(A) \ rand(n)
    return norm(x)
end

Threads.@threads for i = 1:100
  test(i+100)
end

results in a segfault whenever JULIA_NUM_THREADS>1. I am using

julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 36

The same problem is observed with LU and other factorizations.

@andreasnoack andreasnoack added linear algebra Linear algebra multithreading Base.Threads and related functionality bug Indicates an unexpected problem or unintended behavior sparse Sparse arrays labels Jan 24, 2020
@andreasnoack
Member

The sparse factorizations are probably not thread safe. They use a common C struct for all kinds of settings and allocations.

@iaravena
Author

@andreasnoack It seems so. At least, there are no segfaults with dense matrices. Is there any plan to fix this behavior in the future or is this something inside LAPACK and out of the control of Julia?
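For contrast, here is a minimal sketch of the dense variant mentioned above. It assumes that converting to a dense `Matrix` is acceptable for the problem size; `test_dense` and `results` are hypothetical names, not part of the original report.

```julia
using LinearAlgebra, SparseArrays

# Same shape as test(n) in the report, but the factorization runs on a
# dense copy of A, so it goes through LAPACK rather than SuiteSparse.
function test_dense(n::Integer)::Float64
    A = Matrix(sprand(n, n, 0.2))
    x = qr(A) \ rand(n)
    return norm(x)
end

results = zeros(100)
Threads.@threads for i = 1:100
    results[i] = test_dense(i + 100)   # each iteration writes its own slot
end
```

This gives up sparsity, so it only makes sense as a diagnostic or for small matrices.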

@andreasnoack
Member

The sparse factorizations are handled by SuiteSparse, which generally uses a global struct. LAPACK is structured very differently and is less prone to these kinds of issues. I do think we can fix this in Julia, but I'm not sure what the right solution is. Maybe @vtjnash can provide some hints as to what a good solution would be.
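A possible stopgap on affected versions is to serialize the sparse factorization calls behind a lock. The sketch below assumes that mutual exclusion around `qr(A) \ b` is enough to avoid concurrent use of the shared state, which this thread does not confirm; objects released later by the garbage collector may still touch that state. `SPARSE_LOCK` and `test_locked` are hypothetical names.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical global lock guarding every sparse factorization call.
const SPARSE_LOCK = ReentrantLock()

function test_locked(n::Integer)::Float64
    A = sprand(n, n, 0.2)
    b = rand(n)
    # Only one task at a time may factorize and solve.
    x = lock(SPARSE_LOCK) do
        qr(A) \ b
    end
    return norm(x)
end

results = zeros(100)
Threads.@threads for i = 1:100
    results[i] = test_locked(i + 100)
end
```

The matrix generation and the norm still run in parallel, but the factorizations themselves do not, so this trades away most of the speedup.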

@vtjnash
Member

vtjnash commented Jan 25, 2020

IIRC, LAPACK used to have many issues like that too (since Fortran encourages usage of global/static variables), but they got shaken out a few years ago. Maybe it just needs someone to do the same for SuiteSparse?

@oscardssmith
Member

I assume making this code native Julia and as fast as LAPACK won't be our approach? I understand it would be ridiculously hard, but OTOH, it would be great for programs using linear algebra on non-BLAS types.

@ViralBShah
Member

For non-BLAS types you often do not need to put in the same level of effort to get good performance. Getting a KLU equivalent in Julia would be a good start.

We should probably file these issues upstream. It helps that SuiteSparse is now on GitHub.

@ViralBShah
Member

> The sparse factorizations are probably not thread safe. They use a common C struct for all kinds of settings and allocations.

Is there a way to get these into thread local storage?

@andreasnoack
Member

Maybe we can ask @DrTimothyAldenDavis if there are any good resources for how to use SuiteSparse in multithreaded code.

@DrTimothyAldenDavis

SuiteSparse is designed to handle this; all its functions are thread-safe. There is a SuiteSparse_config global struct, but it is not meant to be modified by multiple threads. It is meant to be initialized once (say, when Julia starts) and never modified after that. It has library-wide pointers to malloc, free, etc. So if there's a segfault, it's because Julia is using my routines incorrectly.

I have a few globals that exist if UMFPACK is compiled in debug mode, just for testing and diagnostics, but that requires editing the source code to enable. It can't be enabled with compile time flags, so it's not easy to turn on accidentally.

So this is a bug in Julia's interface. Can you point me to where you access the SuiteSparse_config global struct? I could then suggest a fix.

@DrTimothyAldenDavis

I also never do any memory allocations that get placed in a global struct. You might be referring to the Cholmod_Common object, created here: https://github.com/DrTimothyAldenDavis/SuiteSparse/blob/79f25b523ae0fd81abe1ffb0efd9005f2e6eef33/CHOLMOD/Core/cholmod_common.c#L55

But that is not a global extern. It is not meant for multiple user threads to use. Each thread that calls CHOLMOD (and thus SuiteSparseQR) needs its own Common object.

I do allocate memory and put it there, but it is thread-local, not global extern. If Julia is attempting to use the Common object for multiple threads, it will segfault. You have to give each thread its own.
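The "one Common object per thread" rule can be sketched in plain Julia. This is only an illustration of the ownership pattern and does not call CHOLMOD: `Workspace`, `solve_with!`, and `run_all` are hypothetical names, and `Workspace` merely stands in for per-thread solver state.

```julia
# Hypothetical stand-in for per-thread solver state such as a Common object.
mutable struct Workspace
    scratch::Vector{Float64}
    calls::Int
end

Workspace() = Workspace(Float64[], 0)

# Mutates only the workspace it is handed, never a shared global.
function solve_with!(ws::Workspace, n::Integer)
    resize!(ws.scratch, n)
    ws.scratch .= 1.0
    ws.calls += 1
    return sum(ws.scratch)
end

function run_all(sizes)
    chunk_len = cld(length(sizes), Threads.nthreads())
    chunks = collect(Iterators.partition(sizes, chunk_len))
    tasks = map(chunks) do chunk
        Threads.@spawn begin
            ws = Workspace()   # private to this task, so no sharing occurs
            [solve_with!(ws, n) for n in chunk]
        end
    end
    return reduce(vcat, fetch.(tasks))
end

totals = run_all(101:200)
```

Creating the workspace inside each spawned task, rather than indexing a global array by thread id, keeps the state private even if the scheduler moves tasks between threads.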

@andreasnoack
Copy link
Member

Thanks for the explanation. It's indeed cholmod_common and not SuiteSparse_config that is causing the issue here. In the current wrappers, we only have a single global instance of cholmod_common. Our SuiteSparse wrappers predate multithreading in Julia which is why the issue shows up now.
