
Make USE_BLAS64 a runtime option? #891

Closed · simonbyrne opened this issue Dec 2, 2021 · 21 comments

Labels: external dependencies (Involves LLVM, OpenBLAS, or other linked libraries)

Comments

@simonbyrne (Contributor)

Now that we have libblastrampoline, we can switch BLAS libraries at runtime, but we can't switch between the ILP64 and LP64 interfaces: that can only be done via the USE_BLAS64 build flag. Some BLAS libraries (notably, Apple's Accelerate) are only provided as LP64; see https://discourse.julialang.org/t/apple-m1-gpu-from-julia/69563

@perrutquist created a library, SetBlasInt.jl, that attempts to do this by redefining methods. Would it be feasible to do something like this in Julia itself?

simonbyrne added the "external dependencies" label on Dec 2, 2021
@perrutquist (Contributor)

One way of doing it would be via dispatch on an optional argument, so the type signature of a hypothetical BLAS function foo(a, b, c) would be extended to

foo(a, b, c, ::Type{BlasInt}=default_blas_int()) where {BlasInt <: Union{Int32, Int64}}

This would make it possible to choose the interface on each call, but also to change the global default (by re-defining the default_blas_int function).
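
A minimal sketch of that pattern (hypothetical names, not the actual LinearAlgebra API):

default_blas_int() = Int64  # global default: ILP64

function blas_foo(a, b, c, ::Type{BlasInt}=default_blas_int()) where {BlasInt<:Union{Int32,Int64}}
    # here the wrapper would ccall the matching LBT symbol:
    # gemm_64_-style names for Int64, gemm_-style names for Int32
    BlasInt === Int64 ? :ilp64 : :lp64
end

blas_foo(1, 2, 3)         # uses the global default (ILP64)
blas_foo(1, 2, 3, Int32)  # forces the LP64 interface for this call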

@ViralBShah (Member) commented Dec 2, 2021

I believe that Accelerate is already usable through LBT. We just need an MKL.jl-like package for Accelerate. The real issue is always LAPACK, and we may need to ensure that LAPACK_jll builds and works reliably, since Apple's LAPACK was ten years out of date last I checked. I started the Yggdrasil package a while ago: https://github.com/JuliaPackaging/Yggdrasil/tree/master/L/LAPACK

Perhaps https://github.com/JuliaMath/AppleAccelerate.jl should be updated to work similarly to MKL.jl.

@ViralBShah (Member)

OpenBLAS does have multi-threaded lu and chol, which you lose out on in the Accelerate configuration, although in theory we could just apply those OpenBLAS LAPACK overrides to our own LAPACK as well.

@giordano (Contributor) commented Dec 3, 2021

Note that to do this we'd need to make OpenBLAS32_jll a stdlib, and probably also load it at LinearAlgebra load time. I can see Linux distributions hating us even more than they already do 😛

@simonbyrne (Contributor, Author) commented Dec 3, 2021

> I believe that Accelerate is already usable through LBT. We just need an MKL.jl-like package for Accelerate. The real issue is always LAPACK, and we may need to ensure that LAPACK_jll builds and works reliably, since Apple's LAPACK was ten years out of date last I checked. I started the Yggdrasil package a while ago: https://github.com/JuliaPackaging/Yggdrasil/tree/master/L/LAPACK

Accelerate only provides LP64 symbols (https://developer.apple.com/forums/thread/653440): you can add it to LBT, but from what I understand Julia will still use the ILP64 symbols provided by OpenBLAS:

julia> using LinearAlgebra

julia> BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate")
1705

julia> BLAS.lbt_get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
├ [ILP64] libopenblas64_.0.3.13.dylib
└ [ LP64] Accelerate

@ViralBShah (Member)

I'm pretty sure we map them in the internal LBT table. @staticfloat or @giordano can say more.

@giordano (Contributor) commented Dec 3, 2021

> Julia will still use the ILP64 symbols provided by OpenBLAS

As it is now, yes. With your proposal of a way to switch between ILP64 and LP64 at runtime, LBT would forward the calls to the right library, because the two interfaces would call different symbols.

However, regarding packaging Julia, @haampie confirmed that having two variants of OpenBLAS in Spack would be, let's say, not easy.

@ViralBShah (Member) commented Dec 3, 2021

MKL does not have ILP64 symbols either, but it does have an ILP64 API. I suppose Accelerate does not have an ILP64 API, which is why we need USE_BLAS64=0. I suppose it can't be too difficult to make that dynamic.

@staticfloat (Member)

I think @perrutquist's solution is roughly the right thing to do. LBT allows you to forward to different libraries no matter how they name their symbols, but it maintains its own mapping that preserves the property that gemm_64_ is always an ILP64 GEMM and gemm_ is always an LP64 GEMM; so if Julia wants to call an LP64 BLAS, it needs to call gemm_, and if it calls gemm_64_ it had better be passing ILP64 arguments.

Right now Julia's method of choosing which BLAS symbols to call is hardcoded; @perrutquist suggests one way of making it more dynamic, which I kind of like, although the fact that there is a semantic difference in some cases (e.g. when any(size(x, i) > 2^31 for i in 1:ndims(x))) makes me think that we might want to do some error checking when someone requests LP64 on a very large matrix.
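
One hypothetical form such a check could take (check_lp64_dims is an illustrative helper, not existing LinearAlgebra code):

function check_lp64_dims(x::AbstractArray)
    for i in 1:ndims(x)
        size(x, i) <= typemax(Int32) ||
            throw(ArgumentError("dimension $i has size $(size(x, i)), which does not fit in an Int32; use the ILP64 interface"))
    end
    return x
end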

@ViralBShah (Member)

Is it even worth asking Apple for an ILP64 API?

@simonbyrne (Contributor, Author)

> Is it even worth asking Apple for an ILP64 API?

Yes: I've opened a ticket, but they've said previously that they base their development priorities on user demand, so others should ask as well (and ask for a newer LAPACK while you're at it).

@staticfloat (Member)

Keno and I have asked our internal Apple developer contact as well, so they're aware that this is something we'd like.

@RoyiAvital

I have been asking for this for months; it is super important. When you solve this, don't leave sparse matrices behind: it would be great, performance-wise, to have the option of 32-bit indices for sparse matrices as well (or at least a compile-time choice).

@ViralBShah (Member)

@RoyiAvital You keep mentioning this in Slack, Discourse, and any time anyone discusses anything related to BLAS. We have heard you, but repeating it so many times does not really help.

@perrutquist (Contributor)

Regarding the need to provide two variants of OpenBLAS: if Julia allowed LP64 calls, the documentation could warn the user to first install and load a suitable library (and/or call some check_that_32bit_blas_is_loaded function).

LBT does not crash if an LP64 call is attempted without first loading a compatible library. It simply prints an error message and returns.
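
A rough sketch of such a check, built on the config that BLAS.lbt_get_config() already exposes (field names as in current LinearAlgebra; treat them as an assumption to verify):

using LinearAlgebra

# true if any loaded LBT backend exports the LP64 (32-bit integer) interface
lp64_blas_loaded() = any(lib -> lib.interface == :lp64, BLAS.lbt_get_config().loaded_libs)

lp64_blas_loaded() || @warn "no LP64 BLAS loaded; LP64 calls will error inside LBT"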

@perrutquist (Contributor) commented Dec 8, 2021

Another question would be what to do with structs like CholeskyPivoted that are defined in terms of BlasInt. (In this case, to store a vector of pivot indices.)

Should these structs have another type parameter, or should the LAPACK wrappers perform a conversion whenever the integer type used by Julia does not match the one used by LAPACK?

@ViralBShah
Copy link
Member

Yes, those should receive another type parameter.
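
For illustration, a hypothetical parameterized struct (not the actual LinearAlgebra definition) might look like:

struct MyCholeskyPivoted{T,S<:AbstractMatrix{T},I<:Union{Int32,Int64}}
    factors::S       # the factor, as returned by LAPACK
    uplo::Char       # 'U' or 'L'
    piv::Vector{I}   # pivot indices in the interface's integer type
    rank::I
    info::I
end

so that the pivot vector's element type follows whichever BLAS/LAPACK interface produced it.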

@zinphi commented Jul 25, 2022

Idea: wouldn't it be possible to dynamically route between ILP64 and LP64 within libblastrampoline? By "dynamically" I mean that LBT could decide, by inspecting the integer arguments of a call, whether to invoke the presumably loaded Int32 or Int64 BLAS variant: if any integer argument exceeds 2^31 - 1, LBT could fall back to the ILP64 variant (only when an LP64 variant is the first choice). Usually only a few scalar integer values would need to be inspected (and maybe converted), which should take essentially no time. Explicit calls to routines that take integer arrays as input arguments (not work arrays) may be problematic, but those are probably the exception.
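
A toy illustration of that routing idea (illustrative symbol names, not LBT internals): inspect the scalar dimension arguments and pick the interface accordingly.

# in an LP64-first configuration, fall back to ILP64 only when a dimension overflows Int32
route_gemm(m::Integer, n::Integer, k::Integer) =
    max(m, n, k) > typemax(Int32) ? :dgemm_64_ : :dgemm_

route_gemm(100, 100, 100)    # :dgemm_    (everything fits in Int32)
route_gemm(2^31, 100, 100)   # :dgemm_64_ (must use the ILP64 variant)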

@giordano (Contributor) commented Jul 26, 2022

The idea is cool and probably not too hard to implement, but concatenating two strings at runtime to decide whether to call :dgemm or :dgemm64_ takes a non-negligible amount of time and allocates memory:

julia> @btime Symbol($("dgemm"), $("64_"))
  56.877 ns (1 allocation: 32 bytes)
:dgemm64_

And I guess you'd also need a runtime check of whether there is a backing ILP64 or LP64 BLAS library. At the moment all these decisions are free (well, they are hardcoded, so they aren't really decisions); your proposal adds this runtime cost to every call.

@staticfloat (Member) commented Jul 27, 2022

It's not possible to tell the difference between one valid 64-bit value and two valid 32-bit values. When passing in indices for an array, how can you tell which way of looking at the set of indices is correct:

julia> idxs = UInt32[1, 0, 2, 0, 1, 0]
6-element Vector{UInt32}:
 0x00000001
 0x00000000
 0x00000002
 0x00000000
 0x00000001
 0x00000000

julia> reinterpret(UInt64, idxs)
3-element reinterpret(UInt64, ::Vector{UInt32}):
 0x0000000000000001
 0x0000000000000002
 0x0000000000000001

They both look completely valid, but will give wildly different answers.

@ViralBShah (Member)

With ILP64 BLAS and LAPACK available everywhere now (Apple being the final holdout) and with our LBT tooling, I think we may no longer need to put effort into this.

I'm suggesting we close this, but I'm happy to reopen if we feel it would still be nice to have.

KristofferC transferred this issue from JuliaLang/julia on Nov 26, 2024