
Make USE_BLAS64 a runtime option? #891

Closed · simonbyrne opened this issue Dec 2, 2021 · 21 comments

Labels: external dependencies (Involves LLVM, OpenBLAS, or other linked libraries)

Comments

@simonbyrne (Contributor)

Now that we have libblastrampoline, we can switch BLAS libraries at runtime, but we can't switch between the ILP64 and LP64 interfaces: that can only be done via the USE_BLAS64 build flag. Some BLAS libraries (notably, Apple's Accelerate) are only provided as LP64; see https://discourse.julialang.org/t/apple-m1-gpu-from-julia/69563

@perrutquist created a library, SetBlasInt.jl, that attempts to do this by redefining methods. Would it be feasible to do something like this in Julia itself?

simonbyrne added the "external dependencies" label on Dec 2, 2021
@perrutquist (Contributor)

One way of doing it would be via dispatch on an optional argument, so the type signature of a hypothetical BLAS function foo(a, b, c) would be extended to

foo(a, b, c, ::Type{BlasInt}=default_blas_int()) where {BlasInt <: Union{Int32, Int64}}

This would make it possible to choose the interface on each call, but also to change the global default (by re-defining the default_blas_int function).
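
A minimal sketch of that pattern (hypothetical names, not the actual LinearAlgebra API):

default_blas_int() = Int64  # global default: ILP64

function blas_foo(a, b, c, ::Type{BlasInt}=default_blas_int()) where {BlasInt<:Union{Int32,Int64}}
    # here the wrapper would ccall the matching LBT symbol:
    # gemm_64_-style names for Int64, gemm_-style names for Int32
    BlasInt === Int64 ? :ilp64 : :lp64
end

blas_foo(1, 2, 3)         # uses the global default (ILP64)
blas_foo(1, 2, 3, Int32)  # forces the LP64 interface for this call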

@ViralBShah (Member) commented Dec 2, 2021

I believe that Accelerate is already usable through LBT. We just need an MKL.jl-like package for Accelerate. The real issue is always LAPACK, and we may need to ensure that LAPACK_jll builds and works reliably, since Apple's LAPACK was ten years out of date last I checked. I started the Yggdrasil package a while ago: https://github.com/JuliaPackaging/Yggdrasil/tree/master/L/LAPACK

Perhaps https://github.com/JuliaMath/AppleAccelerate.jl should be updated to work similarly to MKL.jl.

@ViralBShah (Member)

OpenBLAS does have multi-threaded lu and chol, which you lose out on in the Accelerate configuration, although in theory we could just apply those OpenBLAS LAPACK overrides to our own LAPACK as well.

@giordano (Contributor) commented Dec 3, 2021

Note that to do this we'd need to make OpenBLAS32_jll a stdlib, and probably also load it at LinearAlgebra load time. I can see Linux distributions hating us even more than they already do 😛

@simonbyrne (Contributor, Author) commented Dec 3, 2021

> I believe that Accelerate is already usable through LBT. We just need an MKL.jl-like package for Accelerate. The real issue is always LAPACK, and we may need to ensure that LAPACK_jll builds and works reliably, since Apple's LAPACK was ten years out of date last I checked. I started the Yggdrasil package a while ago: https://github.com/JuliaPackaging/Yggdrasil/tree/master/L/LAPACK

Accelerate only provides LP64 symbols (https://developer.apple.com/forums/thread/653440): you can add it to LBT, but from what I understand Julia will still use the ILP64 symbols provided by OpenBLAS:

julia> using LinearAlgebra

julia> BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate")
1705

julia> BLAS.lbt_get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
├ [ILP64] libopenblas64_.0.3.13.dylib
└ [ LP64] Accelerate

@ViralBShah (Member)

I'm pretty sure we map them in the internal LBT table. @staticfloat or @giordano can say more.

@giordano (Contributor) commented Dec 3, 2021

> Julia will still use the ILP64 symbols provided by OpenBLAS

As it is now, yes. With your proposal of a way to switch between ILP64 and LP64 at runtime, LBT would forward the calls to the right library, because the two interfaces would call different symbols.

However, regarding packaging Julia, @haampie confirmed that having two variants of OpenBLAS in Spack would be, let's say, not easy.

@ViralBShah (Member) commented Dec 3, 2021

MKL does not have ILP64 symbols either, but it does have an ILP64 API. I suppose Accelerate does not have an ILP64 API, which is why we need USE_BLAS64=0. I suppose it can't be too difficult to make that dynamic.

@staticfloat (Member)

I think @perrutquist's solution is roughly the right thing to do. LBT allows you to forward to different libraries no matter how they name their symbols, but it maintains its own mapping that preserves the property that gemm_64_ is always an ILP64 GEMM and gemm_ is always an LP64 GEMM; so if Julia wants to call an LP64 BLAS, it needs to call gemm_, and if it calls gemm_64_ it had better be passing ILP64 arguments.

Right now Julia's method of choosing which BLAS symbols to call is hardcoded; @perrutquist suggests one way of making it more dynamic, which I kind of like, although the fact that there is a semantic difference in some cases (e.g. when any(size(x, i) > 2^31 for i in 1:ndims(x))) makes me think that we might want to do some error checking when someone requests LP64 on a very large matrix.
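
One hypothetical form such a check could take (check_lp64_dims is an illustrative helper, not existing LinearAlgebra code):

function check_lp64_dims(x::AbstractArray)
    for i in 1:ndims(x)
        size(x, i) <= typemax(Int32) ||
            throw(ArgumentError("dimension $i has size $(size(x, i)), which does not fit in an Int32; use the ILP64 interface"))
    end
    return x
end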

@ViralBShah (Member)

Is it even worth asking Apple for an ILP64 API?

@simonbyrne (Contributor, Author)

> Is it even worth asking Apple for an ILP64 API?

Yes: I've opened a ticket, but they've said previously that they base their development priorities on user demand, so others should ask as well (and ask for a newer LAPACK while you're at it).

@staticfloat (Member)

Keno and I have asked our internal Apple developer contact as well, so they're aware that this is something we'd like.

@RoyiAvital

I have been asking for this for months; it is super important. When you solve this, don't leave sparse matrices behind: it would be great, performance-wise, to have the option of 32-bit indices for sparse matrices as well (or at least a compile-time choice).

@ViralBShah (Member)

@RoyiAvital You keep mentioning this in Slack, Discourse, and any time anyone discusses anything related to BLAS. We have heard you, but repeating it so many times does not really help.

@perrutquist (Contributor)

Regarding the need to provide two variants of OpenBLAS: if Julia allowed LP64 calls, the documentation could warn the user to first install and load a suitable library (and/or call some check_that_32bit_blas_is_loaded function).

LBT does not crash if an LP64 call is attempted without first loading a compatible library. It simply prints an error message and returns.
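
A rough sketch of such a check, built on the config that BLAS.lbt_get_config() already exposes (field names as in current LinearAlgebra; treat them as an assumption to verify):

using LinearAlgebra

# true if any loaded LBT backend exports the LP64 (32-bit integer) interface
lp64_blas_loaded() = any(lib -> lib.interface == :lp64, BLAS.lbt_get_config().loaded_libs)

lp64_blas_loaded() || @warn "no LP64 BLAS loaded; LP64 calls will error inside LBT"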

@perrutquist (Contributor) commented Dec 8, 2021

Another question would be what to do with structs like CholeskyPivoted that are defined in terms of BlasInt. (In this case, to store a vector of pivot indices.)

Should these structs have another type parameter, or should the LAPACK wrappers perform a conversion whenever the integer type used by Julia does not match the one used by LAPACK?

@ViralBShah
Copy link
Member

Yes, those should receive another type parameter.
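
For illustration, a hypothetical parameterized struct (not the actual LinearAlgebra definition) might look like:

struct MyCholeskyPivoted{T,S<:AbstractMatrix{T},I<:Union{Int32,Int64}}
    factors::S       # the factor, as returned by LAPACK
    uplo::Char       # 'U' or 'L'
    piv::Vector{I}   # pivot indices in the interface's integer type
    rank::I
    info::I
end

so that the pivot vector's element type follows whichever BLAS/LAPACK interface produced it.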

@zinphi commented Jul 25, 2022

Idea: wouldn't it be possible to dynamically route between ILP64 and LP64 within libblastrampoline? By "dynamically" I mean that LBT could decide, by inspecting the integer arguments of a call, whether to invoke the presumably loaded Int32 or Int64 BLAS variant: if any integer argument exceeds 2^31 - 1, LBT could fall back to the ILP64 variant (only when an LP64 variant is the first choice). Usually only a few scalar integer values would need to be inspected (and maybe converted), which should take essentially no time. Explicit calls to routines that take integer arrays as input arguments (not work arrays) may be problematic, but those are probably the exception.
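
A toy illustration of that routing idea (illustrative symbol names, not LBT internals): inspect the scalar dimension arguments and pick the interface accordingly.

# in an LP64-first configuration, fall back to ILP64 only when a dimension overflows Int32
route_gemm(m::Integer, n::Integer, k::Integer) =
    max(m, n, k) > typemax(Int32) ? :dgemm_64_ : :dgemm_

route_gemm(100, 100, 100)    # :dgemm_    (everything fits in Int32)
route_gemm(2^31, 100, 100)   # :dgemm_64_ (must use the ILP64 variant)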

@giordano (Contributor) commented Jul 26, 2022

The idea is cool and probably not too hard to implement, but concatenating two strings at runtime to decide whether to call :dgemm or :dgemm64_ takes a non-negligible amount of time and allocates memory:

julia> @btime Symbol($("dgemm"), $("64_"))
  56.877 ns (1 allocation: 32 bytes)
:dgemm64_

And I guess you'd also need a runtime check of whether there is a backing ILP64 or LP64 BLAS library. At the moment all these decisions are free (well, they are hardcoded, so they aren't really decisions); your proposal adds this runtime cost to every call.

@staticfloat (Member) commented Jul 27, 2022

It's not possible to tell the difference between one valid 64-bit value and two valid 32-bit values. When passing in indices for an array, how can you tell which way of looking at the set of indices is correct:

julia> idxs = UInt32[1, 0, 2, 0, 1, 0]
6-element Vector{UInt32}:
 0x00000001
 0x00000000
 0x00000002
 0x00000000
 0x00000001
 0x00000000

julia> reinterpret(UInt64, idxs)
3-element reinterpret(UInt64, ::Vector{UInt32}):
 0x0000000000000001
 0x0000000000000002
 0x0000000000000001

They both look completely valid, but will give wildly different answers.

@ViralBShah (Member)

With ILP64 BLAS and LAPACK available everywhere now (Apple being the final holdout) and with our LBT tooling, I think we may no longer need to put effort into this.

I'm suggesting we close this, but I'm happy to reopen if we feel it would still be nice to have.

KristofferC transferred this issue from JuliaLang/julia on Nov 26, 2024