Use LBT to forward BLAS and LAPACK calls to Accelerate #58

Merged: 3 commits merged into master from sf/ilp64_accelerate on May 12, 2023

@staticfloat (Member) commented Apr 12, 2023

This throws away most of the previous version, instead opting to re-architect this package to make use of LBT to transparently use Accelerate for BLAS and LAPACK operations. Further enhancements to re-introduce the DSP functionality can be made, potentially in a separate package if we want to keep this one lightweight, as it may end up at the bottom of many dependency trees.

This re-architecting causes Accelerate to pass the full LinearAlgebra test suite (thanks to the usage of an external LAPACK_jll to paper over bugs in dsptrf(); hopefully no longer necessary in a future macOS update).
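
To check which backends LBT is forwarding to in a given session, a minimal sketch along these lines may help. It uses only `BLAS.get_config()` from the LinearAlgebra standard library; the helper name `loaded_blas_backends` is hypothetical, and the commented `using AppleAccelerate` step assumes that loading the package is what registers Accelerate with LBT.

```julia
using LinearAlgebra

# Hypothetical helper: collect (library path, interface) pairs for every
# backend that LBT currently forwards BLAS/LAPACK calls to.
function loaded_blas_backends()
    config = BLAS.get_config()
    return [(lib.libname, lib.interface) for lib in config.loaded_libs]
end

# On a supported macOS version, with the package installed, one would first run
#     using AppleAccelerate
# (assumption: loading the package is what points LBT at Accelerate).
for (path, interface) in loaded_blas_backends()
    println(interface, "  ", path)
end
```

In a stock Julia session this should list the default backend; after loading AppleAccelerate, the Accelerate framework would be expected to appear in the list as well.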

Fixes #45

@staticfloat
Member Author

This will naturally fail CI on any macOS older than 13.3

Anecdotally, Accelerate on my M1 Pro runs the LinearAlgebra test suite pretty quickly:

Running parallel tests with:
  nworkers() = 8
  nthreads() = 1
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 16.000 GiB
  Sys.free_memory() = 1.479 GiB

Test                          (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
LinearAlgebra/bidiag               (9) |        started at 2023-04-12T12:13:15.792
LinearAlgebra/diagonal             (7) |        started at 2023-04-12T12:13:15.839
LinearAlgebra/special              (8) |        started at 2023-04-12T12:13:15.883
LinearAlgebra/symmetric            (6) |        started at 2023-04-12T12:13:15.883
LinearAlgebra/triangular           (3) |        started at 2023-04-12T12:13:15.884
LinearAlgebra/addmul               (2) |        started at 2023-04-12T12:13:15.884
LinearAlgebra/matmul               (4) |        started at 2023-04-12T12:13:15.884
LinearAlgebra/dense                (5) |        started at 2023-04-12T12:13:15.884
LinearAlgebra/special              (8) |    97.45 |   2.87 |  2.9 |   13718.53 |   875.08
LinearAlgebra/qr                   (8) |        started at 2023-04-12T12:14:53.531
LinearAlgebra/bidiag               (9) |   106.62 |   3.55 |  3.3 |   13216.62 |  1039.58
LinearAlgebra/cholesky             (9) |        started at 2023-04-12T12:15:02.584
LinearAlgebra/dense                (5) |   142.17 |   5.91 |  4.2 |   17388.80 |  1441.88
LinearAlgebra/blas                 (5) |        started at 2023-04-12T12:15:38.173
LinearAlgebra/diagonal             (7) |   147.45 |   6.44 |  4.4 |   18475.31 |  1209.39
LinearAlgebra/lu                   (7) |        started at 2023-04-12T12:15:43.451
LinearAlgebra/qr                   (8) |    54.92 |   2.56 |  4.7 |    6552.89 |   945.16
LinearAlgebra/uniformscaling       (8) |        started at 2023-04-12T12:15:48.463
LinearAlgebra/cholesky             (9) |    54.48 |   3.47 |  6.4 |    5250.13 |  1039.58
LinearAlgebra/structuredbroadcast  (9) |        started at 2023-04-12T12:15:57.069
LinearAlgebra/addmul               (2) |   163.83 |   4.79 |  2.9 |   15629.88 |   625.73
LinearAlgebra/hessenberg           (2) |        started at 2023-04-12T12:15:59.829
LinearAlgebra/symmetric            (6) |   169.04 |   6.57 |  3.9 |   18915.86 |  1102.09
LinearAlgebra/svd                  (6) |        started at 2023-04-12T12:16:05.030
LinearAlgebra/matmul               (4) |   175.18 |   6.65 |  3.8 |   21358.36 |   831.70
LinearAlgebra/eigen                (4) |        started at 2023-04-12T12:16:11.197
LinearAlgebra/blas                 (5) |    33.83 |   2.18 |  6.4 |    2384.33 |  1441.88
LinearAlgebra/tridiag              (5) |        started at 2023-04-12T12:16:12.010
LinearAlgebra/structuredbroadcast  (9) |    31.07 |   3.32 | 10.7 |    2900.56 |  1039.58
LinearAlgebra/lapack               (9) |        started at 2023-04-12T12:16:28.183
LinearAlgebra/uniformscaling       (8) |    47.64 |   3.37 |  7.1 |    3557.69 |  1105.19
LinearAlgebra/lq                   (8) |        started at 2023-04-12T12:16:36.137
LinearAlgebra/hessenberg           (2) |    47.76 |   3.17 |  6.6 |    3818.54 |   712.42
LinearAlgebra/adjtrans             (2) |        started at 2023-04-12T12:16:47.611
LinearAlgebra/svd                  (6) |    44.37 |   5.12 | 11.5 |    3351.13 |  1102.09
LinearAlgebra/generic              (6) |        started at 2023-04-12T12:16:49.421
LinearAlgebra/lapack               (9) |    28.29 |   2.55 |  9.0 |    1628.27 |  1039.58
LinearAlgebra/schur                (9) |        started at 2023-04-12T12:16:56.510
LinearAlgebra/tridiag              (5) |    47.93 |   4.71 |  9.8 |    2726.56 |  1441.88
LinearAlgebra/bunchkaufman         (5) |        started at 2023-04-12T12:16:59.965
LinearAlgebra/lq                   (8) |    33.69 |   3.54 | 10.5 |    1793.97 |  1105.19
LinearAlgebra/givens               (8) |        started at 2023-04-12T12:17:09.844
LinearAlgebra/lu                   (7) |    94.47 |  10.87 | 11.5 |    5976.82 |  1209.39
LinearAlgebra/pinv                 (7) |        started at 2023-04-12T12:17:17.950
LinearAlgebra/adjtrans             (2) |    31.05 |   3.38 | 10.9 |    2257.61 |   728.20
LinearAlgebra/factorization        (2) |        started at 2023-04-12T12:17:18.677
LinearAlgebra/eigen                (4) |    68.53 |   7.26 | 10.6 |    4228.16 |   831.70
LinearAlgebra/abstractq            (4) |        started at 2023-04-12T12:17:19.739
LinearAlgebra/givens               (8) |    10.21 |   1.82 | 17.8 |     397.91 |  1105.19
LinearAlgebra/ldlt                 (8) |        started at 2023-04-12T12:17:20.074
LinearAlgebra/ldlt                 (8) |     1.06 |   0.00 |  0.0 |      61.72 |  1105.19
LinearAlgebra/factorization        (2) |     4.06 |   0.49 | 12.0 |     304.59 |   815.39
LinearAlgebra/abstractq            (4) |     3.86 |   0.24 |  6.2 |     331.80 |   913.75
LinearAlgebra/bunchkaufman         (5) |    23.79 |   2.78 | 11.7 |    1370.65 |  1441.88
LinearAlgebra/pinv                 (7) |     6.97 |   0.75 | 10.7 |     855.20 |  1428.39
LinearAlgebra/generic              (6) |    38.02 |   3.72 |  9.8 |    2491.66 |  1226.55
LinearAlgebra/schur                (9) |    84.99 |   1.83 |  2.2 |    1404.24 |  1039.58
LinearAlgebra/triangular           (3) |   306.81 |  18.81 |  6.1 |   33163.34 |  2196.91

Test Summary: |  Pass  Broken  Total     Time
  Overall     | 96483      17  96500  5m08.2s
    SUCCESS
Test Summary:                 |   Time
Full LinearAlgebra test suite | None  5m12.7s
     Testing AppleAccelerate tests passed

Versus OpenBLAS:

Running parallel tests with:
  nworkers() = 8
  nthreads() = 1
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 16.000 GiB
  Sys.free_memory() = 2.436 GiB

Test                          (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
LinearAlgebra/diagonal             (7) |        started at 2023-04-12T12:19:43.415
LinearAlgebra/special              (8) |        started at 2023-04-12T12:19:43.479
LinearAlgebra/matmul               (4) |        started at 2023-04-12T12:19:43.582
LinearAlgebra/addmul               (2) |        started at 2023-04-12T12:19:43.582
LinearAlgebra/triangular           (3) |        started at 2023-04-12T12:19:43.583
LinearAlgebra/symmetric            (6) |        started at 2023-04-12T12:19:43.583
LinearAlgebra/dense                (5) |        started at 2023-04-12T12:19:43.583
LinearAlgebra/bidiag               (9) |        started at 2023-04-12T12:19:43.586
LinearAlgebra/special              (8) |   101.46 |   3.33 |  3.3 |   13718.61 |   917.61
LinearAlgebra/qr                   (8) |        started at 2023-04-12T12:21:25.276
LinearAlgebra/bidiag               (9) |   114.20 |   3.99 |  3.5 |   13216.68 |   956.81
LinearAlgebra/cholesky             (9) |        started at 2023-04-12T12:21:37.868
LinearAlgebra/dense                (5) |   145.96 |   5.71 |  3.9 |   17388.89 |  1106.66
LinearAlgebra/blas                 (5) |        started at 2023-04-12T12:22:09.665
LinearAlgebra/diagonal             (7) |   158.96 |   7.93 |  5.0 |   18475.02 |  1177.16
LinearAlgebra/lu                   (7) |        started at 2023-04-12T12:22:22.653
LinearAlgebra/qr                   (8) |    57.88 |   3.25 |  5.6 |    6552.97 |   990.30
LinearAlgebra/uniformscaling       (8) |        started at 2023-04-12T12:22:23.163
LinearAlgebra/cholesky             (9) |    56.94 |   3.20 |  5.6 |    5249.92 |   979.05
LinearAlgebra/structuredbroadcast  (9) |        started at 2023-04-12T12:22:34.828
LinearAlgebra/symmetric            (6) |   175.63 |   7.43 |  4.2 |   18916.03 |  1136.31
LinearAlgebra/hessenberg           (6) |        started at 2023-04-12T12:22:39.359
LinearAlgebra/blas                 (5) |    34.75 |   2.57 |  7.4 |    2384.31 |  1226.16
LinearAlgebra/svd                  (5) |        started at 2023-04-12T12:22:44.459
LinearAlgebra/matmul               (4) |   183.24 |   7.50 |  4.1 |   21417.15 |   749.80
LinearAlgebra/eigen                (4) |        started at 2023-04-12T12:22:46.928
LinearAlgebra/structuredbroadcast  (9) |    33.27 |   3.61 | 10.9 |    2900.77 |   979.05
LinearAlgebra/tridiag              (9) |        started at 2023-04-12T12:23:08.125
LinearAlgebra/hessenberg           (6) |    32.13 |   2.63 |  8.2 |    2461.15 |  1136.31
LinearAlgebra/lapack               (6) |        started at 2023-04-12T12:23:11.506
LinearAlgebra/uniformscaling       (8) |    48.48 |   3.70 |  7.6 |    3557.68 |  1007.66
LinearAlgebra/lq                   (8) |        started at 2023-04-12T12:23:11.649
LinearAlgebra/svd                  (5) |    44.31 |   3.41 |  7.7 |    2903.87 |  1226.16
LinearAlgebra/adjtrans             (5) |        started at 2023-04-12T12:23:28.782
LinearAlgebra/lapack               (6) |    24.98 |   2.24 |  9.0 |    1414.08 |  1136.31
LinearAlgebra/generic              (6) |        started at 2023-04-12T12:23:36.518
LinearAlgebra/lq                   (8) |    30.77 |   3.02 |  9.8 |    1794.01 |  1007.66
LinearAlgebra/schur                (8) |        started at 2023-04-12T12:23:42.448
LinearAlgebra/tridiag              (9) |    40.00 |   3.86 |  9.7 |    2215.41 |   979.05
LinearAlgebra/bunchkaufman         (9) |        started at 2023-04-12T12:23:48.155
LinearAlgebra/eigen                (4) |    63.84 |   6.05 |  9.5 |    4228.25 |   749.80
LinearAlgebra/givens               (4) |        started at 2023-04-12T12:23:50.788
LinearAlgebra/lu                   (7) |    96.01 |  10.01 | 10.4 |    5976.80 |  1177.16
LinearAlgebra/pinv                 (7) |        started at 2023-04-12T12:23:58.674
LinearAlgebra/givens               (4) |     8.93 |   0.79 |  8.9 |     498.21 |   749.80
LinearAlgebra/factorization        (4) |        started at 2023-04-12T12:23:59.746
LinearAlgebra/adjtrans             (5) |    32.41 |   3.09 |  9.5 |    1977.57 |  1226.16
LinearAlgebra/abstractq            (5) |        started at 2023-04-12T12:24:01.227
LinearAlgebra/factorization        (4) |     4.25 |   0.65 | 15.3 |     223.63 |   749.80
LinearAlgebra/ldlt                 (4) |        started at 2023-04-12T12:24:04.039
LinearAlgebra/ldlt                 (4) |     1.40 |   0.00 |  0.0 |      70.48 |   749.80
LinearAlgebra/abstractq            (5) |     6.95 |   2.07 | 29.8 |     283.06 |  1226.16
LinearAlgebra/pinv                 (7) |    10.68 |   2.38 | 22.3 |     855.16 |  1411.39
LinearAlgebra/generic              (6) |    37.83 |   4.00 | 10.6 |    2510.75 |  1272.36
LinearAlgebra/bunchkaufman         (9) |    30.87 |   2.65 |  8.6 |    2729.90 |  1360.27
LinearAlgebra/triangular           (3) |   326.29 |  21.67 |  6.6 |   33163.40 |  2458.39
LinearAlgebra/schur                (8) |    88.67 |   2.56 |  2.9 |    1484.38 |  1007.66
LinearAlgebra/addmul               (2) |   420.11 |  13.89 |  3.3 |   37199.14 |  1532.12

Test Summary: |   Pass  Broken   Total     Time
  Overall     | 106833      17  106850  7m01.8s
    SUCCESS

I do see, though, that the OpenBLAS run executes more tests (106,850 vs. 96,500 in total); not sure why that is.
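
Because the two runs do not execute the same number of tests, the wall-clock totals are not directly comparable. The sketch below only does arithmetic on figures copied from the two logs above. Of the test sets in those logs, `addmul` shows the largest per-set gap, so it might be where the extra tests live, though the logs do not confirm that.

```julia
# Figures copied from the two "Test Summary" blocks and the
# LinearAlgebra/addmul rows in the logs above.
accelerate = (total = 96500,  wall_s = 5 * 60 + 8.2, addmul_s = 163.83)
openblas   = (total = 106850, wall_s = 7 * 60 + 1.8, addmul_s = 420.11)

extra_tests  = openblas.total - accelerate.total        # tests run only under OpenBLAS
extra_share  = 100 * extra_tests / accelerate.total     # as a percentage of the Accelerate total
wall_gap_s   = openblas.wall_s - accelerate.wall_s      # difference in overall wall time
addmul_gap_s = openblas.addmul_s - accelerate.addmul_s  # difference in the addmul test set alone

println("extra tests under OpenBLAS: ", extra_tests)
println("share of Accelerate total (%): ", round(extra_share; digits = 1))
println("wall-time gap (s): ", round(wall_gap_s; digits = 1))
println("addmul gap (s): ", round(addmul_gap_s; digits = 2))
```

The test sets run on parallel workers, so a per-set gap can exceed the overall wall-time gap.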

@staticfloat
Member Author

As an update, macOS v13.4 beta 3 fixes the dsptrf bug; running the LinearAlgebra test suite with only Accelerate loaded (no external LAPACK) passes!

@ViralBShah
Contributor

Wow that's quick. I suppose in that case the simplest thing is to make macOS 13.4 the min version and then remove all the LAPACK overlay stuff.

@Moblin88 commented May 6, 2023

I am trying to run the ILP64 Accelerate branch on macOS 13.3.1 (on an M2 chip). I get an error when LBT tries to load LAPACK from the LAPACK_jll artifact. The error is:

Unable to autodetect interface type of "/Users/nicholasengelking/.julia/artifacts/65c65bc8413bbca96d1d988b65cdae3d9a64cedb/lib/liblapack.3.10.0.dylib"

This seems to indicate an error in LBT's autodetect_interface function, which tries to determine whether it's a 32- or 64-bit library.
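
One way to narrow such a failure down is to check by hand whether the library can be opened at all and which symbol spellings it exports. The sketch below is a rough diagnostic, not a reproduction of LBT's autodetection: `probe_lapack_symbols` is a hypothetical helper, and the two `dpotrf` spellings are assumed to be the common unsuffixed and `64_`-suffixed names.

```julia
using Libdl

# Hypothetical helper: report which spellings of a common LAPACK routine a
# library exports. It checks symbol presence only; LBT's own interface
# detection does more than this.
function probe_lapack_symbols(path::AbstractString)
    handle = Libdl.dlopen(path; throw_error = false)
    handle === nothing && return nothing   # the library could not be opened at all
    names = ("dpotrf_", "dpotrf_64_")      # assumed spellings: plain and 64_-suffixed
    found = Dict(name => Libdl.dlsym(handle, name; throw_error = false) !== nothing
                 for name in names)
    Libdl.dlclose(handle)
    return found
end
```

Calling `probe_lapack_symbols` on the path from the error message would return `nothing` if the file cannot be loaded, or a dictionary showing which of the two names resolve.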

I've tried updating LAPACK_jll and running Pkg.instantiate(), but no joy. I assume this is some kind of upstream issue with artifacts, packages, or LBT, or maybe with the build of the LAPACK lib?

Any help would be appreciated. I am not on the 13.4 beta with the fix for dsptrf, so my understanding is that I need to use this external LAPACK lib with Accelerate BLAS.

This is on the head of sf/ilp64_accelerate, commit d05a891.

@ViralBShah
Contributor

@Moblin88 This works for me. I just pushed an update for LAPACK 3.11 as well, and made that the minimum. Can you try it out?

@codecov bot commented May 12, 2023

Codecov Report

Patch coverage: 82.50% and project coverage change: +2.54 🎉

Comparison is base (c5186a7) 80.26% compared to head (729a176) 82.81%.

❗ Current head 729a176 differs from pull request most recent head e3753ce. Consider uploading reports for the commit e3753ce to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #58      +/-   ##
==========================================
+ Coverage   80.26%   82.81%   +2.54%     
==========================================
  Files           4        4              
  Lines         152      192      +40     
==========================================
+ Hits          122      159      +37     
- Misses         30       33       +3     
| Impacted Files | Coverage Δ |
|---|---|
| src/AppleAccelerate.jl | 82.92% <82.50%> (-17.08%) ⬇️ |

... and 1 file with indirect coverage changes


@ViralBShah (Contributor) commented May 12, 2023

@staticfloat I have reinstated the earlier capabilities in this package and would like to merge this PR, if it looks good to you. The DSP and Array functions do not bring additional package dependencies, so are perhaps ok to leave here for now.

We can refactor this into more packages later, but removing the code felt like we would forget about it. It works fine and passes tests, and hopefully will help others build further.

staticfloat and others added 3 commits May 11, 2023 23:48
Introduce the use of LBT to transparently use Accelerate for BLAS and LAPACK operations.

This re-architecting causes Accelerate to pass the full LinearAlgebra
test suite (thanks to the usage of an external LAPACK_jll to paper over
bugs in `dsptrf()`; hopefully no longer necessary in a future macOS
update).
Tell the user what version they're running if it fails
Use LAPACK 3.11
Run tests only on Apple
Set compat to Julia 1.9
Add Statistics and DSP for tests
Re-enable Windows tests to make sure that AppleAccelerate loads without error and is a no-op
@ViralBShah changed the title from "Significantly re-work AppleAccelerate.jl" to "Use LBT to forward BLAS and LAPACK calls to Accelerate" on May 12, 2023
@ViralBShah merged commit e5e4631 into master on May 12, 2023
@ViralBShah deleted the sf/ilp64_accelerate branch on May 12, 2023 23:46
@Moblin88

It's working for me now on the master branch that was just merged with LAPACK 3.11.0. It's also WAYY faster to multiply large dense matrices!

@ViralBShah
Contributor

We will be able to remove the LAPACK dependency once macOS 13.4 is out.

@amontoison

Is it possible to do a new release of AppleAccelerate.jl?

@ViralBShah
Contributor

My preference is to wait for macOS 13.4, remove the LAPACK dependency, and then make a release. Would you prefer sooner?

@amontoison

No, that's fine. I just wanted to add a comment about AppleAccelerate.jl to the JuliaHSL documentation, explaining that using AppleAccelerate loads an LP64 BLAS/LAPACK, as using MKL does.
