
Update dpnp.linalg.svd() to run on CUDA #2212

Merged: 5 commits merged into master from update_svd_cuda on Dec 6, 2024
Conversation

vlad-perevezentsev (Collaborator) commented Dec 4, 2024

This PR updates the dpnp.linalg.svd() implementation to support running on CUDA devices.
Since cuSolver gesvd only supports m >= n (see Remark 1 in the cuSolver documentation), the previous implementation crashed with Segmentation fault (core dumped):

$ ONEAPI_DEVICE_SELECTOR=cuda:gpu pytest -v dpnp/tests/test_linalg.py::TestSvd::test_svd

Python error: Segmentation fault
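
For reference, a minimal standalone reproducer along these lines hits the same crash on a CUDA device before this change (the shape is illustrative; any input with m < n triggers the gesvd limitation):

```python
# Run with: ONEAPI_DEVICE_SELECTOR=cuda:gpu python reproducer.py
import dpnp

# A wide matrix (m < n) is the problematic case for cuSolver gesvd.
a = dpnp.random.rand(256, 512)

# Crashed with a segmentation fault before this PR; works after it.
u, s, vt = dpnp.linalg.svd(a)
print(s[:5])
```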

This PR adds a check for m >= n; when m < n, the input array is transposed before the backend call (see the sketch below).

Passing a transposed array to oneapi::mkl::lapack::gesvd also improves the performance of dpnp.linalg.svd(), because reducing a matrix with m >= n to bidiagonal form (inside lapack::gesvd) is more efficient.
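
Conceptually, the m < n fallback can be sketched as follows (a minimal illustration in plain NumPy, not the actual dpnp internals; np.linalg.svd stands in for the m >= n-only backend call):

```python
import numpy as np

def svd_with_transpose_fallback(a):
    """Compute the SVD of a 2D array with a backend that requires m >= n
    (as cuSolver gesvd does), transposing the input when m < n."""
    m, n = a.shape
    if m >= n:
        return np.linalg.svd(a, full_matrices=False)
    # a = u @ diag(s) @ vt  <=>  a.T = vt.T @ diag(s) @ u.T,
    # so call the backend on a.T and transpose/swap the factors back.
    ut, s, vtt = np.linalg.svd(a.T, full_matrices=False)
    return vtt.T, s, ut.T

# Quick self-check on a wide (m < n) matrix.
a = np.random.default_rng(0).standard_normal((3, 5))
u, s, vt = svd_with_transpose_fallback(a)
assert np.allclose((u * s) @ vt, a)
```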


# 2D array

a_shape = (1024, 2048)
na = generate_random_numpy_array(a_shape, dtype='f4', seed_value=81)

# GPU
a_dp = dpnp.array(na, device='gpu')
q = a_dp.sycl_queue  # SYCL queue associated with a_dp, used for synchronization

%timeit res_dp = dpnp.linalg.svd(a_dp); q.wait()
# 1.07 s ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit res_dp = dpnp.linalg.svd(a_dp, new=True); q.wait()
# 881 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# CPU
a_dp = dpnp.array(na, device='cpu')
q = a_dp.sycl_queue

%timeit res_dp = dpnp.linalg.svd(a_dp); q.wait()
# 1.1 s ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit res_dp = dpnp.linalg.svd(a_dp, new=True); q.wait()
# 897 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# 3D array

a_shape = (16, 256, 512)
na = generate_random_numpy_array(a_shape, dtype='f4', seed_value=81)

# GPU
a_dp = dpnp.array(na, device='gpu')
q = a_dp.sycl_queue

%timeit res_dp = dpnp.linalg.svd(a_dp); q.wait()
# 979 ms ± 9.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit res_dp = dpnp.linalg.svd(a_dp, new=True); q.wait()
# 895 ms ± 6.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# CPU
a_dp = dpnp.array(na, device='cpu')
q = a_dp.sycl_queue

%timeit res_dp = dpnp.linalg.svd(a_dp); q.wait()
# 753 ms ± 39.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit res_dp = dpnp.linalg.svd(a_dp, new=True); q.wait()
# 679 ms ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
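
As a sanity check alongside the timings, the result of the transposed path can be compared against NumPy on the same kind of data (a rough sketch using plain NumPy random data instead of the generate_random_numpy_array test helper; tolerances are chosen for float32):

```python
import numpy as np
import dpnp

na = np.random.default_rng(81).standard_normal((1024, 2048)).astype('f4')
a_dp = dpnp.array(na)

u, s, vt = dpnp.linalg.svd(a_dp, full_matrices=False)
s_np = np.linalg.svd(na, compute_uv=False)

# Singular values are uniquely defined, so they should match closely.
assert np.allclose(dpnp.asnumpy(s), s_np, rtol=1e-3, atol=1e-3)

# Singular vectors are only defined up to sign, so check the reconstruction.
rec = (dpnp.asnumpy(u) * dpnp.asnumpy(s)) @ dpnp.asnumpy(vt)
rel_err = np.linalg.norm(rec - na) / np.linalg.norm(na)
assert rel_err < 1e-3
```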

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you filing the PR as a draft?

github-actions bot (Contributor) commented Dec 4, 2024

View rendered docs @ https://intelpython.github.io/dpnp/index.html

antonwolfy (Contributor) left a comment


Thank you @vlad-perevezentsev, LGTM

vlad-perevezentsev merged commit 4875e59 into master on Dec 6, 2024
47 of 50 checks passed
vlad-perevezentsev deleted the update_svd_cuda branch on December 6, 2024 at 14:43
github-actions bot added a commit (4875e59) that referenced this pull request on Dec 6, 2024