Implement asynchronous `fill` method using `dpctl` kernels #2055

ndgrigorian · 2024-09-17T01:16:07Z

This PR changes the dpnp_array.fill method to use dpctl kernels, which makes fill asynchronous and more efficient: it no longer indexes the array repeatedly or copies a scalar to the device for each element.

This shows significant performance gains on Iris Xe in WSL, from about 1.25 s to about 229 μs per call in the timings below.

Before

In [1]: import dpnp as dnp

In [2]: x_dnp = dnp.empty(10000, dtype="c8")

In [3]: %timeit x_dnp.fill(10)
1.25 s ± 47.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit x_dnp.fill(10)
1.26 s ± 27.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After

In [8]: %timeit x_dnp.fill(10); q.wait()
229 μs ± 37.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
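Because the new fill is asynchronous, the "After" timing waits on the queue so that the measurement covers the work itself and not only its submission. The effect can be sketched without a device, using a thread pool as a stand-in for a SYCL queue. Everything below (`slow_fill`, the 50 ms delay, the pool) is an assumption made for illustration and is not part of dpnp or dpctl.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fill(buf, value):
    """Stand-in for a device kernel: takes a while, then fills the buffer."""
    time.sleep(0.05)
    buf[:] = [value] * len(buf)


def time_submit_only(executor, buf):
    """Time only the submission; the work may still be running on return."""
    start = time.perf_counter()
    future = executor.submit(slow_fill, buf, 10)
    elapsed = time.perf_counter() - start
    return elapsed, future


def time_submit_and_wait(executor, buf):
    """Time submission plus completion, analogous to `fill(...); q.wait()`."""
    start = time.perf_counter()
    executor.submit(slow_fill, buf, 10).result()
    return time.perf_counter() - start


with ThreadPoolExecutor(max_workers=1) as pool:
    submit_only, pending = time_submit_only(pool, [0] * 8)
    pending.result()  # drain the pool before the next measurement
    submit_and_wait = time_submit_and_wait(pool, [0] * 8)

print(f"submit only:     {submit_only * 1e3:.2f} ms")
print(f"submit and wait: {submit_and_wait * 1e3:.2f} ms")
```

Timing the submission alone would understate the cost, which is why a benchmark of an asynchronous call should include the wait.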

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
If this PR is a work in progress, are you filing the PR as a draft?

ndgrigorian · 2024-09-17T20:42:18Z

@antonwolfy @vtavana @vlad-perevezentsev
I've added a commit to skip test_fill_with_numpy_scalar_ndarray from the CuPy tests.

dpnp.fill_diagonal does not permit NumPy arrays, so it made sense to do the same here. If it would be preferred to keep this feature, it can be implemented, but I would argue that for consistency, the two should behave the same.
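For illustration, the scalar side of that rule could be sketched as the check below. `check_fill_value` is a hypothetical helper, not the function in dpnp_fill.py, and it covers only the scalar path. It uses the standard library alone, so a list stands in for a NumPy array here.

```python
import numbers


def check_fill_value(val):
    """Accept Python scalars; reject strings, sequences and other objects.

    Hypothetical helper for illustration only, not dpnp's actual validation.
    """
    # bool, int, float and complex all count as numbers.Number
    if isinstance(val, numbers.Number):
        return val
    raise TypeError(f"fill value must be a scalar, got {type(val).__name__}")


for candidate in (10, 2.5, 1 + 2j, True, [1, 2], "10"):
    try:
        check_fill_value(candidate)
        print(f"{candidate!r}: accepted")
    except TypeError as exc:
        print(f"{candidate!r}: rejected ({exc})")
```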

antonwolfy

@ndgrigorian, thank you for implementing such a great improvement.
Please find some comments below.

dpnp/dpnp_algo/dpnp_fill.py

tests/third_party/cupy/core_tests/test_ndarray_copy_and_view.py

dpnp/dpnp_array.py

dpnp/dpnp_algo/dpnp_fill.py

ndgrigorian · 2024-10-14T20:32:34Z

@antonwolfy
I've made the suggested changes, added the new tests, and switched back to skipping the one CuPy test.

The CuPy test looks for filling the array with a NumPy ndarray, but the accept_error parameter only works where CuPy and NumPy both raise the error, and of course NumPy does not raise in that case. Since we are specifically disallowing NumPy arrays as the fill value, I think the test is best skipped.
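To make the mismatch concrete, here is a simplified model of a paired-library check. `run_both` and the two stub functions are hypothetical and are not the CuPy testing helpers. The sketch only assumes what is described above: an accepted error counts as a pass when both sides raise it.

```python
def run_both(numpy_like, dpnp_like, accept_error=()):
    """Simplified, hypothetical model of a paired-library test helper.

    Each callable is run; errors listed in accept_error are captured.
    The check passes only if both sides raised, or both returned equal values.
    """
    outcomes = []
    for func in (numpy_like, dpnp_like):
        try:
            outcomes.append(("ok", func()))
        except accept_error as exc:
            outcomes.append(("error", type(exc).__name__))
    (kind_a, val_a), (kind_b, val_b) = outcomes
    if kind_a != kind_b:
        return False  # one library raised and the other did not
    return kind_a == "error" or val_a == val_b


def numpy_fill_with_array():
    return [1, 1, 1]  # placeholder result: NumPy does not raise here


def dpnp_fill_with_array():
    raise TypeError("NumPy arrays are not permitted")  # behaviour after this PR


result = run_both(numpy_fill_with_array, dpnp_fill_with_array, accept_error=TypeError)
print(result)  # False
```

With only one side raising, no choice of accepted error makes the comparison pass, so skipping the test is the simpler option.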

I've also updated the docstring as requested.

antonwolfy

Thank you @ndgrigorian for the significant improvement to the fill method. No more comments from me.

ndgrigorian · 2024-10-25T05:58:44Z

@antonwolfy
I don't have write access, so please merge when you have the chance.

@antonwolfy

* Enhance `dpnp_array.fill` method
  Leverages dpctl's strided fill and memset for setting contiguous memory to 0
* Fix missing disclaimer in dpnp_arraycreation.py
* Import `dpnp_array` directly
* Skip `test_fill_with_numpy_scalar_ndarray`
  New fill implementation does not permit NumPy array values, consistent with fill_diagonal
* Add dependencies to zeros and full kernels in `dpnp_fill`
* Remove redundant validation of first `dpnp_fill` argument
* Improve `dpnp_fill` array/scalar path logic
* Disallow inputs to `dpnp_fill` on separate queues
* Adjust skip message for `test_fill_with_numpy_scalar_ndarray`
* Tweak error messages in `dpnp_fill`
* Add tests for new `fill` method
* Update docstring for `fill` method
* Fix pre-commit in cupy fill tests
* Change `asarray` to `astype` in `dpnp_fill`
  NumPy arrays are no longer permitted and queue coercion does not occur in the `fill` method, so `astype` is sufficient
* Expand TEST_SCOPE to include `test_fill.py`
* Remove redundant check from `dpnp_fill`
* Use `_cast_fill_val` private function from `dpctl.tensor._ctors`
* Add tests per PR review by @antonwolfy
* Improve validation of `val` for `fill` method
* Add to permit NumPy bools as `dpnp_fill` scalar fill values
* Use `dpnp.bool` in `dpnp_fill` and make `isinstance` check more efficient
* Replace branching for `fill` scalar type with `_cast_fill_value`
* Add additional tests for `fill`
  `test_fill_non_scalar` now checks for strings and `test_fill_bool` added to verify bools are properly cast to 1

---------

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>

29239b6

ndgrigorian changed the title ~~Implement efficient, asynchronous fill method using dpctl kernels~~ Implement asynchronous fill method using dpctl kernels Sep 17, 2024

ndgrigorian force-pushed the implement-efficient-fill-method branch from 7e3662b to b8dad1e Compare September 17, 2024 18:08

ndgrigorian marked this pull request as ready for review September 17, 2024 22:03

ndgrigorian requested review from antonwolfy, npolina4, vlad-perevezentsev and vtavana as code owners September 17, 2024 22:03

antonwolfy reviewed Sep 18, 2024

ndgrigorian force-pushed the implement-efficient-fill-method branch from ef96194 to 151b6a9 Compare September 19, 2024 00:17

ndgrigorian force-pushed the implement-efficient-fill-method branch 3 times, most recently from 23e9c9f to 9466f9e Compare October 14, 2024 18:51

ndgrigorian added 13 commits October 14, 2024 22:03

Enhance dpnp_array.fill method

f955945

Leverages dpctl's strided fill and memset for setting contiguous memory to 0
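One reason a memset fast path suits zero in particular: memset writes a single repeated byte, and zero is encoded as all-zero bytes for integer and IEEE 754 floating-point element types. The standard-library check below illustrates this independently of dpctl. `is_single_repeated_byte` is a hypothetical helper introduced only for this sketch.

```python
import struct


def is_single_repeated_byte(fmt, value):
    """Return True if `value` packed with `fmt` is one byte repeated."""
    raw = struct.pack(fmt, value)
    return len(set(raw)) == 1


# Zero packs to all-zero bytes for each of these element types.
zero_cases = (("<i", 0), ("<q", 0), ("<f", 0.0), ("<d", 0.0))
zero_is_all_zero_bytes = all(
    struct.pack(fmt, zero) == bytes(struct.calcsize(fmt))
    for fmt, zero in zero_cases
)
print(zero_is_all_zero_bytes)  # True

# Typical non-zero fill values are not a single repeated byte,
# so they need an element-wise (strided) fill instead.
print(is_single_repeated_byte("<f", 1.0))  # False: bytes are 00 00 80 3f
print(is_single_repeated_byte("<i", 10))   # False: bytes are 0a 00 00 00
```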

Fix missing disclaimer in dpnp_arraycreation.py

13edd84

Import dpnp_array directly

4976a24

Skip test_fill_with_numpy_scalar_ndarray

978a081

New fill implementation does not permit NumPy array values, consistent with fill_diagonal

Add dependencies to zeros and full kernels in dpnp_fill

2686af9

Remove redundant validation of first dpnp_fill argument

2a826b2

Improve dpnp_fill array/scalar path logic

6e2193b

Disallow inputs to dpnp_fill on separate queues

08b6c26

Adjust skip message for test_fill_with_numpy_scalar_ndarray

c64dc0b

Tweak error messages in dpnp_fill

9691cf0

Add tests for new fill method

957b93a

Update docstring for fill method

87e6560

Fix pre-commit in cupy fill tests

985b4fa

ndgrigorian force-pushed the implement-efficient-fill-method branch from 9466f9e to 985b4fa Compare October 15, 2024 05:03

ndgrigorian requested a review from antonwolfy October 15, 2024 23:11

antonwolfy reviewed Oct 22, 2024

ndgrigorian added 5 commits October 22, 2024 10:56

Change asarray to astype in dpnp_fill

d8a3e65

NumPy arrays are no longer permitted and queue coercion does not occur in the `fill` method, so `astype` is sufficient

Expand TEST_SCOPE to include test_fill.py

70618e1

Remove redundant check from dpnp_fill

478397f

Use _cast_fill_val private function from dpctl.tensor._ctors

f782d1c

Add tests per PR review by @antonwolfy

e29685e

vtavana mentioned this pull request Oct 23, 2024

implement dpnp.pad #2093

Merged

6 tasks

Improve validation of val for fill method

bd30f85

ndgrigorian requested a review from antonwolfy October 23, 2024 19:08

Add to permit NumPy bools as dpnp_fill scalar fill values

865867a

ndgrigorian force-pushed the implement-efficient-fill-method branch from 1890180 to 865867a Compare October 24, 2024 05:09

antonwolfy reviewed Oct 24, 2024

dpnp/dpnp_algo/dpnp_fill.py (outdated, resolved)

dpnp/dpnp_algo/dpnp_fill.py (outdated, resolved)

tests/test_fill.py (outdated, resolved)

tests/test_fill.py (resolved)

tests/test_fill.py (resolved)

ndgrigorian added 3 commits October 24, 2024 08:32

Use dpnp.bool in dpnp_fill and make isinstance check more efficient

4ba7186

Replace branching for fill scalar type with _cast_fill_value

48c8051

Add additional tests for fill

8adc06a

`test_fill_non_scalar` now checks for strings and `test_fill_bool` added to verify bools are properly cast to 1
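The bool case can be pictured with plain Python casts. `cast_fill_value` below is a hypothetical stand-in written for this note; it is not dpctl's private `_cast_fill_val`, and the single-letter `kind` codes are borrowed from NumPy's dtype kind convention as an assumption.

```python
def cast_fill_value(val, kind):
    """Cast a scalar to the Python type matching a dtype kind.

    `kind` follows NumPy's single-letter convention: 'b' bool, 'i' integer,
    'f' float, 'c' complex. Hypothetical helper for illustration only.
    """
    casters = {"b": bool, "i": int, "f": float, "c": complex}
    return casters[kind](val)


print(cast_fill_value(True, "i"))  # 1
print(cast_fill_value(True, "f"))  # 1.0
print(cast_fill_value(True, "c"))  # (1+0j)
```

A test along the lines of `test_fill_bool` would then compare the filled array against an array of ones in the target dtype.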

ndgrigorian requested a review from antonwolfy October 24, 2024 15:40

Merge branch 'master' into implement-efficient-fill-method

c91ab47

antonwolfy approved these changes Oct 24, 2024

ndgrigorian added 2 commits October 24, 2024 15:02

Merge branch 'master' into implement-efficient-fill-method

c4e0481

Merge branch 'master' into implement-efficient-fill-method

387b542

antonwolfy merged commit 29239b6 into IntelPython:master Oct 25, 2024
44 of 46 checks passed

ndgrigorian deleted the implement-efficient-fill-method branch October 29, 2024 23:21
