Implement asynchronous fill method using dpctl kernels #2055

Merged

@ndgrigorian ndgrigorian commented Sep 17, 2024

This PR changes the `dpnp_array.fill` method to use dpctl kernels, which makes `fill` asynchronous and more efficient: it no longer indexes the array and copies a scalar to the device once per element.

On Iris Xe in WSL, filling a 10,000-element `c8` array drops from about 1.25 s to about 229 μs per call (the latter including the queue wait).

Before

In [1]: import dpnp as dnp

In [2]: x_dnp = dnp.empty(10000, dtype="c8")

In [3]: %timeit x_dnp.fill(10)
1.25 s ± 47.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit x_dnp.fill(10)
1.26 s ± 27.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After

In [8]: %timeit x_dnp.fill(10); q.wait()
229 μs ± 37.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
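
The `q.wait()` in the "After" timing matters because an asynchronous `fill` can return before the device work has finished, so timing the call alone would mostly measure submission. A rough plain-Python analogy, assuming a thread pool as a stand-in for the SYCL queue (`slow_fill` and `time_fill` are illustrative names, not dpnp or dpctl APIs):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fill(buf, value, delay=0.05):
    """Stand-in for a device kernel: sleeps, then fills the buffer in place."""
    time.sleep(delay)
    buf[:] = [value] * len(buf)


def time_fill(wait):
    """Return (elapsed seconds, buffer) for one submitted fill.

    With wait=False the clock stops as soon as the work is queued, which is
    what timing an asynchronous call without a queue wait would measure.
    """
    buf = [0] * 1000
    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.perf_counter()
        future = pool.submit(slow_fill, buf, 10)
        if wait:
            future.result()  # plays the role of q.wait() in the timing above
        elapsed = time.perf_counter() - start
    # Leaving the with-block joins the worker, so buf is filled by this point
    return elapsed, buf


submit_only, _ = time_fill(wait=False)
with_wait, filled = time_fill(wait=True)
print(f"submit only: {submit_only:.6f} s, with wait: {with_wait:.6f} s")
```

The "submit only" figure is much smaller than the "with wait" figure even though the same work is done in both cases, which is why the benchmark above synchronizes inside the timed statement.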
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you filing the PR as a draft?

@ndgrigorian ndgrigorian changed the title Implement efficient, asynchronous fill method using dpctl kernels Implement asynchronous fill method using dpctl kernels Sep 17, 2024
@ndgrigorian ndgrigorian force-pushed the implement-efficient-fill-method branch from 7e3662b to b8dad1e Compare September 17, 2024 18:08
@ndgrigorian

@antonwolfy @vtavana @vlad-perevezentsev
I've added a commit to skip test_fill_with_numpy_scalar_ndarray from the CuPy tests.

dpnp.fill_diagonal does not permit NumPy arrays, so it made sense to do the same here. If it would be preferred to keep this feature, it can be implemented, but I would argue that for consistency, the two should behave the same.

@ndgrigorian ndgrigorian marked this pull request as ready for review September 17, 2024 22:03

@antonwolfy antonwolfy left a comment

@ndgrigorian, thank you for implementing such a great improvement.
Please find some comments below.

@ndgrigorian ndgrigorian force-pushed the implement-efficient-fill-method branch from ef96194 to 151b6a9 Compare September 19, 2024 00:17
@ndgrigorian ndgrigorian force-pushed the implement-efficient-fill-method branch 3 times, most recently from 23e9c9f to 9466f9e Compare October 14, 2024 18:51
@ndgrigorian

@antonwolfy
I've made the suggested changes, added the new tests, and switched back to skipping the one CuPy test.

The CuPy test checks filling the array with a NumPy ndarray, but the accept_error parameter only works where CuPy and NumPy both raise the error, and NumPy of course does not raise in that case. Since we are specifically disallowing NumPy arrays as the fill value, I think the test is best skipped.

I've also updated the docstring per request.

@ndgrigorian ndgrigorian force-pushed the implement-efficient-fill-method branch from 9466f9e to 985b4fa Compare October 15, 2024 05:03
@vtavana vtavana mentioned this pull request Oct 23, 2024
6 tasks
@ndgrigorian ndgrigorian force-pushed the implement-efficient-fill-method branch from 1890180 to 865867a Compare October 24, 2024 05:09
`test_fill_non_scalar` now checks for strings and `test_fill_bool` added to verify bools are properly cast to 1
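
The behaviour these tests target (non-scalar values such as strings rejected, bools cast to 1) can be modelled in plain Python. This is only a sketch: `cast_fill_value` is a hypothetical helper, a Python type stands in for the array dtype, and raising `TypeError` is an assumption rather than a statement about what `dpnp_fill` does.

```python
from numbers import Number


def cast_fill_value(val, target_type):
    """Hypothetical model of scalar validation and casting for a fill value.

    `target_type` is a Python type (int, float, complex) standing in for the
    array dtype. Narrowing complex values to real ones is left out.
    """
    if not isinstance(val, Number):
        # Strings, lists and other non-scalars are rejected outright
        raise TypeError(
            f"fill value must be a scalar, got {type(val).__name__}"
        )
    # bool is a Number, so True becomes 1, 1.0 or (1+0j) depending on target
    return target_type(val)


print(cast_fill_value(True, int))    # 1
print(cast_fill_value(True, float))  # 1.0

try:
    cast_fill_value("10", int)
except TypeError as exc:
    print("rejected:", exc)
```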

@antonwolfy antonwolfy left a comment

Thank you @ndgrigorian for the significant improvement to the fill method. No more comments from me.

@ndgrigorian

@antonwolfy
I don't have write access, so please merge when you have the chance.

@antonwolfy antonwolfy merged commit 29239b6 into IntelPython:master Oct 25, 2024
44 of 46 checks passed
github-actions bot added a commit that referenced this pull request Oct 25, 2024
* Enhance `dpnp_array.fill` method

Leverages dpctl's strided fill and memset for setting contiguous memory to 0

* Fix missing disclaimer in dpnp_arraycreation.py

* Import `dpnp_array` directly

* Skip `test_fill_with_numpy_scalar_ndarray`

New fill implementation does not permit NumPy array values, consistent with fill_diagonal

* Add dependencies to zeros and full kernels in `dpnp_fill`

* Remove redundant validation of first `dpnp_fill` argument

* Improve `dpnp_fill` array/scalar path logic

* Disallow inputs to `dpnp_fill` on separate queues

* Adjust skip message for `test_fill_with_numpy_scalar_ndarray`

* Tweak error messages in `dpnp_fill`

* Add tests for new `fill` method

* Update docstring for `fill` method

* Fix pre-commit in cupy fill tests

* Change `asarray` to `astype` in `dpnp_fill`

NumPy arrays are no longer permitted and queue coercion does not occur in the `fill` method, so `astype` is sufficient

* Expand TEST_SCOPE to include `test_fill.py`

* Remove redundant check from `dpnp_fill`

* Use `_cast_fill_val` private function from `dpctl.tensor._ctors`

* Add tests per PR review by @antonwolfy

* Improve validation of `val` for `fill` method

* Permit NumPy bools as `dpnp_fill` scalar fill values

* Use `dpnp.bool` in `dpnp_fill` and make `isinstance` check more efficient

* Replace branching for `fill` scalar type with `_cast_fill_val`

* Add additional tests for `fill`

`test_fill_non_scalar` now checks for strings and `test_fill_bool` added to verify bools are properly cast to 1

---------

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com> 29239b6
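
The first commit above mentions two paths: a memset for setting contiguous memory to 0, and a general fill otherwise. A minimal host-side sketch of that kind of dispatch, assuming a packed `bytearray` in place of device memory (`fill_buffer` is a hypothetical name, and stride handling is omitted entirely):

```python
import struct


def fill_buffer(buf, value, fmt="f"):
    """Fill a bytearray of packed `fmt` items; return the path taken."""
    itemsize = struct.calcsize(fmt)
    if len(buf) % itemsize:
        raise ValueError("buffer length is not a multiple of the item size")
    pattern = struct.pack(fmt, value)
    if pattern == bytes(itemsize):
        # All-zero bit pattern: one bulk zeroing, like memset(ptr, 0, nbytes)
        buf[:] = bytes(len(buf))
        return "memset"
    # General path: write the element's byte pattern into every slot
    buf[:] = pattern * (len(buf) // itemsize)
    return "fill"


buf = bytearray(b"\xff" * 16)          # room for four float32 values
print(fill_buffer(buf, 0.0))           # memset
print(fill_buffer(buf, 1.5))           # fill
print(struct.unpack("4f", buf))        # (1.5, 1.5, 1.5, 1.5)
```

The zero case is worth separating because it needs no per-element pattern, only the byte count; how the real kernels decide between the two paths is not shown in this thread.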
@ndgrigorian ndgrigorian deleted the implement-efficient-fill-method branch October 29, 2024 23:21
2 participants