Implemented printing for usm_ndarrays #1013
User-defined nanstr and infstr implemented
context manager
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1013/index.html
Array API standard conformance tests for dpctl=0.14.1dev0=py310h8c27c75_9 ran successfully.
Array API standard conformance tests for dpctl=0.14.1dev0=py310h8c27c75_10 ran successfully.
@lgtm! Thank you @ndgrigorian. This is an awesome addition.
The majority of the printing time comes from memory transfer:
In [12]: x = dpt.ones(10**7 + 2, dtype='i2', device='gpu')
In [13]: %timeit p._nd_corners(x, 4)
222 µs ± 6.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [14]: %timeit x.__repr__()
932 µs ± 50.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [15]: %timeit p._nd_corners(x, 4)
232 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [16]: xx = p._nd_corners(x, 4)
In [17]: %timeit dpt.asnumpy(xx)
638 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Perhaps we can improve the code in the future by avoiding the concat call: instead, allocate the memory for the elided array, launch several concurrent copy kernels, one for each corner, and then launch a copy to host that depends on those corner-copying kernels.
This is better done in a separate PR, though.
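The suggested change can be illustrated on the host, with NumPy arrays standing in for device arrays (an assumption: no dpctl queues, events, or kernels are shown, and the helper names are hypothetical). Both functions extract the same corners; the first concatenates axis by axis, while the second preallocates the result and fills it with independent block copies, which is the shape of the proposed concurrent-kernel approach.

```python
import itertools
import numpy as np

def _axis_blocks(n, k):
    """(source, destination) slice pairs along one axis of length n."""
    if n > 2 * k:
        return [(slice(0, k), slice(0, k)), (slice(n - k, n), slice(k, 2 * k))]
    return [(slice(0, n), slice(0, n))]

def corners_concat(a, k):
    """Hypothetical: current style, slicing and concatenating axis by axis."""
    out = a
    for axis, n in enumerate(a.shape):
        if n > 2 * k:
            head = [slice(None)] * a.ndim
            tail = [slice(None)] * a.ndim
            head[axis] = slice(0, k)
            tail[axis] = slice(n - k, n)
            out = np.concatenate([out[tuple(head)], out[tuple(tail)]], axis=axis)
    return out

def corners_prealloc(a, k):
    """Hypothetical: allocate the elided array once, then copy each corner."""
    res_shape = tuple(2 * k if n > 2 * k else n for n in a.shape)
    out = np.empty(res_shape, dtype=a.dtype)
    per_axis = [_axis_blocks(n, k) for n in a.shape]
    for combo in itertools.product(*per_axis):
        src = tuple(s for s, _ in combo)
        dst = tuple(d for _, d in combo)
        # On a device, each assignment would be an independent copy kernel,
        # and the final copy to host would depend on all of them.
        out[dst] = a[src]
    return out
```

With d axes longer than 2k there are 2**d corner blocks, and they write to disjoint regions of the output, so no ordering between them is needed.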
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
TY for this. Was much needed 🙏
Closes #954
Implements __repr__ and __str__ for printing dpctl.tensor.usm_ndarray objects, along with options for printing.
For the sake of user convenience, dpctl stores its own print options separately from NumPy's.
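A minimal sketch of keeping print options separate from NumPy's, including user-defined nanstr/infstr and a context manager, is shown below. The names (set_print_options, get_print_options, print_options, to_string) are illustrative and mirror NumPy's; this is not a claim about dpctl's exact signatures. The options are applied to NumPy only for the duration of one array2string call via np.printoptions, so NumPy's global settings are never changed.

```python
import contextlib
import numpy as np

# Hypothetical module-level defaults, kept apart from NumPy's global state.
_print_options = {
    "threshold": 1000,
    "edgeitems": 3,
    "precision": 8,
    "nanstr": "nan",
    "infstr": "inf",
}

def set_print_options(**kwargs):
    """Update only the known options; reject typos early."""
    unknown = set(kwargs) - set(_print_options)
    if unknown:
        raise TypeError(f"unknown print options: {sorted(unknown)}")
    _print_options.update(kwargs)

def get_print_options():
    """Return a copy so callers cannot mutate the defaults by accident."""
    return dict(_print_options)

@contextlib.contextmanager
def print_options(**kwargs):
    """Temporarily override options, restoring the previous ones on exit."""
    saved = get_print_options()
    set_print_options(**kwargs)
    try:
        yield get_print_options()
    finally:
        _print_options.clear()
        _print_options.update(saved)

def to_string(a: np.ndarray) -> str:
    # Apply our options only inside this call; NumPy's own global
    # print options are restored when the with-block exits.
    with np.printoptions(**_print_options):
        return np.array2string(a)
```

Usage: `with print_options(nanstr="NaN", infstr="Inf"): print(to_string(x))` changes the output inside the block only, and np.get_printoptions() reports NumPy's untouched defaults throughout.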
For arrays large enough to be abbreviated (length > 1000, or whatever threshold the user sets), a copy of the nd corners is produced, so that as little data as possible is transferred to the CPU while still leveraging np.array2string. This appeared to be faster than running dpctl.tensor.asnumpy on the whole array.
Some benchmarks using
CPU times: user 11.3 ms, sys: 765 µs, total: 12.1 ms
Wall time: 13 ms
CPU times: user 6.18 ms, sys: 62.6 ms, total: 68.8 ms
Wall time: 95.8 ms
CPU times: user 0 ns, sys: 9.63 ms, total: 9.63 ms
Wall time: 14.1 ms
CPU times: user 9.63 ms, sys: 50.3 ms, total: 59.9 ms
Wall time: 64.1 ms
Note that each of these was run in a new kernel, to avoid caching.
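The corner-extraction idea from the PR description can be sketched on the host with NumPy (the helper names nd_corners and summarized_str are hypothetical, and this is not dpctl's implementation). The key detail is keeping edge_items + 1 entries on each side of every long axis: the corners array then stays longer than 2 * edge_items along those axes, so np.array2string, forced to summarize with threshold=0, prints the same "..." layout it would print for the full array.

```python
import numpy as np

def nd_corners(a: np.ndarray, edge_items: int) -> np.ndarray:
    """Hypothetical helper: keep only what array2string can display.

    Along each axis longer than 2 * (edge_items + 1), keep the first and last
    edge_items + 1 entries. The extra entry per side keeps that axis longer
    than 2 * edge_items, so array2string still inserts '...' there.
    """
    if a.ndim == 0:
        return a.copy()
    keep = 2 * (edge_items + 1)
    index_lists = []
    for n in a.shape:
        if n > keep:
            idx = np.r_[0:edge_items + 1, n - edge_items - 1:n]
        else:
            idx = np.arange(n)
        index_lists.append(idx)
    # np.ix_ builds an open mesh that selects the Cartesian product of indices.
    return a[np.ix_(*index_lists)]

def summarized_str(a: np.ndarray, edge_items: int = 3, threshold: int = 1000) -> str:
    """Format a, converting only the corners when a would be abbreviated."""
    if a.size <= threshold:
        return np.array2string(a, threshold=threshold, edgeitems=edge_items)
    corners = nd_corners(a, edge_items)
    # threshold=0 forces summarization, so the small corners array is
    # abbreviated exactly as the full array would be.
    return np.array2string(corners, threshold=0, edgeitems=edge_items)
```

On a device, only the corners array (for a 2-D input with default settings, at most 8 x 8 elements) would need to cross to the host, instead of the full array.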