Performance improvement of _launch (code block 2: packing CArray) #280
Comments
I'm trying to pin down the bottleneck lines by placing probes ( |
I created a new branch I ran
I ran it 3 more times, and these five execution times do not differ much (the differences are at most about ±10%).
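For reference, here is a minimal sketch of the kind of probe-based timing described above. It is not the project's code: `Probes`, `pack_args`, and `relative_spread` are hypothetical names, and `pack_args` is only a stand-in for the code block under study. It accumulates the elapsed time between consecutive probe points and reports how much the total time varies across five runs.

```python
import time
from collections import defaultdict


class Probes:
    """Accumulate elapsed time between consecutive probe points (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # label -> accumulated seconds
        self._last = None

    def start(self):
        # Reset the reference point at the top of the measured block.
        self._last = time.perf_counter()

    def mark(self, label):
        # Charge the time since the previous probe to `label`.
        now = time.perf_counter()
        self.totals[label] += now - self._last
        self._last = now


def pack_args(args, probes):
    # Stand-in workload, NOT the real packing code from the project.
    probes.start()
    shapes = [len(a) for a in args]
    probes.mark("collect shapes")
    packed = [tuple(a) for a in args]
    probes.mark("pack values")
    return shapes, packed


def relative_spread(samples):
    """Return (max - min) / mean, a rough measure of run-to-run variation."""
    mean = sum(samples) / len(samples)
    return (max(samples) - min(samples)) / mean if mean else 0.0


probes = Probes()
run_times = []
args = [list(range(100)) for _ in range(10)]
for _ in range(5):  # five runs, as in the measurement above
    t0 = time.perf_counter()
    for _ in range(1000):
        pack_args(args, probes)
    run_times.append(time.perf_counter() - t0)

for label, seconds in probes.totals.items():
    print(f"{label}: {seconds * 1e3:.2f} ms")
print(f"run-to-run spread: {relative_spread(run_times):.1%}")
```

Note that the probes themselves add overhead (two `perf_counter` calls per segment), so the per-line numbers are only meaningful relative to each other.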
I've looked into this issue and found that reducing the overhead of this code block is difficult. As @ybsh reported, the elapsed time of each line is almost the same (11 ~ 34 ms), so there is no hotspot. I tried some optimizations, but none of them worked:
I suggest changing |
What about CuPy? CuPy also stores an ndarray in a CArray.
A subproblem of #153.
This issue focuses on improving this code block mentioned here.