-
Notifications
You must be signed in to change notification settings - Fork 541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PERF] RF fit method spends a significant amount of time in transposing the input array #3832
Comments
teju85
added
bug
Something isn't working
? - Needs Triage
Need team to review and classify
labels
May 6, 2021
Tagging @venkywonka and @vinaydes as they're already working on RF perf optimizations. Tagging @dantegd, JFYI. |
dantegd
added
Perf
Related to runtime performance of the underlying code
and removed
? - Needs Triage
Need team to review and classify
labels
May 6, 2021
rapids-bot bot
pushed a commit
that referenced
this issue
May 7, 2021
Closes issue #3832 Related to #3767 cc @teju85 and @venkywonka and @vinaydes who are working on RF Will be profiling the solution before flipping the PR to ready to review Quick profiling, on a 2070S laptop,average of 10 runs of a simple LinearRegression.fit (that expects data in `F` format), with a `X` matrix of 500 columns with 100000 rows shows: - Before the fix: ``` common.input_utils.input_to_cuml_array : 0.1795 s ``` - After the fix: ``` common.input_utils.input_to_cuml_array : 0.0632 s ``` Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - William Hicks (https://github.com/wphicks) - Corey J. Nolet (https://github.com/cjnolet) URL: #3835
That's correct, @JohnZed . Closing this one. |
vimarsh6739
pushed a commit
to vimarsh6739/cuml
that referenced
this issue
Oct 9, 2023
Closes issue rapidsai#3832 Related to rapidsai#3767 cc @teju85 and @venkywonka and @vinaydes who are working on RF Will be profiling the solution before flipping the PR to ready to review Quick profiling, on a 2070S laptop,average of 10 runs of a simple LinearRegression.fit (that expects data in `F` format), with a `X` matrix of 500 columns with 100000 rows shows: - Before the fix: ``` common.input_utils.input_to_cuml_array : 0.1795 s ``` - After the fix: ``` common.input_utils.input_to_cuml_array : 0.0632 s ``` Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - William Hicks (https://github.com/wphicks) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#3835
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
RF fit method spends a very large amount of time in transposing the input array when: the input is a numpy array and is stored in
C
order. Thus, most of the optimizations we are doing in the RF C++ backend don't end up showing equivalent end-to-end speedups!Expected behavior
Ideally, we should copy the input numpy array to cupy as-is and then perform transpose (when required) on GPUs. This'll eliminate the above mentioned perf bottleneck.
EDIT: However, a potential disadvantage with this is that we'll now require double the amount of memory (to store both the input and its transpose on GPUs)!
Environment details (please complete the following information):
For this bug env details are irrelevant as this behavior happens consistently as part of our cython wrapper code here.
Additional context
The above screenshot from nsight timeline shows that there's very little amount of time gpu is being idle, when the input dataset is already in
F
order.The above image shows that almost first-half of the
fit
method's execution time is consumed by the transposition operation, which is currently running on CPUs, when the input dataset is inC
order.The text was updated successfully, but these errors were encountered: