[PERF] RF fit method spends a significant amount of time in transposing the input array #3832

teju85 · 2021-05-06T04:40:03Z

Describe the bug
The RF fit method spends a very large amount of time transposing the input array when the input is a numpy array stored in C order. As a result, most of the optimizations we are making in the RF C++ backend don't show equivalent end-to-end speedups!
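
For anyone wanting to check whether their input would hit this path, numpy's layout flags can be inspected before calling fit. This is only a minimal sketch: the helper name `needs_layout_conversion` is hypothetical and not part of cuML.

```python
import numpy as np

def needs_layout_conversion(x):
    """Return True if the array is not already column-major (F order).

    Hypothetical helper for illustration only, not a cuML function.
    """
    return not x.flags["F_CONTIGUOUS"]

X_c = np.zeros((1000, 50), dtype=np.float32)  # numpy default: C (row-major) order
X_f = np.asfortranarray(X_c)                  # explicit column-major copy on the host

print(needs_layout_conversion(X_c))  # True
print(needs_layout_conversion(X_f))  # False
```

Note that `np.asfortranarray` here runs on the CPU, so it is the same kind of host-side conversion this issue is about; it only helps if the converted array is reused across several fits.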

Expected behavior
Ideally, we should copy the input numpy array to cupy as-is and then perform the transpose (when required) on the GPU. This will eliminate the perf bottleneck described above.

EDIT: However, a potential disadvantage with this is that we'll now require double the amount of memory (to store both the input and its transpose on GPUs)!
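
A rough sketch of the copy-then-convert idea and its memory cost follows. It is not the cuML implementation: the function name `copy_then_convert` and the `xp` parameter are assumptions made for illustration. The block runs with numpy as a stand-in; passing `xp=cupy` should place both buffers on the GPU, since `cupy.asarray` and `cupy.asfortranarray` mirror the numpy calls, but that path has not been profiled here.

```python
import numpy as np

def copy_then_convert(host_array, xp=np):
    """Copy the array as-is, then change layout where the copy lives.

    `xp` is the array module (numpy here; cupy is the assumed GPU choice).
    Hypothetical helper, not part of cuML.
    """
    device_c = xp.asarray(host_array)       # transfer with the layout unchanged
    device_f = xp.asfortranarray(device_c)  # layout conversion in xp's memory space
    return device_c, device_f

X = np.random.rand(1000, 50).astype(np.float32)  # C order by default
device_c, device_f = copy_then_convert(X)

# Both buffers are alive at the same time, which is the doubled footprint
# mentioned in the EDIT above.
peak_bytes = device_c.nbytes + device_f.nbytes
print(peak_bytes == 2 * X.nbytes)  # True
```

Dropping the reference to `device_c` once `device_f` exists lets the C-order buffer be freed, so the doubled footprint only has to be a transient peak.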

Environment details (please complete the following information):
For this bug env details are irrelevant as this behavior happens consistently as part of our cython wrapper code here.

Additional context

The above screenshot from the Nsight timeline shows that the GPU is idle for very little time when the input dataset is already in F order.

The above image shows that almost the first half of the fit method's execution time is consumed by the transposition operation, which currently runs on the CPU, when the input dataset is in C order.


teju85 · 2021-05-06T04:40:41Z

Tagging @venkywonka and @vinaydes as they're already working on RF perf optimizations.

Tagging @dantegd, JFYI.


Closes issue #3832

Related to #3767

cc @teju85 and @venkywonka and @vinaydes who are working on RF

Will be profiling the solution before flipping the PR to ready to review

Quick profiling on a 2070S laptop, averaging 10 runs of a simple LinearRegression.fit (which expects data in `F` format) with an `X` matrix of 500 columns and 100000 rows, shows:

- Before the fix:
```
common.input_utils.input_to_cuml_array : 0.1795 s
```
- After the fix:
```
common.input_utils.input_to_cuml_array : 0.0632 s
```

Authors:
- Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
- William Hicks (https://github.com/wphicks)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #3835

JohnZed · 2021-05-10T16:18:33Z

Was this closed by #3835 @dantegd and @teju85 ?

teju85 · 2021-05-10T17:14:07Z

That's correct, @JohnZed . Closing this one.

teju85 added the "bug" and "? - Needs Triage" labels May 6, 2021

dantegd mentioned this issue May 6, 2021

Ensure GPU is used for input transposing of host arrays #3835

Merged

dantegd self-assigned this May 6, 2021

dantegd added the "Perf" label and removed the "? - Needs Triage" label May 6, 2021

teju85 closed this as completed May 10, 2021
