Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERF] RF fit method spends a significant amount of time in transposing the input array #3832

Closed
teju85 opened this issue May 6, 2021 · 3 comments
Assignees
Labels
bug Something isn't working Perf Related to runtime performance of the underlying code

Comments

@teju85
Copy link
Member

teju85 commented May 6, 2021

Describe the bug
RF fit method spends a very large amount of time in transposing the input array when: the input is a numpy array and is stored in C order. Thus, most of the optimizations we are doing in the RF C++ backend don't end up showing equivalent end-to-end speedups!

Expected behavior
Ideally, we should copy the input numpy array to cupy as-is and then perform transpose (when required) on GPUs. This'll eliminate the above mentioned perf bottleneck.

EDIT: However, a potential disadvantage with this is that we'll now require double the amount of memory (to store both the input and its transpose on GPUs)!

Environment details (please complete the following information):
For this bug env details are irrelevant as this behavior happens consistently as part of our cython wrapper code here.

Additional context
image
The above screenshot from nsight timeline shows that there's very little amount of time gpu is being idle, when the input dataset is already in F order.

image
The above image shows that almost first-half of the fit method's execution time is consumed by the transposition operation, which is currently running on CPUs, when the input dataset is in C order.

@teju85 teju85 added bug Something isn't working ? - Needs Triage Need team to review and classify labels May 6, 2021
@teju85
Copy link
Member Author

teju85 commented May 6, 2021

Tagging @venkywonka and @vinaydes as they're already working on RF perf optimizations.

Tagging @dantegd, JFYI.

@dantegd dantegd self-assigned this May 6, 2021
@dantegd dantegd added Perf Related to runtime performance of the underlying code and removed ? - Needs Triage Need team to review and classify labels May 6, 2021
rapids-bot bot pushed a commit that referenced this issue May 7, 2021
Closes issue #3832 

Related to #3767 

cc @teju85 and @venkywonka and @vinaydes who are working on RF

Will be profiling the solution before flipping the PR to ready to review

Quick profiling, on a 2070S laptop,average of 10 runs of a simple LinearRegression.fit (that expects data in `F` format), with a `X` matrix of 500 columns with 100000 rows shows:

- Before the fix:
```
common.input_utils.input_to_cuml_array                     :   0.1795 s
```

- After the fix:
```
common.input_utils.input_to_cuml_array                     :   0.0632 s
```

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #3835
@JohnZed
Copy link
Contributor

JohnZed commented May 10, 2021

Was this closed by #3835 @dantegd and @teju85 ?

@teju85
Copy link
Member Author

teju85 commented May 10, 2021

That's correct, @JohnZed . Closing this one.

@teju85 teju85 closed this as completed May 10, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023
Closes issue rapidsai#3832 

Related to rapidsai#3767 

cc @teju85 and @venkywonka and @vinaydes who are working on RF

Will be profiling the solution before flipping the PR to ready to review

Quick profiling, on a 2070S laptop,average of 10 runs of a simple LinearRegression.fit (that expects data in `F` format), with a `X` matrix of 500 columns with 100000 rows shows:

- Before the fix:
```
common.input_utils.input_to_cuml_array                     :   0.1795 s
```

- After the fix:
```
common.input_utils.input_to_cuml_array                     :   0.0632 s
```

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#3835
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Perf Related to runtime performance of the underlying code
Projects
None yet
Development

No branches or pull requests

3 participants