
Bugfix for 2767 - fix rf path trying to sample 0 columns #2788

Merged 2 commits into rapidsai:branch-0.16 on Sep 3, 2020

Conversation

drobison00 (Contributor)

In some situations, the number of columns multiplied by max_features rounded down to 0, which in turn caused a memory block of size zero to be allocated and cudaMemsetAsync to throw an invalid-argument exception.

This sets a floor of 1 for ncols_sampled, which resolves the issue.

Closes #2767
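
For illustration, here is a minimal standalone sketch of the rounding behavior described above and the floor-of-1 fix. The variable names (n_cols, max_features, ncols_sampled) are assumptions chosen for the example, not the exact identifiers used in cuML:

```cpp
#include <algorithm>
#include <cstdio>

int main() {
  // Hypothetical values chosen so that n_cols * max_features rounds down to zero.
  const int n_cols = 10;
  const float max_features = 0.05f;

  const int ncols_sampled = static_cast<int>(n_cols * max_features);  // truncates to 0
  const int ncols_sampled_floored =
      std::max(1, static_cast<int>(n_cols * max_features));           // floored to 1

  std::printf("without floor: %d, with floor: %d\n",
              ncols_sampled, ncols_sampled_floored);
  return 0;
}
```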

@drobison00 added labels "3 - Ready for Review" and "CUDA / C++" on Sep 2, 2020
@drobison00 requested a review from a team as a code owner on September 2, 2020 19:08
@GPUtester (Contributor)

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@drobison00 requested a review from JohnZed on September 2, 2020 19:14
@beckernick (Member) left a comment


Glad to see this is so clean!

Just curious why this only came up with the RF regressor rather than with both the RF classifier and the regressor. Does the classifier path have a guard against this? It looks like this value is used downstream in both the //regression and //classification sections.

@drobison00 (Contributor, Author)

@beckernick
I tested a bit with the equivalent classification paths. They are affected, in that histcount (defined in memory.cuh:218) will be set to zero, all of the subsequent (h/d)_hist_xxx buffer allocations become zero-sized, and the tree data structure ends up with 0x00 data pointers. I'm not familiar enough with the code paths, but it seems unlikely that everything was working as expected with the classifier when ncols * max_features < 1 on that path.
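
To make the failure mode concrete, here is a hedged CUDA sketch of what a zero-sized histogram buffer looks like downstream. histcount and d_hist are illustrative names based on the description above, not the exact memory.cuh code, and the PR description is the source for the invalid-argument report; the exact runtime behavior may vary with CUDA version:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  // When ncols_sampled is 0, size expressions such as histcount end up as 0,
  // the corresponding device buffers are never really allocated, and their
  // data pointers stay null.
  const size_t histcount = 0;        // e.g. ncols_sampled * nbins * n_unique_labels
  unsigned int* d_hist = nullptr;    // zero-byte allocation -> null device pointer

  // The PR reports cudaMemsetAsync rejecting this situation with an
  // invalid-argument error.
  cudaError_t err = cudaMemsetAsync(d_hist, 0, histcount * sizeof(unsigned int), 0);
  std::printf("cudaMemsetAsync on a null, zero-sized buffer: %s\n",
              cudaGetErrorString(err));
  return 0;
}
```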

@beckernick (Member)

> @beckernick
> I tested a bit with the equivalent classification paths. They are affected, in that histcount (defined in memory.cuh:218) will be set to zero, all of the subsequent (h/d)_hist_xxx buffer allocations become zero-sized, and the tree data structure ends up with 0x00 data pointers. I'm not familiar enough with the code paths, but it seems unlikely that everything was working as expected with the classifier when ncols * max_features < 1 on that path.

Got it, makes sense. A silent failure is just as devious.

Looks like a FIL test is now failing, but it seems unrelated to this code path. It also appears to be failing in another PR (#2789), which further suggests it's unrelated.

FAILED cuml/test/test_fil.py::test_lightgbm - assert False

cc @dantegd is this possibly an expected failure?

@dantegd (Member) commented Sep 3, 2020

rerun tests

@dantegd (Member) commented Sep 3, 2020

@beckernick PR #2787 disabled that test temporarily because there seems to be a small issue with that test and LightGBM 3.0.

@dantegd (Member) left a comment


Change lgtm

@beckernick (Member)

Ah, perfect. Thanks for the quick explanation 👍

@beckernick merged commit 212813d into rapidsai:branch-0.16 on Sep 3, 2020
Labels
3 - Ready for Review, CUDA / C++
Development

Successfully merging this pull request may close these issues.

[BUG] RandomForestRegressor fit causes segfault when max_features * n_features is less than one
4 participants