[gpu] modify histogram construction (fixes #4946) #5819
Closed
fixes #4946
Reason for Change:
When X_train is larger than about (18,000,000, 1,000), LightGBM on GPU fails with an error like this:
[LightGBM] [Fatal] Check failed: (best_split_info.left_count) > (0) at LightGBM/src/treelearner/serial_tree_learner.cpp, line 686
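For context, a minimal Python sketch of the kind of setup that hits this. The dataset size, parameters, and random data below are illustrative assumptions, not taken from the actual job that failed:

# Hypothetical reproduction sketch; requires a GPU-enabled LightGBM build and a
# very large amount of RAM for a dense matrix of this size (in practice the data
# would come from a real dataset or be generated in chunks).
import numpy as np
import lightgbm as lgb

n_rows, n_cols = 18_000_000, 1_000   # approximate size at which the failure was observed
X_train = np.random.rand(n_rows, n_cols).astype(np.float32)
y_train = np.random.rand(n_rows)

params = {"objective": "regression", "device_type": "gpu"}
# With the unpatched histogram256.cl, training at this scale is where the
# "Check failed: (best_split_info.left_count) > (0)" error was reported.
booster = lgb.train(params, lgb.Dataset(X_train, label=y_train), num_boost_round=10)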
Solution:
I was able to successfully run a large dataset after a change to src/treelearner/ocl/histogram256.cl.
The kernel is defined as follows (with the #ifdef constants removed here for clarity):
__kernel void histogram256(
__global const uchar4* feature_data_base,
__constant const uchar4* restrict feature_masks __attribute__((max_constant_size(65536))),
const data_size_t feature_size,
__global const data_size_t* data_indices,
const data_size_t num_data,
const score_t const_hessian,
__global const score_t* ordered_gradients, // <----- change to : __global const * ordered_gradients
__global char* restrict output_buf,
__global volatile int * sync_counters,
__global acc_type* restrict hist_buf_base
)
However, if you redefine ordered_gradients as __global const * ordered_gradients, the type is filled in from context and the large training set runs. At first, I thought score_t was defined differently in the OpenCL code and the C++ code, but I verified that both are floats.
To validate the results, I ran two smaller dummy datasets with ordered_gradients defined explicitly and implicitly, and compared the resulting model files: they were identical.
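For reference, the comparison looked roughly like the sketch below: the same seeded dummy data trained once against each build, with the saved model text files diffed afterwards. The file names, dataset size, and parameters here are placeholders, not the actual ones used:

# Hypothetical validation sketch: run train_and_save() once per LightGBM build
# (one built with the original kernel, one with ordered_gradients redefined),
# then compare the text dumps of the two models.
import numpy as np
import lightgbm as lgb

def train_and_save(model_path: str) -> None:
    rng = np.random.default_rng(seed=0)      # fixed seed so both runs see identical data
    X = rng.random((10_000, 50))
    y = rng.random(10_000)
    booster = lgb.train(
        {"objective": "regression", "device_type": "gpu"},
        lgb.Dataset(X, label=y),
        num_boost_round=50,
    )
    booster.save_model(model_path)

# Run against each build, then check the dumps are byte-identical:
#   train_and_save("model_original_kernel.txt")   # build with the unmodified histogram256.cl
#   train_and_save("model_modified_kernel.txt")   # build with ordered_gradients redefined
#   same = open("model_original_kernel.txt").read() == open("model_modified_kernel.txt").read()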
It's not yet clear to me why the change allows the program to finish; I suspect there must be some kind of type difference. Any ideas would be welcome.