[CUDA] New CUDA version Part 1 #4630

Merged
merged 176 commits into from
Mar 23, 2022

shiyu1994 (Collaborator) commented Sep 26, 2021

Description

This is the first part of the decomposed PR #4528 and contains only the single-GPU part of the new CUDA version. Subsequent PRs will include:

  1. Handle categorical features on CUDA. This will be done after #3234 (Target and Count encodings for categorical features) is merged.
  2. Use subsets in CUDARowData when row and column subsampling use small fractions.
  3. Add multi-GPU and distributed support for the new CUDA version.
  4. Add boosting and metric evaluation on CUDA.
  5. Add prediction on CUDA.

For this PR, let's focus on the implementation of the tree learner on a single GPU with CUDA.

Implementation

The main part of the implementation is in src/treelearner/cuda. Four core classes are implemented:

  1. CUDASingleGPUTreeLearner, which inherits from SerialTreeLearner, defines the overall training logic of a single tree.
  2. CUDAHistogramConstructor defines histogram construction on CUDA. Its CPU counterparts are the histogram construction functions in src/io/dataset.cpp.
  3. CUDADataPartition defines how the training data are assigned to leaves while a tree is being trained. Its CPU counterpart is DataPartition in src/treelearner/data_partition.hpp.
  4. CUDABestSplitFinder defines how the best split threshold is found from a constructed histogram. Its CPU counterparts are the best-threshold search methods in src/treelearner/feature_histogram.hpp.
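As a rough CPU reference for what the histogram constructor and best split finder compute, the sketch below accumulates per-feature gradient and hessian histograms over the rows of one leaf, then scans each histogram for the threshold with the largest gain. It is a minimal illustration under stated assumptions, not LightGBM's code: the names BuildHistograms, FindBestSplit, HistBin, and SplitInfo are hypothetical, features are assumed to be pre-binned into uint8_t values stored row-major, and the gain uses only an L2 term lambda (G^2 / (H + lambda)), ignoring constraints such as minimum data or minimum hessian per leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical histogram bin: sums of gradients and hessians of the rows in this bin.
struct HistBin {
  double grad = 0.0;
  double hess = 0.0;
};

// Hypothetical split record.
struct SplitInfo {
  int feature = -1;    // -1 means no split with positive gain was found
  int threshold = -1;  // rows with bin <= threshold go to the left child
  double gain = 0.0;
};

// Builds one histogram per feature, laid out as hist[feature * num_bins + bin].
// Assumes bins is row-major: bins[row * num_features + feature].
std::vector<HistBin> BuildHistograms(const std::vector<uint8_t>& bins, int num_features,
                                     int num_bins, const std::vector<int>& leaf_rows,
                                     const std::vector<double>& grad,
                                     const std::vector<double>& hess) {
  assert(grad.size() == hess.size());
  std::vector<HistBin> hist(static_cast<std::size_t>(num_features) * num_bins);
  for (int row : leaf_rows) {
    for (int f = 0; f < num_features; ++f) {
      HistBin& b = hist[f * num_bins + bins[row * num_features + f]];
      b.grad += grad[row];
      b.hess += hess[row];
    }
  }
  return hist;
}

// Assumed leaf score with L2 regularization only: G^2 / (H + lambda).
double LeafScore(double g, double h, double lambda) { return g * g / (h + lambda); }

// Scans each feature's bins left to right with running sums and keeps the best gain.
SplitInfo FindBestSplit(const std::vector<HistBin>& hist, int num_features, int num_bins,
                        double lambda) {
  SplitInfo best;
  for (int f = 0; f < num_features; ++f) {
    const HistBin* h = &hist[f * num_bins];
    double total_g = 0.0, total_h = 0.0;
    for (int b = 0; b < num_bins; ++b) { total_g += h[b].grad; total_h += h[b].hess; }
    const double parent = LeafScore(total_g, total_h, lambda);
    double left_g = 0.0, left_h = 0.0;
    for (int t = 0; t + 1 < num_bins; ++t) {
      left_g += h[t].grad;
      left_h += h[t].hess;
      const double gain = LeafScore(left_g, left_h, lambda) +
                          LeafScore(total_g - left_g, total_h - left_h, lambda) - parent;
      if (gain > best.gain) { best.gain = gain; best.feature = f; best.threshold = t; }
    }
  }
  return best;
}
```

Both steps are embarrassingly parallel over rows (histogram) and over features and thresholds (split search), which is a plausible reason they are the parts moved to CUDA; this sketch keeps them sequential for readability.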

In addition, there are two new classes for storing data on the GPU:

  1. CUDARowData: implemented in src/io/cuda/cuda_row_data.cpp. Stores the data on the GPU in row-wise format. Used by CUDAHistogramConstructor.
  2. CUDAColData: implemented in src/io/cuda/cuda_col_data.cpp. Stores the data on the GPU in column-wise format. Used by CUDADataPartition.
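A minimal sketch of the two layouts, assuming a dense matrix of uint8_t bins: the same data is kept once row-wise (all features of a row contiguous, as CUDARowData is described) and once column-wise (all rows of a feature contiguous, as CUDAColData is described). BinnedMatrix and GoLeftFlags are hypothetical names, and the comments give a plausible reason each consumer prefers its layout rather than one stated in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical dense matrix of binned features (num_rows x num_features),
// stored twice: row-wise (like CUDARowData) and column-wise (like CUDAColData).
struct BinnedMatrix {
  int num_rows;
  int num_features;
  std::vector<uint8_t> row_wise;  // index: row * num_features + feature
  std::vector<uint8_t> col_wise;  // index: feature * num_rows + row

  BinnedMatrix(int rows, int features, const std::vector<uint8_t>& row_major)
      : num_rows(rows), num_features(features), row_wise(row_major),
        col_wise(row_major.size()) {
    assert(static_cast<int>(row_major.size()) == rows * features);
    // Build the column-wise copy once, at initialization time.
    for (int r = 0; r < rows; ++r)
      for (int f = 0; f < features; ++f)
        col_wise[f * rows + r] = row_major[r * features + f];
  }

  // All feature bins of one row are contiguous. Plausibly convenient for histogram
  // construction, which touches every feature of each row in a leaf.
  const uint8_t* Row(int r) const { return row_wise.data() + r * num_features; }

  // All row bins of one feature are contiguous. Plausibly convenient for
  // partitioning, which only reads the split feature.
  const uint8_t* Column(int f) const { return col_wise.data() + f * num_rows; }
};

// Marks each row of a leaf with 1 if it goes to the left child (bin <= threshold),
// reading only the split feature's column.
std::vector<int> GoLeftFlags(const BinnedMatrix& m, const std::vector<int>& leaf_rows,
                             int feature, int threshold) {
  const uint8_t* col = m.Column(feature);
  std::vector<int> flags;
  flags.reserve(leaf_rows.size());
  for (int r : leaf_rows) flags.push_back(col[r] <= threshold ? 1 : 0);
  return flags;
}
```

Keeping both copies trades extra GPU memory for contiguous reads in each consumer.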

The basic logic of this PR:

  1. In the initialization stage, we allocate memory on the GPU and then transfer the training data to the GPU (both CUDARowData and CUDAColData).
  2. Before every iteration, we calculate the gradients on the CPU and transfer them to the GPU. (See https://github.com/shiyu1994/LightGBM/blob/536f603bd9f5d4fa1170db41c5c1b6d6d22f67d0/src/treelearner/cuda/cuda_single_gpu_tree_learner.cpp#L76-L79)
  3. CUDASingleGPUTreeLearner trains a new tree from the gradients by iteratively calling CUDAHistogramConstructor to construct the histograms, CUDABestSplitFinder to find the best thresholds, and CUDADataPartition to partition the data according to the best split. (See https://github.com/shiyu1994/LightGBM/blob/536f603bd9f5d4fa1170db41c5c1b6d6d22f67d0/src/treelearner/cuda/cuda_single_gpu_tree_learner.cpp#L115-L223)
  4. After a tree has been trained, we transfer the tree structure from the GPU to the CPU. (See https://github.com/shiyu1994/LightGBM/blob/536f603bd9f5d4fa1170db41c5c1b6d6d22f67d0/src/treelearner/cuda/cuda_single_gpu_tree_learner.cpp#L225)
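The iterative loop of step 3 can be sketched on the CPU as follows. This is a simplified, single-feature illustration under assumptions, not the code behind the links above: gradients and hessians are taken as given (step 2), growth is leaf-wise with the same G^2 / (H + lambda) gain, and each leaf owns a contiguous range of a shared row-index array, so splitting a leaf stably partitions that range (left child at the front, right child at the back). The contiguous-range representation and the names Leaf, FindBestSplit, and TrainTree are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical leaf record: the leaf owns indices[begin, begin + count).
struct Leaf {
  int begin = 0;
  int count = 0;
  int threshold = -1;  // best split for this leaf; -1 means none with positive gain
  double gain = 0.0;
};

double Score(double g, double h, double lambda) { return g * g / (h + lambda); }

// Histogram construction + best-threshold search for one leaf (single feature).
void FindBestSplit(Leaf& leaf, const std::vector<int>& indices,
                   const std::vector<uint8_t>& bins, const std::vector<double>& grad,
                   const std::vector<double>& hess, int num_bins, double lambda) {
  std::vector<double> g(num_bins, 0.0), h(num_bins, 0.0);
  std::vector<int> c(num_bins, 0);
  for (int i = leaf.begin; i < leaf.begin + leaf.count; ++i) {
    const int row = indices[i];
    g[bins[row]] += grad[row];
    h[bins[row]] += hess[row];
    c[bins[row]] += 1;
  }
  double total_g = 0.0, total_h = 0.0;
  for (int b = 0; b < num_bins; ++b) { total_g += g[b]; total_h += h[b]; }
  const double parent = Score(total_g, total_h, lambda);
  double left_g = 0.0, left_h = 0.0;
  int left_c = 0;
  leaf.threshold = -1;
  leaf.gain = 0.0;
  for (int t = 0; t + 1 < num_bins; ++t) {
    left_g += g[t];
    left_h += h[t];
    left_c += c[t];
    if (left_c == 0 || left_c == leaf.count) continue;  // both children must be non-empty
    const double gain = Score(left_g, left_h, lambda) +
                        Score(total_g - left_g, total_h - left_h, lambda) - parent;
    if (gain > leaf.gain) { leaf.gain = gain; leaf.threshold = t; }
  }
}

// Grows one tree leaf-wise until max_leaves is reached or no leaf can be split.
// On return, indices holds the row ids grouped by leaf.
std::vector<Leaf> TrainTree(const std::vector<uint8_t>& bins, const std::vector<double>& grad,
                            const std::vector<double>& hess, int num_bins, int max_leaves,
                            double lambda, std::vector<int>& indices) {
  const int n = static_cast<int>(bins.size());
  assert(grad.size() == bins.size() && hess.size() == bins.size());
  indices.resize(n);
  for (int i = 0; i < n; ++i) indices[i] = i;
  std::vector<Leaf> leaves(1);
  leaves[0].count = n;
  FindBestSplit(leaves[0], indices, bins, grad, hess, num_bins, lambda);
  while (static_cast<int>(leaves.size()) < max_leaves) {
    // Choose the leaf whose best split has the largest gain.
    int best = -1;
    for (int l = 0; l < static_cast<int>(leaves.size()); ++l) {
      if (leaves[l].threshold >= 0 && (best < 0 || leaves[l].gain > leaves[best].gain)) best = l;
    }
    if (best < 0) break;
    // Partition: stably move the leaf's left-going rows to the front of its range.
    const int begin = leaves[best].begin;
    const int threshold = leaves[best].threshold;
    auto first = indices.begin() + begin;
    auto mid = std::stable_partition(first, first + leaves[best].count,
                                     [&](int row) { return bins[row] <= threshold; });
    const int left_count = static_cast<int>(mid - first);
    Leaf right;
    right.begin = begin + left_count;
    right.count = leaves[best].count - left_count;
    leaves[best].count = left_count;  // the parent slot becomes the left child
    // Rebuild histograms and search splits for the two new leaves.
    FindBestSplit(leaves[best], indices, bins, grad, hess, num_bins, lambda);
    FindBestSplit(right, indices, bins, grad, hess, num_bins, lambda);
    leaves.push_back(right);
  }
  return leaves;
}
```

Keeping each leaf's rows in one contiguous range means a split only rewrites the parent's range, and the next histogram pass reads a leaf's rows without scanning the whole dataset.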

StrikerRUS (Collaborator)

@StrikerRUS @jameslamb do you have more comments?

I'll continue skimming files later today.

StrikerRUS (Collaborator) left a comment

include/LightGBM/dataset.h (resolved)
python-package/setup.py (resolved)
src/boosting/gbdt.cpp (outdated, resolved)
src/cuda/cuda_algorithms.cu (resolved)
src/cuda/cuda_utils.cpp (outdated, resolved)
src/treelearner/cuda/cuda_single_gpu_tree_learner.hpp (outdated, resolved)
src/treelearner/cuda/cuda_single_gpu_tree_learner.hpp (outdated, resolved)
tests/python_package_test/test_basic.py (outdated, resolved)
tests/python_package_test/test_dask.py (resolved)
tests/python_package_test/test_engine.py (resolved)

shiyu1994 (Collaborator Author)

@StrikerRUS Thanks for the comments. Could you please check the latest updates?

shiyu1994 (Collaborator Author)

Gentle ping, @StrikerRUS. Thanks for your time.

StrikerRUS (Collaborator) commented Mar 19, 2022

@shiyu1994 Please consider my suggestion in #4630 (comment) about adding a comment to the CI script.

Also, could you please comment on this question (it doesn't affect merging this PR)?
#4630 (review)

Are there any plans to share benchmark numbers?
https://lightgbm.readthedocs.io/en/latest/Experiments.html
https://lightgbm.readthedocs.io/en/latest/GPU-Performance.html

Please open a new issue about supporting subsets.
#4630 (comment)

shiyu1994 (Collaborator Author)

Are there any plans for sharing numbers for benchmarks? https://lightgbm.readthedocs.io/en/latest/Experiments.html https://lightgbm.readthedocs.io/en/latest/GPU-Performance.html

Sure! We will release the benchmark results when the whole new CUDA version is merged.

shiyu1994 (Collaborator Author)

@shiyu1994 Please consider checking this my suggestion #4630 (comment) about adding a comment in CI script.

Thanks for the reminder. Comment added.

shiyu1994 (Collaborator Author)

Please open a new issue for supporting subset.

#5086 has been opened to track this.

shiyu1994 requested a review from StrikerRUS on March 22, 2022 08:03

StrikerRUS (Collaborator) left a comment

Thank you so, so much for the extremely hard work!
I don't have any other comments based on what I've seen and understood in the code.

StrikerRUS (Collaborator)

Do we still want to merge #4827 before this PR?
#4630 (comment)

shiyu1994 (Collaborator Author)

@StrikerRUS Given that #4827 has not passed all the tests in Azure Pipelines, I think we can merge this one first. I'll fix the problems in #4827 ASAP.

shiyu1994 (Collaborator Author)

I'm going to merge this. Thanks @guolinke, @StrikerRUS, and @jameslamb for your help!

shiyu1994 merged commit 6b56a90 into microsoft:master on Mar 23, 2022

guolinke (Collaborator)

Thank you so much, @shiyu1994!


github-actions bot locked this conversation as resolved and limited it to collaborators on Aug 23, 2023