The more training rounds, the lower the GPU usage rate #6605
Excellent report, thanks very much! That could be one reason for lower GPU utilization... the split-finding part of training can benefit from parallelization, but there's a sync-up after each search where the model has to be updated. I wonder if maybe in the later iterations, LightGBM is training much shallower trees (and therefore spending proportionally more time in those non-parallelized code paths).
LightGBM will stop growing a particular tree under a few conditions:

- the tree already has num_leaves leaves
- the tree has reached max_depth (if it is set)
- no candidate split improves the objective by at least min_gain_to_split
- every remaining split would violate min_data_in_leaf or min_sum_hessian_in_leaf
Unrelated, some notes on those parameters:

# this is the default, you can omit this
"tree_learner": "serial"

# these are only relevant for the CLI, omit them when using the Python package
"task": "train"
"is_training_metric": "false"
Glad to receive your reply! I ran 5,000 rounds using cuda, and this is part of the output.
When training with device=cuda, as the number of training iterations increases, GPU utilization decreases and the time spent on each training iteration increases. However, device=cpu does not behave like this.
Thank you very much for your suggestion! Looking forward to your reply! Thanks!
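A low-effort way to quantify the "each iteration gets slower" observation is a per-iteration timing callback. A minimal sketch follows; the file name and params are placeholders, and the callback interface is LightGBM's standard Python one (each callback receives a CallbackEnv).

```python
import time

import lightgbm as lgb

train_set = lgb.Dataset("higgs.train")  # placeholder path, LibSVM format
params = {"objective": "binary", "device": "cuda", "num_leaves": 255}  # assumed

last = [time.perf_counter()]


def timing_callback(env):
    # env is a lightgbm CallbackEnv; print wall-clock time per 100 iterations
    now = time.perf_counter()
    if env.iteration % 100 == 0:
        print(f"iter {env.iteration}: {now - last[0]:.3f}s since last report")
    last[0] = now


lgb.train(params, train_set, num_boost_round=5000, callbacks=[timing_callback])
```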
Sorry, my request was unclear. I'm not looking for a random sample of that dataframe. Could you use that output to see if there is a difference in the number of leaves in each tree? A finding like "the trees in later iterations have fewer leaves" would be very informative here. I'm looking for output similar to this:
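A minimal sketch of getting that per-tree leaf count, assuming `model` is the trained Booster; `trees_to_dataframe()` is part of LightGBM's Python API, and leaf rows are the ones with no split feature.

```python
# model is the trained lgb.Booster (assumed to be in scope)
df = model.trees_to_dataframe()

# leaf nodes have no split_feature; count them per tree
leaves_per_tree = df[df["split_feature"].isna()].groupby("tree_index").size()
print(leaves_per_tree.tail(25))  # focus on the later iterations
```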
Hi, sorry for my late reply. I used "cat model.txt | grep -A 1 Tree=" to check the saved model, and got:
Tree=4975
num_leaves=255
--
Tree=4976
num_leaves=255
--
Tree=4977
num_leaves=255
--
Tree=4978
num_leaves=255
--
Tree=4979
num_leaves=255
--
Tree=4980
num_leaves=255
--
Tree=4981
num_leaves=255
--
Tree=4982
num_leaves=255
--
Tree=4983
num_leaves=255
--
Tree=4984
num_leaves=255
--
Tree=4985
num_leaves=255
--
Tree=4986
num_leaves=255
--
Tree=4987
num_leaves=255
--
Tree=4988
num_leaves=255
--
Tree=4989
num_leaves=255
--
Tree=4990
num_leaves=255
--
Tree=4991
num_leaves=255
--
Tree=4992
num_leaves=255
--
Tree=4993
num_leaves=255
--
Tree=4994
num_leaves=255
--
Tree=4995
num_leaves=255
--
Tree=4996
num_leaves=255
--
Tree=4997
num_leaves=255
--
Tree=4998
num_leaves=255
--
Tree=4999
num_leaves=255

In fact, every tree has num_leaves=255.
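The same check can be done programmatically on the saved model file; a small sketch, assuming the model was written out with save_model("model.txt"):

```python
import re
from collections import Counter

# every tree block in a LightGBM text model has its own "num_leaves=<n>" line
with open("model.txt") as f:
    counts = Counter(re.findall(r"^num_leaves=(\d+)", f.read(), re.MULTILINE))

print(counts)  # e.g. Counter({'255': 5000}) if every tree is full-sized
```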
hmmmm ok thank you for that! Sorry, but I'm out of ideas. I'm not that familiar with the performance characteristics of the CUDA build here. I hope @shiyu1994 will be able to help.
I wanted to test speed in Python.
I tried to replicate the GPU (L40) and CPU (28 cores) experiment with HIGGS. The following are the experimental results:
num_iterations=500: cuda (28s) was faster than cpu (71s).
num_iterations=5000: cuda (570s) was slower than cpu (403s).
Within ten minutes, Volatile GPU-Util gradually decreased from 80% to below 10%.
Dataset and parameter settings from: https://github.com/microsoft/LightGBM/blob/master/docs/GPU-Tutorial.rst
Dataset preparation from: https://github.com/guolinke/boosting_tree_benchmarks/blob/master/data/higgs2libsvm.py
No other significant processes were running.
Code:
cpu test
gpu test
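The original "cpu test" / "gpu test" scripts were not reproduced in this excerpt. A minimal sketch of such a comparison, assuming HIGGS has been converted to LibSVM via higgs2libsvm.py and using parameters in the spirit of the GPU tutorial (the file name and exact values are assumptions):

```python
import time

import lightgbm as lgb

base_params = {
    "objective": "binary",
    "metric": "auc",
    "num_leaves": 255,
    "max_bin": 63,
    "learning_rate": 0.1,
}

for device in ("cpu", "cuda"):  # "cuda" requires a CUDA-enabled build
    # rebuild the Dataset per run so binning params can't conflict between runs
    train_set = lgb.Dataset("higgs.train")  # placeholder path, LibSVM format
    start = time.perf_counter()
    lgb.train(dict(base_params, device=device), train_set, num_boost_round=5000)
    print(f"{device}: {time.perf_counter() - start:.1f}s")
```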