GPU memory is not released after training with {'predictor':'cpu_predictor'} #4018
Comments
@vss888 This requires us to design and implement a global GPU memory management system inside XGBoost, which is never a trivial thing to do, especially when memory can be distributed across multiple GPUs. Alternatively we could add a … The methods used in deep learning frameworks, which have a distributable computation graph, do not apply to tree boosting algorithms. More thought is needed.
@vss888 I tried to see how bad the situation is on the master branch, with a dataset of shape (200000, 3000): it takes up 2 GB of GPU memory during training, but only 159 MB remains allocated after deleting the trained model (in Python …).
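For context, a rough sketch of the kind of check described above; the parameter choices and the nvidia-smi helper are my own assumptions, not the exact script used in the comment:

```python
import gc
import subprocess

import numpy as np
import xgboost as xgb


def gpu_memory_used_mb():
    """Read used GPU memory (first device) via nvidia-smi, assumed to be on PATH."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]
    )
    return int(out.decode().splitlines()[0])


# Same shape as mentioned above; note this is a sizeable array (~2.4 GB in float32).
X = np.random.rand(200000, 3000).astype(np.float32)
y = np.random.randint(0, 2, size=X.shape[0])
dtrain = xgb.DMatrix(X, label=y)

print("before training:", gpu_memory_used_mb(), "MB")
bst = xgb.train({"tree_method": "gpu_hist"}, dtrain, num_boost_round=10)
print("after training: ", gpu_memory_used_mb(), "MB")

del bst, dtrain
gc.collect()
print("after deleting: ", gpu_memory_used_mb(), "MB")
```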
@trivialfis Thank you very much for giving the feature some thought! My hope was that an xgboost model object would keep track of the GPU memory it allocated, so that releasing it would be as easy as calling …

Looking at my larger data set, the shape is (101750935, 6) (4 features, 1 target, 1 weight) and it uses about 4 GB of GPU memory, which means I can train at most 4 such models in parallel (Tesla P100). My smaller data sets are about 10 times smaller, so with those I can train up to 40 models in parallel. The GPU is used quite lightly from a compute point of view: the highest utilization I have seen with multiple models training in parallel was 27%, and most of the time it is 0% or a single-digit percentage, as reported by …
@trivialfis Plus, I cannot delete a model until all the predictions are finished, and so the GPU memory remains allocated. One possible solution would be to create a …
I understand your use cases now. :)
Yes, I thought about that but gave up on the idea. Here is the problem: inside XGBoost, every component allocates memory as needed (objectives, metrics, updaters, ...), and the memory is allocated on different GPUs from different threads. If you let another class (in the OOP sense) delete that memory, ownership becomes a problem, especially with multi-threading. If there is a bug caused by use-after-free, all we get is a GitHub issue with a segfault message. Such bugs are very hard to prevent, and even harder to debug, since many issues do not come with a reproducible script. The simplest way to think about it is to ask how you would write a test that shows the code is correct; I can't think of one. :(

I will give this more thought as I go along with refactoring the current GPU code base. For now, the simplest workaround for you might be to save the model first and then delete it.
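A minimal sketch of that save-then-delete workaround, with an illustrative file name and parameters of my own choosing; whether device memory is actually returned to the driver may depend on the XGBoost version:

```python
import gc

import numpy as np
import xgboost as xgb

X = np.random.rand(10000, 20)
y = np.random.randint(0, 2, size=X.shape[0])
dtrain = xgb.DMatrix(X, label=y)

# Train on the GPU (gpu_hist was the GPU training method at the time of this issue).
bst = xgb.train({"tree_method": "gpu_hist"}, dtrain, num_boost_round=50)

# Persist the model, then drop the GPU-backed Booster and the DMatrix so their
# allocations can be reclaimed.
bst.save_model("model.bin")
del bst, dtrain
gc.collect()

# Reload for prediction; with the CPU predictor no GPU memory is needed.
bst = xgb.Booster()
bst.load_model("model.bin")
bst.set_param({"predictor": "cpu_predictor"})
preds = bst.predict(xgb.DMatrix(X))
```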
Has there been any progress, or are there new options?
Yes, this is a teething problem.
Let's continue this discussion in #4668. I think this needs to be a priority for us.
For those looking for a quick workaround until this is fixed properly, check my solution here.
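The linked solution is not reproduced here. A commonly used workaround of this general kind (process isolation, which is not necessarily what the link describes) is to run each training job in a short-lived child process, so that all GPU memory is released when that process exits. A minimal sketch with hypothetical file names and parameters:

```python
import multiprocessing as mp

import numpy as np
import xgboost as xgb


def train_one(params, X, y, out_path):
    # Everything allocated here, including GPU memory, goes away when the
    # child process exits.
    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train(params, dtrain, num_boost_round=100)
    bst.save_model(out_path)


if __name__ == "__main__":
    X = np.random.rand(10000, 20)
    y = np.random.randint(0, 2, size=X.shape[0])
    params = {"tree_method": "gpu_hist", "predictor": "cpu_predictor"}

    # 'spawn' gives the child a clean process without an inherited CUDA context.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=train_one, args=(params, X, y, "model.bin"))
    p.start()
    p.join()

    # The parent never touches the GPU; it loads the model and predicts on CPU.
    bst = xgb.Booster()
    bst.load_model("model.bin")
    preds = bst.predict(xgb.DMatrix(X))
```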
It seems reasonable to claim that there is no justifiable reason not to release GPU memory after training if xgboost is used with predictor='cpu_predictor' (please correct me if I am wrong), so I was wondering if you could put it on the list of features to be implemented. It would make hyper-parameter optimization much more efficient, since the bottleneck (at least in my use case) is the available GPU memory; if GPU memory were released after training, many more models could be trained and tested in parallel on the same GPU.
This is related to issues #3045 and #3083.
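For reference, a minimal sketch of the configuration this issue is about (GPU training combined with the CPU predictor); the data, objective, and round count are placeholders:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(5000, 4)
w = np.random.rand(5000)          # per-row weights, as in the data set described above
y = np.random.rand(5000)
dtrain = xgb.DMatrix(X, label=y, weight=w)

params = {
    "tree_method": "gpu_hist",      # training runs on the GPU
    "predictor": "cpu_predictor",   # predictions do not need the GPU
    "objective": "reg:squarederror",
}
bst = xgb.train(params, dtrain, num_boost_round=100)
# The request in this issue: once training is done, the GPU memory held by
# the Booster should be releasable, since prediction only uses the CPU.
```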