multi-GPU training is not adaptable to other GPU counts or CPU #3342
Comments
If you want a script to demonstrate the problem I can provide one, but it should be obviously a problem, since the code itself notes that this is not implemented. This limits production use of xgboost.
For training the model on one configuration and then using it to predict on a different configuration, isn't there functionality to export the tree on the first system to a file and then import it on the second system?
@pseudotensor Is it common to pickle the Python Booster object? You may be able to avoid the problem by running `save_model()` explicitly.
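A minimal sketch of that suggestion (the training data and the filename `model.bin` are illustrative): export the model to a file explicitly instead of relying on pickle to carry the full C++ state across machines.

```python
import numpy as np
import xgboost as xgb

# Train a tiny model (stand-in for the real training job).
X = np.random.rand(100, 10)
y = np.random.randint(2, size=100)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=10)

# On the training machine: write the model to a file...
bst.save_model('model.bin')

# ...and on the prediction machine: load it into a fresh Booster.
bst_loaded = xgb.Booster()
bst_loaded.load_model('model.bin')
preds = bst_loaded.predict(dtrain)
```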
xgboost is only one class inside the object we pickle, so unless xgboost behaves pythonically, things are difficult.
@pseudotensor By "pythonic," do you mean the ability to serialize the XGBoost Booster object with `pickle`?
Yes, it's common. xgboost supports this to a great extent by ensuring C++ references are de-referenced when pickling various objects, but some things are not stored/de-referenced. And what I mean is that xgboost may be just one part of an entire pipeline that gets pickled, and everything inside needs to be picklable. xgboost says it's picklable, but that's misleading, because it doesn't really ensure the entire state is preserved; only some pieces are. I think most CPU pieces are; the GPU pieces are not.
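As an illustration of that scenario (the `Pipeline` class and its attributes are hypothetical), the Booster is just one field inside a larger object that gets pickled wholesale:

```python
import pickle
import numpy as np
import xgboost as xgb

# Hypothetical container: the Booster is one attribute of a larger
# object that is pickled as a whole.
class Pipeline:
    def __init__(self, booster, scaler_mean):
        self.booster = booster          # xgboost model
        self.scaler_mean = scaler_mean  # arbitrary non-xgboost state

X = np.random.rand(50, 5)
y = np.random.randint(2, size=50)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=5)

pipe = Pipeline(bst, X.mean(axis=0))
pipe2 = pickle.loads(pickle.dumps(pipe))
# The Booster round-trips through pickle via its raw model buffer, but at
# the time of this issue GPU-related runtime settings were not preserved.
preds = pipe2.booster.predict(dtrain)
```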
@pseudotensor I am not sure that we store any state regarding the GPU when the model is serialised. I think it should just work.
@pseudotensor Do we even have multi-GPU prediction in xgboost today? Last time I checked, we didn't support it.
Consolidating to #3439. This issue should be re-opened if you or others decide to actively work on implementing this feature.
Maybe just document that pickle doesn't fully work for xgboost; it doesn't really save all variables.
Going to re-open this, now that we have the new GPU predictor. Also, we need to make pickling more robust, since the scikit-learn wrappers need pickling to function properly; see #3829.
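For illustration, a minimal sketch of that requirement (data and sizes are arbitrary): the scikit-learn wrapper must survive a pickle round-trip with predictions intact, since tools like joblib and GridSearchCV pickle estimators.

```python
import pickle
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(100, 10)
y = np.random.randint(2, size=100)

clf = XGBClassifier(n_estimators=10).fit(X, y)
clf2 = pickle.loads(pickle.dumps(clf))

# The round-trip should preserve the fitted booster and its parameters.
np.testing.assert_array_equal(clf.predict(X), clf2.predict(X))
```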
Referenced in commit: Fix dmlc#3342 and h2oai/h2o4gpu#625: Save predictor parameters in model file. This allows pickled models to retain predictor attributes, such as 'predictor' (whether to use CPU or GPU) and 'n_gpu' (number of GPUs to use). Related: h2oai/h2o4gpu#625. Closes dmlc#3342. TODO: Write a test.
Merged: Save predictor parameters in model file (#3856)
* Fix #3342 and h2oai/h2o4gpu#625: Save predictor parameters in model file. This allows pickled models to retain predictor attributes, such as 'predictor' (whether to use CPU or GPU) and 'n_gpu' (number of GPUs to use).
* Fix lint
* Do not load GPU predictor into CPU-only XGBoost
* Add a test for pickling GPU predictors
* Make sample data big enough to pass multi-GPU test
* Update test_gpu_predictor.cu
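In the spirit of the pickling test mentioned in that change (a sketch, not the actual test from #3856; it assumes a GPU-enabled build and era-appropriate parameter names):

```python
import pickle
import numpy as np
import xgboost as xgb

X = np.random.rand(64, 4)
y = np.random.randint(2, size=64)
dtrain = xgb.DMatrix(X, label=y)

# 'gpu_predictor' and 'n_gpus' as they existed around xgboost 0.8x;
# 'n_gpus' has since been removed upstream.
params = {'objective': 'binary:logistic', 'tree_method': 'gpu_hist',
          'predictor': 'gpu_predictor', 'n_gpus': 1}
bst = xgb.train(params, dtrain, num_boost_round=5)

bst2 = pickle.loads(pickle.dumps(bst))
# After the fix, predictor attributes are stored in the model file, so
# the unpickled Booster keeps them instead of reverting to defaults.
assert bst2.predict(dtrain).shape == (64,)
```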
It appears that predict uses the same GPU setup as train. So if I train on 3 GPUs, I can't predict with that model on only 1 or 2 GPUs (say, if I pickled the model and then ran it on 1 GPU). I can force it to use the CPU by changing the predictor.
I saw in the code that predict does not yet support n_gpus or gpu_id, but can this be done? The ultimate goal is to be able to take any pickled state (trained on any system type or GPU count) and predict with that model on any system type.
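A minimal sketch of that goal, assuming era-appropriate parameter names ('gpu_hist', 'n_gpus', 'predictor') and illustrative data: train and pickle on a multi-GPU box, then unpickle elsewhere and override the predictor before calling predict.

```python
import pickle
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 20)
y = np.random.randint(2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# On the 3-GPU training box ('n_gpus' as it existed circa 0.8x; it has
# since been removed upstream).
params = {'objective': 'binary:logistic',
          'tree_method': 'gpu_hist', 'n_gpus': 3}
bst = xgb.train(params, dtrain, num_boost_round=50)
with open('model.pkl', 'wb') as f:
    pickle.dump(bst, f)

# On a CPU-only or single-GPU box: override the predictor after
# unpickling so prediction does not depend on the training GPU setup.
with open('model.pkl', 'rb') as f:
    bst = pickle.load(f)
bst.set_param({'predictor': 'cpu_predictor'})
preds = bst.predict(xgb.DMatrix(X))
```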