
multi-GPU training is not adaptable to other GPU counts or CPU #3342

Closed

pseudotensor opened this issue May 25, 2018 · 12 comments

@pseudotensor
Contributor

It appears that predict uses the same GPU setup as train. So if I train on 3 GPUs, I can't predict with that model on only 1 or 2 GPUs (say, if I pickled the model and then ran it on 1 GPU). I can force it to use the CPU by changing the predictor.

I saw in the code that predict does not yet implement n_gpus or gpu_id, but can it be done? The ultimate goal is to be able to take any pickled state (trained on any system type or GPU count) and predict with that model on any system type.
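
Minimal sketch of the scenario; the parameter names here (n_gpus, gpu_hist, predictor) are the 2018-era ones and are assumptions, since exact names vary across releases:

```python
# Sketch of the failure mode: train on multiple GPUs, pickle, then try to
# predict on a machine with fewer GPUs (parameter names are assumed).
import pickle

import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.randint(2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# Train on 3 GPUs.
params = {
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',
    'n_gpus': 3,
}
bst = xgb.train(params, dtrain, num_boost_round=10)

# Ship the pickled model to a machine with fewer GPUs (or none)...
blob = pickle.dumps(bst)

# ...where prediction can fail unless the predictor is forced onto the CPU:
bst2 = pickle.loads(blob)
bst2.set_param({'predictor': 'cpu_predictor'})  # the manual workaround
preds = bst2.predict(xgb.DMatrix(X))
```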

@pseudotensor
Contributor Author

@canonizer @RAMitchell

@pseudotensor
Contributor Author

If you want a script demonstrating the problem, I can write one, but it should obviously be a problem, since the code itself notes that this is not implemented. In any case, this limits production use of xgboost.

@canonizer
Contributor

For training the model on one configuration and then using it to predict on a different configuration, isn't there functionality to export the trees to a file on the first system and then import them on the second?

@hcho3
Collaborator

hcho3 commented May 28, 2018

@pseudotensor Is it common to pickle the Python Booster object? You may be able to avoid the problem by running save_model() followed by load_model(), since the binary model file doesn't save n_gpu info.
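
A quick sketch of that workaround; the training parameters below are purely illustrative:

```python
# Round-trip through XGBoost's own binary model file, which (at the time)
# carried no n_gpus/predictor state, unlike a pickled Booster.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.randint(2, size=100)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=5)

# On the training machine:
bst.save_model('model.bin')

# On the target machine: the reloaded Booster predicts with whatever
# defaults the local build provides, regardless of how it was trained.
bst2 = xgb.Booster()
bst2.load_model('model.bin')
preds = bst2.predict(xgb.DMatrix(X))
```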

@pseudotensor
Contributor Author

xgboost is only one class inside the object we pickle, so unless xgboost is pythonic, things are difficult.

@hcho3
Collaborator

hcho3 commented May 28, 2018

unless xgboost is pythonic things are difficult

@pseudotensor By "pythonic," do you mean the ability to serialize XGBoost Booster object with pickle? Is pickling common in production setting?

@pseudotensor
Contributor Author

Yes, it's common. xgboost supports this to a great extent by ensuring C++ references are dereferenced when pickling various objects. But some things are not stored/dereferenced.

What I mean is that xgboost may be just one part of an entire pipeline that gets pickled, and everything inside needs to be picklable.

xgboost says it's picklable, but that's misleading, because it doesn't ensure the entire state is preserved; only some pieces are. I think most CPU pieces are, but the GPU pieces are not.
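
For illustration, a sketch of the pipeline case; the Pipeline class below is a hypothetical stand-in, not an xgboost API:

```python
# The Booster is one attribute of a larger application object, so the whole
# object graph must survive a pickle round-trip.
import pickle

import numpy as np
import xgboost as xgb


class Pipeline:
    def __init__(self, scale, booster):
        self.scale = scale      # stand-in for arbitrary preprocessing state
        self.booster = booster  # the xgboost.Booster buried inside


X = np.random.rand(100, 5)
y = np.random.randint(2, size=100)
bst = xgb.train({'objective': 'binary:logistic'},
                xgb.DMatrix(X, label=y), num_boost_round=5)

pipe = Pipeline(scale=2.0, booster=bst)

# Every member must survive the round-trip, including any GPU-side state the
# Booster holds; calling save_model() by hand on one attribute is not an
# option when the whole object graph is pickled at once.
restored = pickle.loads(pickle.dumps(pipe))
preds = restored.booster.predict(xgb.DMatrix(X))
```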

@RAMitchell
Member

@pseudotensor I am not sure that we store any state regarding the GPU when the model is serialised. I think it should just work.

@teju85
Contributor

teju85 commented Jun 1, 2018

@pseudotensor Do we even have multi-GPU prediction in xgboost today? Last time I checked, we didn't support it.

@hcho3
Collaborator

hcho3 commented Jul 4, 2018

Consolidating to #3439. This issue should be re-opened if you or others decide to actively work on implementing this feature.

hcho3 closed this as completed Jul 4, 2018
@pseudotensor
Contributor Author

Maybe just document that pickle doesn't work for xgboost. It doesn't really save all variables.

hcho3 reopened this Oct 25, 2018
@hcho3
Collaborator

hcho3 commented Oct 25, 2018

Going to re-open this, now that we have a new GPU predictor. Also, we need to make pickling more robust, since the scikit-learn wrappers need pickling to function properly; see #3829.
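
A sketch of the wrapper scenario that the pickling work needs to cover; gpu_hist and gpu_predictor are assumed 2018-era parameter names:

```python
# The sklearn API depends on pickle (joblib, GridSearchCV), so a GPU-trained
# wrapper must unpickle and predict on a machine without the original GPUs.
import pickle

import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(200, 8)
y = np.random.randint(2, size=200)

clf = XGBClassifier(tree_method='gpu_hist', predictor='gpu_predictor')
clf.fit(X, y)

clf2 = pickle.loads(pickle.dumps(clf))
preds = clf2.predict(X)  # must not require the original GPU configuration
```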

hcho3 added a commit to hcho3/xgboost that referenced this issue Nov 2, 2018
…el file

This allows pickled models to retain predictor attributes, such as
'predictor' (whether to use CPU or GPU) and 'n_gpu' (number of GPUs
to use). Related: h2oai/h2o4gpu#625

Closes dmlc#3342.

TODO. Write a test.
hcho3 added a commit that referenced this issue Nov 4, 2018
…ile (#3856)

* Fix #3342 and h2oai/h2o4gpu#625: Save predictor parameters in model file

This allows pickled models to retain predictor attributes, such as
'predictor' (whether to use CPU or GPU) and 'n_gpu' (number of GPUs
to use). Related: h2oai/h2o4gpu#625

Closes #3342.

TODO. Write a test.

* Fix lint

* Do not load GPU predictor into CPU-only XGBoost

* Add a test for pickling GPU predictors

* Make sample data big enough to pass multi GPU test

* Update test_gpu_predictor.cu
alois-bissuel pushed a commit to criteo-forks/xgboost that referenced this issue Dec 4, 2018
…el file (dmlc#3856)
lock bot locked as resolved and limited conversation to collaborators Feb 2, 2019