
multi-GPU training is not adaptable to other GPU counts or CPU #3342

Closed

pseudotensor opened this issue May 25, 2018 · 12 comments

@pseudotensor
Contributor

It appears that predict uses the same GPU setup as train. So if I train on 3 GPUs, I can't predict with that model on only 1 or 2 GPUs (say, if I pickled the model and then ran it on 1 GPU). I can force it to use the CPU by changing the predictor.

I saw in the code that predict does not yet implement n_gpus or gpu_id, but can it be done? The ultimate goal is to be able to take any pickled state (trained on any system type or GPU count) and predict with that model on any system type.
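
Minimal sketch of the scenario; the parameter names here (n_gpus, gpu_hist, predictor) are the 2018-era ones and are assumptions, since exact names vary across releases:

```python
# Sketch of the failure mode: train on multiple GPUs, pickle, then try to
# predict on a machine with fewer GPUs (parameter names are assumed).
import pickle

import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.randint(2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# Train on 3 GPUs.
params = {
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',
    'n_gpus': 3,
}
bst = xgb.train(params, dtrain, num_boost_round=10)

# Ship the pickled model to a machine with fewer GPUs (or none)...
blob = pickle.dumps(bst)

# ...where prediction can fail unless the predictor is forced onto the CPU:
bst2 = pickle.loads(blob)
bst2.set_param({'predictor': 'cpu_predictor'})  # the manual workaround
preds = bst2.predict(xgb.DMatrix(X))
```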

@pseudotensor
Contributor Author

@canonizer @RAMitchell

@pseudotensor
Contributor Author

If you want a script demonstrating the problem, I can write one, but it should obviously be a problem, since the code itself notes that this is not implemented. In any case, this limits production use of xgboost.

@canonizer
Contributor

For training the model on one configuration and then using it to predict on a different configuration, isn't there functionality to export the trees to a file on the first system and then import them on the second?

@hcho3
Collaborator

hcho3 commented May 28, 2018

@pseudotensor Is it common to pickle the Python Booster object? You may be able to avoid the problem by running save_model() followed by load_model(), since the binary model file doesn't save n_gpu info.
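
A quick sketch of that workaround; the training parameters below are purely illustrative:

```python
# Round-trip through XGBoost's own binary model file, which (at the time)
# carried no n_gpus/predictor state, unlike a pickled Booster.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.randint(2, size=100)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=5)

# On the training machine:
bst.save_model('model.bin')

# On the target machine: the reloaded Booster predicts with whatever
# defaults the local build provides, regardless of how it was trained.
bst2 = xgb.Booster()
bst2.load_model('model.bin')
preds = bst2.predict(xgb.DMatrix(X))
```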

@pseudotensor
Contributor Author

xgboost is only one class inside the object we pickle, so unless xgboost is pythonic, things are difficult.

@hcho3
Collaborator

hcho3 commented May 28, 2018

unless xgboost is pythonic things are difficult

@pseudotensor By "pythonic," do you mean the ability to serialize XGBoost Booster object with pickle? Is pickling common in production setting?

@pseudotensor
Contributor Author

Yes, it's common. xgboost supports this to a great extent by ensuring C++ references are dereferenced when pickling various objects. But some things are not stored/dereferenced.

What I mean is that xgboost may be just one part of an entire pipeline that gets pickled, and everything inside needs to be picklable.

xgboost says it's picklable, but that's misleading, because it doesn't ensure the entire state is preserved; only some pieces are. I think most CPU pieces are, but the GPU pieces are not.
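
For illustration, a sketch of the pipeline case; the Pipeline class below is a hypothetical stand-in, not an xgboost API:

```python
# The Booster is one attribute of a larger application object, so the whole
# object graph must survive a pickle round-trip.
import pickle

import numpy as np
import xgboost as xgb


class Pipeline:
    def __init__(self, scale, booster):
        self.scale = scale      # stand-in for arbitrary preprocessing state
        self.booster = booster  # the xgboost.Booster buried inside


X = np.random.rand(100, 5)
y = np.random.randint(2, size=100)
bst = xgb.train({'objective': 'binary:logistic'},
                xgb.DMatrix(X, label=y), num_boost_round=5)

pipe = Pipeline(scale=2.0, booster=bst)

# Every member must survive the round-trip, including any GPU-side state the
# Booster holds; calling save_model() by hand on one attribute is not an
# option when the whole object graph is pickled at once.
restored = pickle.loads(pickle.dumps(pipe))
preds = restored.booster.predict(xgb.DMatrix(X))
```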

@RAMitchell
Member

@pseudotensor I am not sure that we store any state regarding the GPU when the model is serialised. I think it should just work.

@teju85
Contributor

teju85 commented Jun 1, 2018

@pseudotensor Do we even have multi-GPU prediction in xgboost today? Last time I checked, we didn't support it.

@hcho3
Collaborator

hcho3 commented Jul 4, 2018

Consolidating to #3439. This issue should be re-opened if you or others decide to actively work on implementing this feature.

hcho3 closed this as completed Jul 4, 2018
@pseudotensor
Contributor Author

Maybe just document that pickle doesn't work for xgboost. It doesn't really save all variables.

hcho3 reopened this Oct 25, 2018
@hcho3
Collaborator

hcho3 commented Oct 25, 2018

Going to re-open this, now that we have a new GPU predictor. Also, we need to make pickling more robust, since the scikit-learn wrappers need pickling to function properly; see #3829.
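
A sketch of the wrapper scenario that the pickling work needs to cover; gpu_hist and gpu_predictor are assumed 2018-era parameter names:

```python
# The sklearn API depends on pickle (joblib, GridSearchCV), so a GPU-trained
# wrapper must unpickle and predict on a machine without the original GPUs.
import pickle

import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(200, 8)
y = np.random.randint(2, size=200)

clf = XGBClassifier(tree_method='gpu_hist', predictor='gpu_predictor')
clf.fit(X, y)

clf2 = pickle.loads(pickle.dumps(clf))
preds = clf2.predict(X)  # must not require the original GPU configuration
```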

hcho3 added a commit to hcho3/xgboost that referenced this issue Nov 2, 2018
…el file

This allows pickled models to retain predictor attributes, such as
'predictor' (whether to use CPU or GPU) and 'n_gpu' (number of GPUs
to use). Related: h2oai/h2o4gpu#625

Closes dmlc#3342.

TODO. Write a test.
hcho3 added a commit that referenced this issue Nov 4, 2018
…ile (#3856)

* Fix #3342 and h2oai/h2o4gpu#625: Save predictor parameters in model file

This allows pickled models to retain predictor attributes, such as
'predictor' (whether to use CPU or GPU) and 'n_gpu' (number of GPUs
to use). Related: h2oai/h2o4gpu#625

Closes #3342.

TODO. Write a test.

* Fix lint

* Do not load GPU predictor into CPU-only XGBoost

* Add a test for pickling GPU predictors

* Make sample data big enough to pass multi GPU test

* Update test_gpu_predictor.cu
alois-bissuel pushed a commit to criteo-forks/xgboost that referenced this issue Dec 4, 2018
…el file (dmlc#3856)
lock bot locked as resolved and limited conversation to collaborators Feb 2, 2019