only copy the model once when predicting multiple batches #4457
Conversation
src/predictor/gpu_predictor.cu
Outdated
dh::safe_cuda(cudaMemcpyAsync(dh::Raw(tree_group_), model.tree_info.data(),
                              sizeof(int) * model.tree_info.size(),
                              cudaMemcpyHostToDevice));
}
This code would look better as a separate method. In my opinion, it reads more logically and reduces the number of function parameters.
Done.
src/predictor/gpu_predictor.cu
Outdated
@@ -361,10 +364,14 @@ class GPUPredictor : public xgboost::Predictor {
    DeviceOffsets(batch.offset, batch.data.Size(), &device_offsets);
    batch.data.Reshard(GPUDistribution::Explicit(devices_, device_offsets));

    // TODO(rongou): only copy the model once for all the batches.
    if (batch_offset == 0) {
Can this be hoisted out of the for loop?
Done.
src/predictor/gpu_predictor.cu
Outdated
dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
  shard.InitModel(model, h_tree_segments, h_nodes);
});
}
Consider moving it outside of the loop. You won't need the conditional then.
Done.
src/predictor/gpu_predictor.cu
Outdated
void PredictInternal
(const SparsePage& batch, const MetaInfo& info,
 HostDeviceVector<bst_float>* predictions,
 size_t tree_begin, size_t tree_end, int n_classes) {
Why do you need n_classes?
It's needed by the prediction kernel.
As it belongs to the model, could you move it to InitModel()?
Done.
src/predictor/gpu_predictor.cu
Outdated
@@ -361,10 +364,14 @@ class GPUPredictor : public xgboost::Predictor {
    DeviceOffsets(batch.offset, batch.data.Size(), &device_offsets);
    batch.data.Reshard(GPUDistribution::Explicit(devices_, device_offsets));

    // TODO(rongou): only copy the model once for all the batches.
    if (batch_offset == 0) {
      dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
Given that the code processing the tree nodes above (lines 335-350) has to do with model initialization, consider moving it (together with calls to DeviceShard::InitModel()) to a separate method of GPUPredictor.
Done.
src/predictor/gpu_predictor.cu
Outdated
void PredictInternal
(const SparsePage& batch, const MetaInfo& info,
 HostDeviceVector<bst_float>* predictions,
 size_t tree_begin, size_t tree_end, int n_classes) {
As it belongs to the model, could you move it to InitModel()?
src/predictor/gpu_predictor.cu
Outdated
if (tree_end - tree_begin == 0) { return; }
monitor_.StartCuda("DevicePredictInternal");

void InitModel(const gbm::GBTreeModel &model, size_t tree_begin, size_t tree_end) {
Consider placing & consistently, i.e. const gbm::GBTreeModel&, auto& or const gbm::GBTreeModel &, auto &. In two of the InitModel() methods, & is placed differently.
Done.
src/predictor/gpu_predictor.cu
Outdated
shard.PredictInternal(batch, dmat->Info(), out_preds, model,
                      h_tree_segments, h_nodes, tree_begin, tree_end);
shard.PredictInternal(batch, dmat->Info(), out_preds, tree_begin, tree_end,
                      model.param.num_output_group);
I think all three of these parameters can be stored in the shard after InitModel(). However, I'll leave this up to you.
Done.
@RAMitchell this is ready to merge. Thanks!
@RAMitchell Do we want this in 0.90?
@rongou @canonizer @sriramch Can you do me a favor and explain what this PR does? Is this a follow-up to #4284 (external memory with a single GPU) and #4438 (external memory with multiple GPUs)?
@hcho3 Yes, it's an optimization and some refactoring. In external memory mode, when we run prediction on multiple batches, we should copy the model to the GPU only once instead of once per batch.
@hcho3 This is a fairly low-impact change; up to you whether you want to include it. I will merge so as not to have PRs sitting around.
@canonizer @RAMitchell @sriramch