only copy the model once when predicting multiple batches #4457
Conversation
src/predictor/gpu_predictor.cu
Outdated
dh::safe_cuda(cudaMemcpyAsync(dh::Raw(tree_group_), model.tree_info.data(),
                              sizeof(int) * model.tree_info.size(),
                              cudaMemcpyHostToDevice));
}
This code would look better as a separate method. In my opinion, it reads more logically and reduces the number of function parameters.
Done.
src/predictor/gpu_predictor.cu
Outdated
@@ -361,10 +364,14 @@ class GPUPredictor : public xgboost::Predictor {
    DeviceOffsets(batch.offset, batch.data.Size(), &device_offsets);
    batch.data.Reshard(GPUDistribution::Explicit(devices_, device_offsets));

    // TODO(rongou): only copy the model once for all the batches.
    if (batch_offset == 0) {
Can this be hoisted out of the for loop?
Done.
src/predictor/gpu_predictor.cu
Outdated
dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
  shard.InitModel(model, h_tree_segments, h_nodes);
});
}
Consider moving it outside of the loop. You won't need the conditional then.
Done.
src/predictor/gpu_predictor.cu
Outdated
void PredictInternal
(const SparsePage& batch, const MetaInfo& info,
 HostDeviceVector<bst_float>* predictions,
 size_t tree_begin, size_t tree_end, int n_classes) {
Why do you need n_classes?
It's needed by the prediction kernel.
As it belongs to the model, could you move it to InitModel()?
Done.
src/predictor/gpu_predictor.cu
Outdated
@@ -361,10 +364,14 @@ class GPUPredictor : public xgboost::Predictor {
    DeviceOffsets(batch.offset, batch.data.Size(), &device_offsets);
    batch.data.Reshard(GPUDistribution::Explicit(devices_, device_offsets));

    // TODO(rongou): only copy the model once for all the batches.
    if (batch_offset == 0) {
      dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
Given that the code processing the tree nodes above (lines 335-350) has to do with model initialization, consider moving it (together with calls to DeviceShard::InitModel()) to a separate method of GPUPredictor.
Done.
src/predictor/gpu_predictor.cu
Outdated
void PredictInternal
(const SparsePage& batch, const MetaInfo& info,
 HostDeviceVector<bst_float>* predictions,
 size_t tree_begin, size_t tree_end, int n_classes) {
As it belongs to the model, could you move it to InitModel()?
src/predictor/gpu_predictor.cu
Outdated
if (tree_end - tree_begin == 0) { return; }
monitor_.StartCuda("DevicePredictInternal");

void InitModel(const gbm::GBTreeModel &model, size_t tree_begin, size_t tree_end) {
Consider placing & consistently, i.e. const gbm::GBTreeModel&, auto& or const gbm::GBTreeModel &, auto &. In two of the InitModel() methods, & is placed differently.
Done.
src/predictor/gpu_predictor.cu
Outdated
shard.PredictInternal(batch, dmat->Info(), out_preds, model,
                      h_tree_segments, h_nodes, tree_begin, tree_end);
shard.PredictInternal(batch, dmat->Info(), out_preds, tree_begin, tree_end,
                      model.param.num_output_group);
I think all three of these parameters can be stored in the shard after InitModel(). However, I'll leave this up to you.
Done.
@RAMitchell this is ready to merge. Thanks!
@RAMitchell Do we want this in 0.90?
@rongou @canonizer @sriramch Can you do me a favor and explain what this PR does? Is this a follow-up to #4284 (external memory with a single GPU) and #4438 (external memory with multiple GPUs)?
@hcho3 Yes, it's an optimization and some refactoring. In external memory mode, when we run prediction on multiple batches, we should copy the model to the GPU only once instead of once per batch.
@hcho3 This is a fairly low-impact change; up to you whether you want to include it. I will merge so as not to have PRs sitting around.
@canonizer @RAMitchell @sriramch