Prediction by indices (subsample < 1) #6683
Conversation
Can we reconsider this? This PR seems to be an optimization where subsample is used for hist, which is a limited use case, and it also complicates and duplicates the existing code.
(branch force-pushed from 685dade to b90ffd3)
@trivialfis, we tried to simplify this code as much as possible. Now it seems to me that this is one of the shortest ways to do this optimization. But we still need to partially duplicate the code of …
Sorry for the late reply. I think this PR is a workaround for …
I'm not really aware of what's going on in GPU hist, but here's what we're doing in CPU: …

We added unit tests for the new CPU prediction branch (as part of the …).

It looks like there's been some strange error in Travis (https://travis-ci.org/github/dmlc/xgboost/jobs/760854002#L18269). Should I re-push this to restart the check?
Could you just ensure leaf_value_cache_ is up-to-date at the end of training (maybe here: xgboost/src/tree/updater_quantile_hist.cc, line 106 in a9b4a95)? We want to avoid communicating internals of separate parts of the program as much as possible. If you do it this way, none of the interface changes.
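A minimal sketch of this suggestion, under the assumption that the cache holds one leaf value per training row and that a per-row leaf assignment is available when the tree update finishes (the names below are illustrative, not the actual fields):

```cpp
// Hypothetical sketch only: refresh a per-row leaf-value cache at the end of
// the tree update, so the predictor never has to query the updater's internals.
#include <cstddef>
#include <vector>

struct LeafValueCacheSketch {
  std::vector<float> leaf_value_cache_;  // one cached leaf value per training row

  // row_to_leaf: leaf index reached by every row in the freshly built tree
  // (assumed to be known at the end of Update());
  // leaf_values: leaf weights of that tree.
  void RefreshAtEndOfUpdate(const std::vector<std::size_t>& row_to_leaf,
                            const std::vector<float>& leaf_values) {
    leaf_value_cache_.resize(row_to_leaf.size());
    for (std::size_t row = 0; row < row_to_leaf.size(); ++row) {
      leaf_value_cache_[row] = leaf_values[row_to_leaf[row]];
    }
  }
};
```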
Could you please explain in more detail what you want us to check? I checked the code again; it looks like …
We've been thinking about a better way to make this optimization and came to this. The fact is that we obtain the indices of the rows on which we need to make a prediction in …
Ah, I didn't notice … The current design is problematic: it makes life harder for other parts of xgboost in order to serve a very specific optimisation case.
To clarify, I don't mean just adding a check, I mean ensuring … Sorry for the ambiguity.
But how can we verify this? If …
Do you object to this field? Yes, we could use …
It seems to me that the only way to avoid any communication between Updater and Predictor is to make predictions on these …
Yes, that is what I want you to do. Prediction is not complicated on a single tree; give it a try and see what it looks like.
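For reference, a minimal, self-contained sketch of what predicting with a single tree amounts to (the node layout and names here are illustrative, not xgboost's actual RegTree API):

```cpp
// Illustrative only: walk one decision tree for one row and return its leaf value.
#include <cmath>
#include <vector>

struct Node {
  int left = -1, right = -1;   // child indices; -1 marks a leaf
  int split_index = 0;         // feature used for the split
  float split_cond = 0.0f;     // split threshold
  bool default_left = true;    // direction taken for missing values
  float leaf_value = 0.0f;
};

float PredictOneTree(const std::vector<Node>& tree, const std::vector<float>& row) {
  int nid = 0;
  while (tree[nid].left != -1) {  // descend until a leaf is reached
    const float fvalue = row[tree[nid].split_index];
    const bool go_left = std::isnan(fvalue) ? tree[nid].default_left
                                            : fvalue < tree[nid].split_cond;
    nid = go_left ? tree[nid].left : tree[nid].right;
  }
  return tree[nid].leaf_value;
}
```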
(branch force-pushed from b141563 to 2351d54)
Looks good in general, just a few comments.
@@ -680,30 +683,63 @@ bool QuantileHistMaker::Builder<GradientSumT>::UpdatePredictionCache(
    }
  });

  if (param_.subsample < 1.0f) {
Is it potentially easier to complete this step at the end of the update instead of in UpdatePredictionCache? That way we don't need p_last_fmat_mutable_, as we are guaranteed to have a valid DMatrix.
I don't think it will be easier. We already have other things needed for predicting in UpdatePredictionCache(), such as out_preds, the number of trees, and the number of the current tree. Moving them to Update() will also cost us a few extra fields in Updater. But we can probably remove const from the existing p_last_fmat_ field.
OK, that seems fair. I don't like this p_last_fmat_ business much; it's a raw pointer that we are making assumptions about. For example, if someone made changes to other parts of the program, it could be easy to invalidate its use in the updater in unpredictable situations. If you can think of a better way to handle this in general, that could be a nice change for the future.
Thank you for sharing your vision. We are going to do some refactoring for hist in the near future. At first glance, we can replace the raw pointers with smart ones. Next, perhaps, we will think about some hashing for DMatrix.
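One possible shape for that refactoring (purely a sketch, not part of this PR): hold the last-seen DMatrix through a std::weak_ptr, assuming the matrix is owned by a std::shared_ptr elsewhere, so a stale cache fails a check instead of dereferencing a dangling raw pointer.

```cpp
// Sketch of the weak_ptr idea; DMatrixStub stands in for xgboost's DMatrix.
#include <memory>

struct DMatrixStub {};

class UpdaterSketch {
 public:
  void Update(const std::shared_ptr<DMatrixStub>& dmat) {
    p_last_fmat_ = dmat;  // remember which matrix the trees were built on
  }

  bool UpdatePredictionCache(const std::shared_ptr<DMatrixStub>& dmat) {
    auto last = p_last_fmat_.lock();
    if (!last || last != dmat) {
      return false;  // stale or different DMatrix: fall back to a full prediction
    }
    // ... partial update of the cached predictions would go here ...
    return true;
  }

 private:
  std::weak_ptr<DMatrixStub> p_last_fmat_;  // never dangles, unlike a raw pointer
};
```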
You should definitely chat with @trivialfis about hist refactoring and work on an RFC.
// tree rows that were not used for current training
std::vector<size_t> unused_rows_;
// feature vectors for subsampled prediction
std::vector<RegTree::FVec> feat_vecs_;
Do you get any performance benefit from keeping this as a member variable instead of creating locally? Avoid extra member variables if possible.
Yes, this field allows us not to allocate nthread*nfeatures units of memory on each UpdatePredictionCache() call. Here are the results for one of our datasets:

| Santander, subsample == 0.5 | UpdatePredictionCache, s | Overall time, s |
|---|---|---|
| with feat_vecs_ as a member variable | 42.7465 | 142.7305 |
| with local allocations | 49.3696 | 147.4053 |
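For context, a rough sketch of the two variants being compared; FeatureBuffer is a stand-in for RegTree::FVec and the method names are illustrative:

```cpp
// Illustrative comparison: reusing per-thread scratch buffers across calls
// versus allocating them on every call.
#include <cstddef>
#include <vector>

struct FeatureBuffer {  // stand-in for RegTree::FVec
  std::vector<float> data;
  void Init(std::size_t n_features) { data.assign(n_features, 0.0f); }
};

class CacheUpdaterSketch {
 public:
  // Member-buffer variant: the nthread * nfeatures allocation happens once
  // and is reused by every subsequent call.
  void UpdateWithMemberBuffers(std::size_t n_threads, std::size_t n_features) {
    if (feat_vecs_.size() != n_threads) {
      feat_vecs_.resize(n_threads);
      for (auto& fvec : feat_vecs_) fvec.Init(n_features);
    }
    // ... per-thread prediction would use feat_vecs_[thread_id] ...
  }

  // Local-allocation variant: simpler, but pays the allocation cost on every call.
  void UpdateWithLocalBuffers(std::size_t n_threads, std::size_t n_features) {
    std::vector<FeatureBuffer> feat_vecs(n_threads);
    for (auto& fvec : feat_vecs) fvec.Init(n_features);
    // ... per-thread prediction would use feat_vecs[thread_id] ...
  }

 private:
  std::vector<FeatureBuffer> feat_vecs_;
};
```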
Currently, the prediction cache is not enabled for models with subsample < 1, so a full prediction is made after each training iteration. This PR allows the predictions to be partially updated using the existing cache, making an actual prediction only on the rows that were not used for training.

We noticed an almost 2x acceleration of the PredictRaw section (on santander, where subsample == 0.5).

We also encountered a slowdown in the InitData section, where we now do more calculations (this will only be observed on datasets with subsample < 1). This could be improved in future updates.

UPD: Here are some measurements: [benchmark charts for subsample == 0.9 and subsample == 0.5]
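To make the idea concrete, a hedged sketch of the partial update (the row bookkeeping and names are illustrative, not the PR's exact code): rows sampled for training already know their leaf value from the tree build, so only the held-out rows need an actual tree traversal.

```cpp
// Illustrative sketch: update the prediction cache by reusing training results
// for sampled rows and running single-tree prediction only on the skipped rows.
#include <cstddef>
#include <functional>
#include <vector>

// predict_row: single-tree prediction for one row (e.g. PredictOneTree from the
// earlier sketch, bound to the newly built tree).
void PartialCacheUpdate(const std::vector<bool>& used_in_training,     // subsampling mask for this iteration
                        const std::vector<float>& leaf_from_training,  // leaf value recorded per sampled row during the build
                        const std::function<float(std::size_t)>& predict_row,
                        float learning_rate,
                        std::vector<float>* out_preds) {
  for (std::size_t row = 0; row < out_preds->size(); ++row) {
    // Sampled rows: the leaf is already known from training.
    // Held-out rows: traverse the new tree (cheap, one tree only).
    const float leaf = used_in_training[row] ? leaf_from_training[row]
                                             : predict_row(row);
    (*out_preds)[row] += learning_rate * leaf;
  }
}
```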