
Prediction by indices (subsample < 1) #6683

Merged
merged 3 commits into dmlc:master on Mar 16, 2021

Conversation

@RukhovichIV (Contributor) commented Feb 5, 2021

Currently, the prediction cache is not enabled for models with subsample < 1, so a full prediction is made after each training iteration. This PR makes it possible to partially update the predictions using the existing cache and run the actual prediction only on the rows that were not used for training.
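For illustration only, here is a minimal sketch of the idea in plain C++; it is not the PR's actual code, and `Node`, `PredictRow`, and `UpdateCacheForUnusedRows` are hypothetical stand-ins. Rows that participated in training already have the new tree's leaf values accumulated into the cache by the updater (their leaf assignment is known from the row partition), so only the complement needs a real tree traversal:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal tree node: a node is a leaf iff left == -1.
struct Node {
  int left = -1, right = -1;  // child indices; -1 marks a leaf
  int feature = 0;            // split feature index
  float threshold = 0.f;      // split threshold
  float value = 0.f;          // leaf value
};

// Walk one row down one tree and return its leaf value.
float PredictRow(const std::vector<Node>& tree,
                 const std::vector<float>& row) {
  int nid = 0;
  while (tree[nid].left != -1) {
    nid = row[tree[nid].feature] < tree[nid].threshold ? tree[nid].left
                                                       : tree[nid].right;
  }
  return tree[nid].value;
}

// Rows used in training already had this tree's leaf values added to
// the cache, so only the unused rows need an actual traversal.
void UpdateCacheForUnusedRows(const std::vector<Node>& tree,
                              const std::vector<std::vector<float>>& data,
                              const std::vector<std::size_t>& unused_rows,
                              std::vector<float>* out_preds) {
  for (std::size_t ridx : unused_rows) {
    (*out_preds)[ridx] += PredictRow(tree, data[ridx]);
  }
}
```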

We noticed an almost 2x acceleration for the PredictRaw section (on Santander, where subsample == 0.5).
We also encountered a slowdown in the InitData section, where we now do more computation (this will only be observed with subsample < 1). This could be improved in future updates.

UPD:
Here are some measurements:

| Higgs, 1m, subsample == 0.9 | PredictRaw | InitData | Overall time |
| --- | --- | --- | --- |
| current master branch, s | 2.57 | 7.12 | 24.70 |
| #6683, s | 1.51 | 8.51 | 24.86 |
| speedup, ratio | 1.7x | 0.84x | 0.994x |

| Airline, 1m, subsample == 0.9 | PredictRaw | InitData | Overall time |
| --- | --- | --- | --- |
| current master branch, s | 37.48 | 7.46 | 81.47 |
| #6683, s | 8.17 | 8.67 | 51.71 |
| speedup, ratio | 4.59x | 0.86x | 1.58x |

| Mortgage, 9m, subsample == 0.9 | PredictRaw | InitData | Overall time |
| --- | --- | --- | --- |
| current master branch, s | 5.19 | 6.4 | 25.93 |
| #6683, s | 1.73 | 7.6 | 23.48 |
| speedup, ratio | 3x | 0.84x | 1.1x |

| Santander, subsample == 0.5 | PredictRaw | InitData | Overall time |
| --- | --- | --- | --- |
| master just before #6696, s | 79.11 | 14.08 | 163.41 |
| current master branch, s | 56.81 | 15.78 | 145.52 |
| #6683, s | 37.77 | 21.40 | 129.41 |
| total speedup, ratio | 2.1x | 0.66x | 1.26x |

@trivialfis self-requested a review on February 5, 2021, 13:08
@trivialfis (Member) left a comment


Can we reconsider this? This PR seems to be an optimization for the case where subsample is used with hist, which is a limited use case, and it also complicates and duplicates the existing code.

@RukhovichIV (Contributor, author)

@trivialfis, we tried to simplify this code as much as possible; it now seems to me that this is one of the shortest ways to do this optimization. But we still need to partially duplicate the code of PredictBatchByBlockOfRowsKernel(). That function was created to make predictions with all new trees at once (in the current iteration), because each row of the training sample must be used in each tree, and that is the fastest way to do it. But now (when subsample < 1) we have a separate rowset to predict on for each tree, which is why we have to process each tree separately. We tried to unify these branches as much as possible, but some code is still duplicated.
As for optimizing only hist: yes, at the moment the acceleration is obtained only there. Of course, we can do the same optimization for other methods in the future, but PredictRaw does not seem to be the major bottleneck in the other methods. Let's first try to do this only for hist.
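For illustration only, a sketch of the two prediction paths being contrasted, with hypothetical names rather than the PR's code:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for a single-tree traversal that adds tree `tid`'s leaf
// value for row `ridx` into the prediction cache.
using AccumulateFn = std::function<void(std::size_t tid, std::size_t ridx)>;

// subsample == 1: every row passes through every new tree, so one pass
// over the rows can apply all trees at once; this is the cache-friendly
// layout that PredictBatchByBlockOfRowsKernel() exploits.
void PredictAllTreesAtOnce(std::size_t n_rows, std::size_t n_trees,
                           const AccumulateFn& accumulate) {
  for (std::size_t ridx = 0; ridx < n_rows; ++ridx) {
    for (std::size_t tid = 0; tid < n_trees; ++tid) {
      accumulate(tid, ridx);
    }
  }
}

// subsample < 1: each tree has its own set of rows still missing from
// the cache, so trees must be processed one by one over different row
// sets; this is the separate branch the PR adds.
void PredictPerTreeRowSets(const std::vector<std::vector<std::size_t>>& unused,
                           const AccumulateFn& accumulate) {
  for (std::size_t tid = 0; tid < unused.size(); ++tid) {
    for (std::size_t ridx : unused[tid]) {
      accumulate(tid, ridx);
    }
  }
}
```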

@RukhovichIV marked this pull request as ready for review on February 17, 2021, 05:20
@RukhovichIV changed the title from "WIP: Prediction by indices (subsample < 1)" to "Prediction by indices (subsample < 1)" on Feb 17, 2021
@trivialfis (Member)

Sorry for the late reply. I think this PR is a workaround for hist not having prediction caching when subsample is enabled. Feel free to correct me. But I think it's possible to have a prediction cache even when subsample is used; so far, that's what the GPU hist does.

@RukhovichIV (Contributor, author)

I'm not really aware of what's going on in GPU hist, but here's what we're doing on the CPU:
https://github.com/dmlc/xgboost/blob/master/src/tree/updater_quantile_hist.cc#L115
https://github.com/dmlc/xgboost/blob/master/src/tree/updater_quantile_hist.cc#L131
We simply skip the accumulated prediction cache when subsample < 1, despite the fact that we could partially use it. This PR enables its use for hist.
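For context, a paraphrase of that guard; this is not the verbatim source, and the struct and function names here are illustrative:

```cpp
#include <vector>

struct TrainParam {
  float subsample = 1.0f;
};

// Paraphrased shape of the old guard in UpdatePredictionCache: with
// subsampling enabled, the cache was declared unusable outright, even
// though the sampled rows already had known leaf assignments.
bool UpdatePredictionCacheOld(const TrainParam& param,
                              std::vector<float>* out_preds) {
  if (param.subsample < 1.0f) {
    return false;  // caller falls back to a full prediction pass
  }
  // ... accumulate the new tree's leaf values into *out_preds using
  // the row partition built during training ...
  (void)out_preds;
  return true;
}
```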

We added unit tests for the new CPU prediction branch (as part of the UpdatePredictionCache test) and extended TestInitDataSampling for the new behaviour.

It looks like there's been some strange error in Travis (https://travis-ci.org/github/dmlc/xgboost/jobs/760854002#L18269). Should I re-push this to restart the check?

@RAMitchell (Member) left a comment


Could you just ensure leaf_value_cache_ is up-to-date at the end of training (maybe here).

We want to avoid communicating internals of separate parts of the program as much as possible. If you do it this way, none of the interfaces change.

@RukhovichIV (Contributor, author) commented Mar 3, 2021

Could you please explain in more detail what you want us to check? I checked the code again: it looks like leaf_value_cache_ is not used at all. It can easily be removed and nothing will change; I think I can even remove it in this PR if there are no objections from your side.
As for checking for cache availability, we do such a check here: https://github.com/RukhovichIV/xgboost/blob/prediction_by_indices/src/predictor/cpu_predictor.cc#L260
If UpdatePredictionCache returns false for some reason, then tree_begin will be less than tree_end and we will fall into the default prediction branch.
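A sketch of that control flow with hypothetical names (the real check is at the link above):

```cpp
#include <cstdint>

// If the updater refreshed the cache, every tree is already covered and
// the loop body never runs; if UpdatePredictionCache returned false,
// tree_begin stays 0 and the default prediction branch handles all trees.
void PredictWithCacheFallback(bool cache_updated, std::uint32_t n_trees) {
  std::uint32_t tree_begin = cache_updated ? n_trees : 0;
  std::uint32_t tree_end = n_trees;
  for (std::uint32_t tid = tree_begin; tid < tree_end; ++tid) {
    (void)tid;  // default branch: run a full prediction with tree `tid`
  }
}
```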

> We want to avoid communicating internals of separate parts of the program as much as possible

We've been thinking about the best way to make this optimization and arrived at this design. The point is that we obtain the indices of the rows on which we need to make a prediction in the InitData part of the Updater, while it is logical to make the prediction in the part designated for it, PredictRaw. So we must somehow transfer this array between these parts of the program. If you think there's a better solution for this, let's discuss it.

@RAMitchell (Member)

Ah, I didn't notice that leaf_value_cache_ is no longer used. In that case, can you ensure the row set collection is complete at the end of Update, so that it contains all required rows?

The current design is problematic. It makes life harder in other parts of xgboost to serve a very specific optimisation case.

@RAMitchell (Member)

To clarify, I don't mean just adding a check, I mean ensuring row_set_collection_ actually contains all rows at the end of training and removing unused_indices.

Sorry for the ambiguity.

@RukhovichIV (Contributor, author)

> ensuring row_set_collection_ actually contains all rows at the end of training

But how can we verify this? If subsample >= 1, this is automatically true, as it always was, so why would we need to check it? And if subsample < 1, the collection will only contain about nrows * subsample rows randomly drawn from the full set. What can we check here? If you mean adding unit tests for it, they have been added.

> and removing unused_indices

Do you object to this field? Yes, we could use row_set_collection_ to derive this array, but we think it would be much slower than creating such an array directly in InitSampling(). And it still wouldn't save us from communication between Updater and Predictor, because this field is also contained in the Updater.
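For illustration, a sketch (hypothetical signature, not the PR's code) of why collecting the complement during sampling is essentially free: Bernoulli-style sampling already visits every row once, so the unused rows can be gathered in the same pass instead of being reconstructed from row_set_collection_ afterwards.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Sample rows with probability `subsample`; the complement (unused
// rows) is collected in the same single pass over the data.
void InitSampling(float subsample, std::size_t n_rows, std::mt19937* rng,
                  std::vector<std::size_t>* used_rows,
                  std::vector<std::size_t>* unused_rows) {
  std::bernoulli_distribution coin(subsample);
  used_rows->clear();
  unused_rows->clear();
  for (std::size_t ridx = 0; ridx < n_rows; ++ridx) {
    (coin(*rng) ? used_rows : unused_rows)->push_back(ridx);
  }
}
```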

@RukhovichIV (Contributor, author)

It seems to me that the only way to avoid any communication between Updater and Predictor is to make predictions on these unused_indices right in the Updater (for example, at the end of UpdatePredictionCache()), but I think that this is not what we expect from the Updater class, and it will also lead to copying some parts of the existing code from Predictor.

@RAMitchell (Member)

Yes, that is what I want you to do. Prediction on a single tree is not complicated; give it a try and see what it looks like.

@RAMitchell (Member) left a comment


Looks good in general, just a few comments.

src/tree/updater_quantile_hist.cc (review thread on an outdated diff; resolved)

```cpp
@@ -680,30 +683,63 @@ bool QuantileHistMaker::Builder<GradientSumT>::UpdatePredictionCache(
    }
  });

  if (param_.subsample < 1.0f) {
```
Member:

Is it potentially easier to complete this step at the end of the update instead of in UpdatePredictionCache? That way we don't need p_last_fmat_mutable_, as we are guaranteed to have a valid DMatrix.

Contributor (author):

I don't think it will be easier. We already have other things in UpdatePredictionCache() that are needed for predicting, such as out_preds, the number of trees, and the index of the current tree. Moving them to Update() would also cost us a few extra fields in the Updater. But we can probably remove const from the existing p_last_fmat_ field.

Member:

OK, that seems fair. I don't like this p_last_fmat_ business much; it's a raw pointer that we are making assumptions about. For example, if someone changed other parts of the program, it would be easy to invalidate its use in the updater in unpredictable situations. If you can think of a better way to handle this in general, that could be a nice change for the future.

Contributor (author):

Thank you for sharing your vision. We are going to do some refactoring for hist in the near future. At first glance, we can replace the raw pointers with smart ones. Next, perhaps, we will think about some hashing for DMatrix.

Member:

You should definitely chat with @trivialfis about hist refactoring and work on an RFC.

```cpp
// tree rows that were not used for current training
std::vector<size_t> unused_rows_;
// feature vectors for subsampled prediction
std::vector<RegTree::FVec> feat_vecs_;
```
Member:
Do you get any performance benefit from keeping this as a member variable instead of creating it locally? Avoid extra member variables if possible.

Contributor (author):

Yes, this field lets us avoid allocating nthread * nfeatures units of memory on each UpdatePredictionCache() call. Here are the results for one of our datasets:

| Santander, subsample == 0.5 | UpdatePredictionCache, s | Overall time, s |
| --- | --- | --- |
| with feat_vecs_ as a member variable | 42.7465 | 142.7305 |
| with local allocations | 49.3696 | 147.4053 |
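A toy illustration of the trade-off, with hypothetical classes; `FVec` here stands in for RegTree::FVec:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for RegTree::FVec: a dense feature buffer used during
// tree traversal.
struct FVec {
  std::vector<float> data;
};

// Keeping the per-thread buffers as a member means the
// nthread * nfeatures allocation happens once per Builder, not once
// per UpdatePredictionCache() call (i.e. once per boosting iteration).
class Builder {
 public:
  Builder(std::size_t n_threads, std::size_t n_features)
      : feat_vecs_(n_threads) {
    for (FVec& fv : feat_vecs_) {
      fv.data.resize(n_features);
    }
  }

  void UpdatePredictionCache() {
    // reuse feat_vecs_[thread_id] inside the parallel prediction loop;
    // a local std::vector<FVec> here would reallocate on every call
  }

 private:
  std::vector<FVec> feat_vecs_;  // one reusable buffer per thread
};
```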

@RAMitchell merged commit 19a2c54 into dmlc:master on Mar 16, 2021