[RFC] Unifying prediction API. #6632

Open

trivialfis opened this issue Jan 22, 2021 · 10 comments

Comments

@trivialfis
Member

trivialfis commented Jan 22, 2021

Background

XGBoost exposes a number of prediction functions on the C API and the various language bindings, including prediction on DMatrix and inplace prediction. Within these prediction functions we also have a number of prediction types, including value, margin, leaf, contribs and interaction. Their outputs have different meanings and shapes. Right now the language bindings are responsible for figuring that out, which has become a burden since we introduced the dask interface on top of Python (#6614). Also, the output shape is quite complicated; I had a difficult time figuring out how to slice up the output array from pred_leaf. Aside from these, there are also different prediction parameters, including ntree_limit, n_layers, is_training, and a never used parameter tree_begin. Lastly, if the prediction is carried out inplace, additional information like missing and base_margin needs to be passed into the implementation.

Requirements

We want to unify the prediction functions of the C API in a consistent manner. The new prediction functions should be able to figure out the output shape for the language bindings, and should be extensible to future feature additions. At the same time, we need to look into which parameters we want to drop, like ntree_limit. Since this is a design at the C API level, we should try to comply with common C practices on API design.

At the same time, we are not near the next major release (2.0), so the old API should be kept for compatibility.

Proposal

  1. Define a public prediction parameter struct:
// Should not use an enum in a public C header; shown here just for demo.
typedef enum {
  kValue,
  kMargin,
  kContribution,
  kInteraction,
  kLeaf
} PredictType;

typedef struct {
  bool is_training;         // requires <stdbool.h>
  int32_t begin_iteration;  // iteration range used for prediction
  int32_t end_iteration;
  PredictType type;

  // Unused if input is DMatrix.
  void* base_margin;
  int base_margin_shape;
  float missing;
} PredictParam;
  2. Define a set of public functions with consistent semantics:
int XGBoosterPredictFromDMatrix(BoosterHandle booster, DMatrixHandle dmatrix, PredictParam param, bst_ulong **out_shape, bst_float **out_result);

int XGBoosterPredictFromDense(BoosterHandle booster, void* data, DataType type, int* shape, PredictParam param, bst_ulong **out_shape, bst_float **out_result);

...

The functions should output the correct shape through the out_shape parameter, and PredictParam will be responsible for future extensibility. Additionally, we can incorporate more information into the input and output, like the device ordinal, data slicing, etc. This RFC is for deciding whether we should carry out this refactor.
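
For illustration, here is a rough sketch of how a language binding might call the proposed XGBoosterPredictFromDMatrix. The error handling, the meaning of end_iteration = 0 ("use all trees"), the two-dimensional shape for value prediction, and library ownership of the output buffers are assumptions made for the sake of the example, not decisions of this RFC; PredictParam, kValue and the function itself are the proposed (not yet existing) declarations above.

#include <stdbool.h>
#include <stddef.h>
#include <xgboost/c_api.h>  /* BoosterHandle, DMatrixHandle, bst_ulong */

/* Hypothetical caller; booster and dmatrix are assumed to be created and
 * loaded elsewhere. bst_float is assumed to alias float. */
int predict_values(BoosterHandle booster, DMatrixHandle dmatrix) {
  PredictParam param = {0};
  param.is_training = false;
  param.begin_iteration = 0;
  param.end_iteration = 0;  /* assume 0 means "use all trees" */
  param.type = kValue;

  bst_ulong *out_shape = NULL;  /* assume the library owns both buffers */
  float *out_result = NULL;
  if (XGBoosterPredictFromDMatrix(booster, dmatrix, param, &out_shape,
                                  &out_result) != 0) {
    return -1;  /* XGBGetLastError() carries the error message */
  }
  /* Assuming value prediction reports a 2-dim shape [n_samples, n_groups],
   * the score for sample i, group j is out_result[i * out_shape[1] + j]. */
  return 0;
}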

Brief notes

Some more notes on the prediction function:

  • The output shape for contribution and interaction is not unified; it depends on whether multi-class is used.
  • Predict leaf always returns a 2-dim array; the number of classes and the forest size are not considered.
  • When output_margin is set, the output array is still a 1-dim vector, but it should be a 2-dim matrix for multi-class.
  • The output shapes for softmax and softprob are both 1-dim vectors; should we make softprob output a 2-dim matrix? (See the sketch after this list.)
  • Maybe we should add a parameter to let the user choose a stricter output shape?
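
To make the softprob point concrete, here is a minimal standalone sketch of how the same flat buffer would be read once a strict 2-dim shape [n_samples, n_classes] is reported, assuming the row-major layout the Python binding uses today when reshaping softprob output; the helper name and the toy numbers are made up for illustration.

#include <stdio.h>

/* Flat softprob output holds n_samples * n_classes probabilities in
 * row-major order; with a strict reported shape a binding no longer
 * has to guess how to slice it. */
static float class_probability(const float *out_result, unsigned long n_classes,
                               unsigned long sample, unsigned long klass) {
  return out_result[sample * n_classes + klass];
}

int main(void) {
  const float out_result[] = {0.1f, 0.7f, 0.2f,   /* sample 0 */
                              0.5f, 0.3f, 0.2f};  /* sample 1 */
  const unsigned long out_shape[] = {2, 3};       /* [n_samples, n_classes] */
  printf("P(class 1 | sample 0) = %f\n",
         class_probability(out_result, out_shape[1], 0, 1));
  return 0;
}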

@hcho3 @RAMitchell

@hcho3
Collaborator

hcho3 commented Jan 23, 2021

@trivialfis

I had a difficult time figuring out how to slice up the output array from pred_leaf.

Have you managed to figure this out?

@trivialfis
Member Author

Yup, I wrote a test for predict leaf

@trivialfis
Member Author

So what do you think of the RFC?

@hcho3
Collaborator

hcho3 commented Jan 23, 2021

@trivialfis Is this RFC connected to #6091?

@trivialfis
Member Author

No, this is just for the predict function; it does not affect feature importance.

@trivialfis
Member Author

trivialfis commented Jan 25, 2021

Some more notes on the prediction function:

  • The output shape for contribution and interaction is not unified; it depends on whether multi-class is used.
  • Predict leaf always returns a 2-dim array; the number of classes and the forest size are not considered.
  • When output_margin is set, the output array is still a 1-dim vector, but it should be a 2-dim matrix for multi-class.
  • The output shapes for softmax and softprob are both 1-dim vectors; should we make softprob output a 2-dim matrix?
  • Maybe we should add a parameter to let the user choose a stricter output shape?

@Craigacp
Contributor

Would these proposed functions be thread safe, or is that argument dependent?

@trivialfis
Member Author

The inplace prediction function is thread safe and lock free. Normal prediction is currently argument dependent, but it should not be too difficult to make it fully thread safe.
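
Purely as an illustration of what fully thread-safe normal prediction would allow a binding to do, here is a pthread sketch in which two threads call the proposed XGBoosterPredictFromDMatrix on the same BoosterHandle with no external locking. The threading scaffolding, the task struct, and the use of the proposed (not yet existing) PredictParam/XGBoosterPredictFromDMatrix declarations are all illustrative assumptions; bst_float is assumed to alias float.

#include <pthread.h>
#include <stddef.h>
#include <xgboost/c_api.h>  /* BoosterHandle, DMatrixHandle, bst_ulong */

/* Illustrative: both threads share one BoosterHandle but own their DMatrix. */
struct PredictTask {
  BoosterHandle booster;
  DMatrixHandle dmatrix;
};

static void *run_predict(void *arg) {
  struct PredictTask *task = (struct PredictTask *)arg;
  PredictParam param = {0};  /* struct from the proposal above */
  param.type = kValue;
  bst_ulong *out_shape = NULL;
  float *out_result = NULL;
  /* With a fully thread-safe booster, no mutex / synchronized block is
   * needed around this call. */
  XGBoosterPredictFromDMatrix(task->booster, task->dmatrix, param,
                              &out_shape, &out_result);
  return NULL;
}

/* Usage (handles assumed initialised elsewhere): two concurrent
 * predictions against the same booster. */
void predict_concurrently(struct PredictTask *a, struct PredictTask *b) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, run_predict, a);
  pthread_create(&t2, NULL, run_predict, b);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
}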

@Craigacp
Contributor

Craigacp commented Jan 25, 2021

Ok. I'm in favour of making it completely thread safe, as that's one of the things that's frustrating about the current XGBoost4J API. It would be nice to get rid of synchronized on the predict methods.

If it's possible to help with that without having to understand all the internals of the library, then I'm happy to help; the last time I looked, figuring out the thread safety of a particular bit of code required understanding a lot of the internals.

@trivialfis
Member Author

@Craigacp This PR should help gbtree: #6648
