Model builders API update #1320

razdoburdin · 2023-06-06T12:59:23Z

I want to start a discussion about changing the Model builders API.
The main challenge is supporting both scikit-learn style estimators and non-scikit style booster objects.
In this PR I added a new class, GBTDAALModel, for the Python interface in non-scikit style. It allows hiding all the wordy syntax of the old API.

With XGBoost, user-side code for regression tasks changes from:
d4p_model = daal4py.get_gbt_model_from_xgboost(booster)
d4p_prediction = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction
to
d4p_model = daal4py.mb.convert_model(booster)
d4p_prediction = d4p_model.predict(X_test)

For classification tasks, user-side code changes from:
d4p_model = daal4py.get_gbt_model_from_xgboost(booster)
d4p_prediction = daal4py.gbt_classification_prediction(nClasses=n_classes).compute(X_test, d4p_model).prediction
to
d4p_model = daal4py.mb.convert_model(booster)
d4p_prediction = d4p_model.predict(X_test)
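
As an illustration of the single entry point described above, here is a minimal sketch of how one function could choose a converter from the package a trained model comes from. It is not the daal4py implementation: `convert_model_sketch`, the `converters` table, and the stand-in converter in the usage note are hypothetical names introduced only for this example.

```python
def convert_model_sketch(model, converters):
    """Pick a converter based on the package that defines the model's class.

    `converters` maps a top-level package name (for example "xgboost")
    to a callable that takes the trained model and returns the converted one.
    Both the table and this function are hypothetical.
    """
    # type(model).__module__ is e.g. "xgboost.core"; keep the top-level package.
    package = type(model).__module__.split(".")[0]
    try:
        convert = converters[package]
    except KeyError:
        raise TypeError(
            f"Unsupported model type: {type(model).__name__}"
        ) from None
    return convert(model)
```

A caller would pass something like `{"xgboost": my_xgboost_converter}` (a hypothetical callable) and get back whatever that converter returns, so user code no longer needs to name a framework-specific function.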

To support scikit-style estimators, I updated the GBTDAALClassifier and GBTDAALRegressor classes.
One can use them like this (example for XGBoost, regression task):
import xgboost as xgb
from daal4py.sklearn.ensemble import GBTDAALRegressor
reg = xgb.XGBRegressor()
reg.fit(X, y)
d4p_predt = GBTDAALRegressor.convert_model(reg).predict(X)

napetrov · 2023-06-06T13:29:41Z

The only thing that I don't fully like is the CamelCase in .GbtModel. Maybe GBTModel would be better?

razdoburdin · 2023-06-06T14:01:01Z

> The only thing that I don't fully like is the CamelCase in .GbtModel. Maybe GBTModel would be better?

fixed

src/gbt_model_builder.pyx

inteldimitrius · 2023-06-07T17:34:21Z

The rest looks good to me!

src/gbt_model_builder.pyx

razdoburdin · 2023-06-12T12:41:00Z

Dear all,
I have updated the PR. Currently, the scikit-style estimators are supported. Please see the updated description for details.

inteldimitrius · 2023-06-12T14:52:29Z

MacOS and Linux CI checks are failing. Is it okay?

razdoburdin · 2023-06-13T08:07:56Z

Thanks for noting this!
I have reproduced the test failure locally. The problem is the NaN check in the estimators' predict() methods. Since NaN values are now supported, I removed the NaN check, but scikit-learn's check_estimator requires it. On the other hand, XGBRegressor also supports NaN and still passes this check.

@Alexsandruss, do you have any ideas on how to deal with it?

Alexsandruss · 2023-06-13T14:45:08Z

NaN errors are not the only ones:

======================================================================
ERROR: test_gbt_cls_model_create_from_catboost_batch (test_examples.TestExNpyArray)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vsts/work/1/s/tests/test_examples.py", line 96, in testit
    result = self.call(ex)
  File "/home/vsts/work/1/s/tests/test_examples.py", line 269, in call
    return ex.main(readcsv=np_read_csv)
  File "/home/vsts/work/1/s/examples/daal4py/gbt_cls_model_create_from_catboost_batch.py", line 73, in main
    daal_prediction = daal_predict_algo.compute(X_test, daal_model)
TypeError: Argument 'model' has incorrect type (expected daal4py._daal4py.gbt_classification_model, got tuple)

Alexsandruss · 2023-07-10T16:38:37Z

daal4py/mb/ModelBuilders.py

Snake case is used for file names in Python.

Alexsandruss · 2023-07-10T16:44:26Z

daal4py/sklearn/ensemble/GBTDAAL.py

@@ -27,7 +27,7 @@
 from .._utils import getFPType


-class GBTDAALBase(BaseEstimator):


Implement nan tag dispatching for base class:

def _more_tags(self): return {"allow_nan": self.allow_nan_}

done
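
The `_more_tags` suggestion in this thread can be sketched as follows. This is a pure-Python stand-in, not the real GBTDAALBase: the class name `GBTBaseSketch` and the `_validate` helper are hypothetical, and only the `_more_tags` method and the `allow_nan_` attribute come from the review comment. The idea, as far as I understand scikit-learn's estimator checks at the time of this PR, is that the NaN check is applied only to estimators whose tags do not advertise NaN support.

```python
import math


class GBTBaseSketch:
    """Stand-in for a base estimator; not the real GBTDAALBase."""

    def __init__(self, allow_nan=False):
        # Attribute name mirrors the one used in the review comment.
        self.allow_nan_ = allow_nan

    def _more_tags(self):
        # Report NaN support per instance instead of hard-coding it.
        return {"allow_nan": self.allow_nan_}

    def _validate(self, X):
        # Hypothetical helper: reject NaN only when the estimator
        # does not advertise NaN support through its tags.
        if not self._more_tags()["allow_nan"]:
            for row in X:
                if any(math.isnan(value) for value in row):
                    raise ValueError("Input contains NaN.")
        return X
```

With this layout a converted model that handles missing values can set `allow_nan_` to True, while other instances keep the strict check.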

Alexsandruss · 2023-07-10T16:49:25Z

examples/sklearnex/model_builders_xgboost.py

+    return pd.read_csv(f, usecols=c, delimiter=',', header=None, dtype=t)
+
+
+def main(readcsv=pd_read_csv, method='defaultDense'):


The method parameter is not used. Same for the second example.

Suggested change

-def main(readcsv=pd_read_csv, method='defaultDense'):
+def main(readcsv=pd_read_csv):

done

razdoburdin · 2023-07-11T10:27:53Z

/intelci: run

napetrov · 2023-07-11T15:26:50Z

@razdoburdin test_model_builders_xgboost and test_model_builders_lightgbm are failing internally

napetrov · 2023-07-12T10:33:54Z

/intelci: run

src/gbt_convertors.pyx

Alexsandruss · 2023-07-12T11:09:24Z

daal4py/mb/model_builders.py

+    def __init__(self):
+        pass
+
+    def predict(self, X, fptype="float"):


fptype should be deduced from input data.
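
A minimal sketch of what deducing fptype from the input could look like. This is an assumption-laden illustration, not the code that was merged: `deduce_fptype` is a hypothetical helper, and "double" is assumed to be the counterpart of the "float" default shown in the signature above. It relies only on the input exposing a `dtype` attribute whose string form is "float32" or "float64", which is how NumPy arrays behave.

```python
def deduce_fptype(X):
    """Map the dtype of X to an fptype string (hypothetical helper)."""
    # str(numpy_array.dtype) yields names such as "float32" or "float64";
    # objects without a dtype attribute fall through to the error below.
    dtype_name = str(getattr(X, "dtype", None))
    if dtype_name == "float32":
        return "float"
    if dtype_name == "float64":
        # "double" is an assumed name for the 64-bit case.
        return "double"
    raise ValueError(f"Unexpected dtype: {dtype_name}")
```

A predict method could then call `deduce_fptype(X)` internally instead of asking the caller for an fptype argument.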

Co-authored-by: Alexander Andreev <alexander.andreev@intel.com>

napetrov · 2023-07-12T13:54:01Z

/intelci: run

samir-nasibli · 2023-07-12T19:42:52Z

@razdoburdin you have changed the example names by removing the _batch suffix. The examples runner will ignore such file names: https://github.com/intel/scikit-learn-intelex/blob/master/tests/run_examples.py#L219:L222 .

samir-nasibli · 2023-07-12T19:59:08Z

tests/run_examples.py

+req_library['model_builders_lightgbm.py'] = ['lightgbm']
+req_library['model_builders_xgboost.py'] = ['xgboost']
+req_library['model_builders_catboost.py'] = ['catboost']


Please rename your example files and update the runner script.

Suggested change

-req_library['model_builders_lightgbm.py'] = ['lightgbm']
-req_library['model_builders_xgboost.py'] = ['xgboost']
-req_library['model_builders_catboost.py'] = ['catboost']
+req_library['model_builders_lightgbm_batch.py'] = ['lightgbm']
+req_library['model_builders_xgboost_batch.py'] = ['xgboost']
+req_library['model_builders_catboost_batch.py'] = ['catboost']

No. Batch is a strange and meaningless suffix. If the runner will not pick them up, then the launcher should be fixed.

Let's decide how we can group this kind of example then.
All examples in scikit-learn-intelex/examples/ are grouped logically into 4 groups by suffix: spmd.py, streaming.py, stream.py, batch.py.
If none of them is suitable, then we either:

1. come up with a new suffix for such examples, or
2. update the whole examples runner logic.

I have just had a brief look at run_examples.py. Could we just add some default behavior for filenames that don't end in *spmd.py, *streaming.py, *stream.py or *batch.py?
As far as I can see, we could switch to this default behavior for all the *_batch.py and *_stream.py files. In my opinion, it would make the naming of examples clearer.

@razdoburdin I like that idea more than using suffixes.

@samir-nasibli - those are logical only if you look at this from a testing perspective. From an end-user perspective this looks strange.

I would go with base examples without batch because it makes no sense to specify this. As for stream/streaming, it looks like this should have been a single group. And for spmd, let's leave it as is.

@Alexsandruss, @KulikovNikita - your thoughts?

> I have just had a brief look at run_examples.py. Could we just add some default behavior for filenames that don't end in *spmd.py, *streaming.py, *stream.py or *batch.py?

Sure we can. That means we need to update the examples runner logic a little bit (second option).
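
The default-behavior idea discussed in this thread could look roughly like the sketch below. It is not taken from run_examples.py: `example_group` and the group names it returns are hypothetical. It folds stream and streaming into one group, as suggested above, and treats both *_batch.py files and files without a recognized suffix as the default case.

```python
def example_group(filename):
    """Classify an example file by its suffix, with a default group."""
    # Drop the ".py" extension before looking at the suffix.
    stem = filename[:-3] if filename.endswith(".py") else filename
    if stem.endswith("spmd"):
        return "spmd"
    if stem.endswith("streaming") or stem.endswith("stream"):
        return "streaming"
    # Files ending in "batch" and files with no recognized suffix
    # are handled the same way.
    return "default"
```

With such a fallback, a file named model_builders_xgboost.py would be picked up without needing a _batch suffix.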

initial

b5ad70e

razdoburdin added the model builders label Jun 6, 2023

razdoburdin requested a review from Alexsandruss as a code owner June 6, 2023 12:59

razdoburdin changed the title ~~initial~~ Model builders API update Jun 6, 2023

napetrov requested review from ahuber21 and KulikovNikita June 6, 2023 13:42

linting

a2249dc

napetrov requested review from icfaust, inteldimitrius and Vika-F June 6, 2023 14:13

Alexsandruss requested changes Jun 6, 2023

razdoburdin marked this pull request as draft June 9, 2023 13:58

make API sklearn-like

db86ffc

napetrov reviewed Jun 9, 2023

napetrov requested a review from Alexsandruss June 9, 2023 14:26

Dmitry Razdoburdin added 3 commits June 12, 2023 01:56

fixes for classification cases

837f406

bug fixes

721bbc7

pep8

f448124

razdoburdin marked this pull request as ready for review June 12, 2023 12:39

razdoburdin requested a review from samir-nasibli as a code owner June 12, 2023 12:39

Merge branch 'intel:master' into model_builder_api_proposal

d70f9ec

razdoburdin mentioned this pull request Jun 13, 2023

Remove check for NAN for GBT Estimators #1330

Closed

razdoburdin marked this pull request as draft June 15, 2023 08:57

Dmitry Razdoburdin added 3 commits July 10, 2023 07:20

replace imports

da20c88

again

04055b7

next try

ba8226a

Alexsandruss requested changes Jul 10, 2023

Dmitry Razdoburdin added 5 commits July 10, 2023 23:47

some minor changes

258093b

pep8

b60ec5b

change

6f8f1f3

fix installation

ded7062

move GBTDAALBaseModel to mb namespace

02cd420

napetrov requested review from Alexsandruss and napetrov July 11, 2023 15:22

remove model builder examples for sklearn

ba62d0c

Alexsandruss reviewed Jul 12, 2023

razdoburdin and others added 3 commits July 12, 2023 13:49

Update src/gbt_convertors.pyx

b883491

Co-authored-by: Alexander Andreev <alexander.andreev@intel.com>

determine fptype from data type in GBTDAALModel.predict()

20f439a

pep8

caec612

napetrov requested a review from Alexsandruss July 12, 2023 14:11

Alexsandruss approved these changes Jul 12, 2023

napetrov merged commit 12b963a into uxlfoundation:master Jul 12, 2023

samir-nasibli reviewed Jul 12, 2023

samir-nasibli mentioned this pull request Jul 12, 2023

MAINT: fixing mb examples names #1360

Closed

razdoburdin deleted the model_builder_api_proposal branch July 13, 2023 08:13

razdoburdin mentioned this pull request Jul 14, 2023

Remove the batch suffix from examples #1364

Merged

This was referenced Aug 18, 2023

[doc] Update daal4py #1407

Merged

Update daal4py IntelPython/intelpython.github.io#5

Merged
