Model builders API update #1320
The only thing I don't fully like is the CamelCase in .GbtModel. Maybe GBTModel would be better?
fixed
The rest looks good to me!
Dear all,
The macOS and Linux CI checks are failing. Is that okay?
Thanks for noting this! @Alexsandruss, do you have any ideas on how to deal with it?
NaN errors are not the only ones:
daal4py/mb/ModelBuilders.py
Outdated
Snake case is used for file names in Python.
done
@@ -27,7 +27,7 @@
from .._utils import getFPType


class GBTDAALBase(BaseEstimator):
Implement NaN tag dispatching for the base class:

    def _more_tags(self):
        return {"allow_nan": self.allow_nan_}

done
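A minimal sketch of the suggestion above, written without scikit-learn so that it runs standalone. `SketchGBTBase` and `SketchRegressor` are hypothetical stand-ins, not the classes from this PR; only the `_more_tags` method and the `allow_nan_` attribute come from the review comment. The assumption is that each model records whether it accepts NaN and the base class reports that through the tag.

```python
class SketchGBTBase:
    """Hypothetical stand-in for the base class discussed above."""

    def __init__(self, allow_nan=False):
        # Assumption: the flag is set once, at construction or conversion time.
        self.allow_nan_ = allow_nan

    def _more_tags(self):
        # Report NaN support per instance instead of hard-coding it per class.
        return {"allow_nan": self.allow_nan_}


class SketchRegressor(SketchGBTBase):
    """Subclasses inherit the tag logic and need no override."""


strict = SketchRegressor()
tolerant = SketchRegressor(allow_nan=True)
```

Because the tag is computed from instance state, subclasses do not need their own `_more_tags` override.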
    return pd.read_csv(f, usecols=c, delimiter=',', header=None, dtype=t)


def main(readcsv=pd_read_csv, method='defaultDense'):
The method argument is not used. Same for the second example.
- def main(readcsv=pd_read_csv, method='defaultDense'):
+ def main(readcsv=pd_read_csv):
done
/intelci: run
@razdoburdin test_model_builders_xgboost and test_model_builders_lightgbm are failing internally.
/intelci: run
daal4py/mb/model_builders.py
Outdated
    def __init__(self):
        pass

    def predict(self, X, fptype="float"):
fptype should be deduced from input data.
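One way to read this request, as a rough sketch only: drop the `fptype` argument and derive it from the input's `dtype`. `deduce_fptype` and `SketchModel` are hypothetical names, and falling back to double precision for anything that is not `float32` is an assumption. The `getFPType` import in the diff above suggests a helper for this already exists in the package; its exact behavior is not shown in this thread.

```python
from types import SimpleNamespace


def deduce_fptype(X):
    """Hypothetical helper: map the input's dtype to an fptype string."""
    dtype_name = str(getattr(X, "dtype", ""))
    if dtype_name == "float32":
        return "float"
    # Assumption: every other input is computed in double precision.
    return "double"


class SketchModel:
    """Hypothetical model whose predict() takes no fptype argument."""

    def predict(self, X):
        fptype = deduce_fptype(X)
        # A real model would pass fptype on to the prediction algorithm;
        # the sketch returns it so the deduction is visible.
        return fptype


# Stand-ins for arrays; a NumPy array exposes the same `dtype` attribute.
x32 = SimpleNamespace(dtype="float32")
x64 = SimpleNamespace(dtype="float64")
```

With this shape, the caller cannot pass an `fptype` that disagrees with the data.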
Co-authored-by: Alexander Andreev <alexander.andreev@intel.com>
/intelci: run
@razdoburdin you have changed the example names and removed the suffix.
req_library['model_builders_lightgbm.py'] = ['lightgbm']
req_library['model_builders_xgboost.py'] = ['xgboost']
req_library['model_builders_catboost.py'] = ['catboost']
Please rename your example files and update the runner script:
- req_library['model_builders_lightgbm.py'] = ['lightgbm']
- req_library['model_builders_xgboost.py'] = ['xgboost']
- req_library['model_builders_catboost.py'] = ['catboost']
+ req_library['model_builders_lightgbm_batch.py'] = ['lightgbm']
+ req_library['model_builders_xgboost_batch.py'] = ['xgboost']
+ req_library['model_builders_catboost_batch.py'] = ['catboost']
No. "batch" is a strange and meaningless suffix. If the runner does not pick them up, then the launcher should be fixed.
Let's decide how to group this kind of example, then.
All examples in scikit-learn-intelex/examples/ are logically grouped into 4 groups by suffix: spmd.py, streaming.py, stream.py, batch.py.
If none of them is suitable, then we either:
- come up with a new suffix for such examples, or
- update the whole examples runner logic.
I have just had a brief look at run_examples.py. Could we add some default behavior for the case where the filename doesn't end with *spmd.py, *streaming.py, *stream.py or *batch.py?
As far as I can see, we could switch to this default behavior for all the *_batch.py and *_stream.py files. In my view, this would make the example names clearer.
@razdoburdin I like that idea more than using suffixes.
@samir-nasibli those are logical only if you look at this from a testing perspective. From an end-user perspective this looks strange.
I would go with base examples without batch, because it makes no sense to specify it. As for stream/streaming, it looks like these should be a single group. And for spmd, let's leave it as is.
@Alexsandruss, @KulikovNikita - your thoughts?
> I have just made a brief look in the run_examples.py. May we just add some sort of default behavior in case the filename doesn't end on *spmd.py, *streaming.py, *stream.py or *batch.py?

Sure we can. That means we need to update the examples runner logic a little bit (the second option).
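The default-behavior idea could look roughly like the sketch below. This is not the actual code of run_examples.py; `KNOWN_SUFFIXES`, `example_group`, and the `"default"` group name are all hypothetical, chosen only to show a fallback for filenames without a recognized suffix.

```python
# Hypothetical mapping from filename suffix to example group.
KNOWN_SUFFIXES = {
    "spmd.py": "spmd",
    "streaming.py": "streaming",
    "stream.py": "stream",
    "batch.py": "batch",
}


def example_group(filename, default="default"):
    """Return the group for an example file, falling back to `default`."""
    for suffix, group in KNOWN_SUFFIXES.items():
        if filename.endswith(suffix):
            return group
    # No recognized suffix: use the default behavior instead of skipping the file.
    return default


groups = {
    name: example_group(name)
    for name in (
        "model_builders_xgboost.py",
        "model_builders_xgboost_batch.py",
        "kmeans_spmd.py",
    )
}
```

With a fallback like this, examples such as model_builders_xgboost.py would be picked up without a suffix, and the existing suffixed files would keep working.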
I want to start a discussion about changing the Model builders API.
The main challenge is supporting both scikit-learn style estimators and non-scikit-learn style booster objects.
In this PR I added a new class, GBTDAALModel, for a Python interface in non-scikit-learn style. It allows hiding all the wordy syntax of the old API.

With XGBoost, for regression tasks the user-side code changes from:

    d4p_model = daal4py.get_gbt_model_from_xgboost(booster)
    d4p_prediction = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction

to

    d4p_model = daal4py.mb.convert_model(booster)
    d4p_prediction = d4p_model.predict(X_test)

For classification problems the user-side code changes from:

    d4p_model = daal4py.get_gbt_model_from_xgboost(booster)
    d4p_prediction = daal4py.gbt_classification_prediction(nClasses=n_classes).compute(X_test, d4p_model).prediction

to

    d4p_model = daal4py.mb.convert_model(booster)
    d4p_prediction = d4p_model.predict(X_test)

To support scikit-learn style estimators, I updated the GBTDAALClassifier and GBTDAALRegressor classes.
One can use them like this (example for XGBoost, regression task):

    from daal4py.sklearn.ensemble import GBTDAALRegressor
    reg = xgb.XGBRegressor()
    reg.fit(X, y)
    d4p_predt = GBTDAALRegressor.convert_model(reg).predict(X)
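
A single `convert_model` entry point has to work out which framework a model came from. The sketch below shows one possible way, by inspecting the module of the model's class. It is an illustration under assumptions, not the implementation in this PR: `detect_framework` and `FakeBooster` are hypothetical, and the list of frameworks is taken from the example file names in this thread.

```python
SUPPORTED_FRAMEWORKS = ("xgboost", "lightgbm", "catboost")


def detect_framework(model):
    """Hypothetical: guess the source framework from the class's module path."""
    module = type(model).__module__
    top_level = module.split(".")[0]
    if top_level in SUPPORTED_FRAMEWORKS:
        return top_level
    raise TypeError(
        f"Unsupported model type: {type(model).__name__} from module {module}"
    )


class FakeBooster:
    """Stand-in so the sketch runs without xgboost installed."""


# Pretend the class was defined inside the xgboost package.
FakeBooster.__module__ = "xgboost.core"

framework = detect_framework(FakeBooster())
```

Checking the module name rather than using `isinstance` means the converter does not need to import every supported framework up front.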