Plans for daal4py to support missing values in tree models #682
tpboudreau
started this conversation in
General
Replies: 1 comment
-
Hi @tpboudreau |
When training data contains missing values, XGBoost constructs trees that send observations with missing values down whichever branch produces the greater gain. Whenever this results in missing values joining the "yes" branch (that is, the "less than the breakpoint for the feature" branch), daal4py refuses to load the resulting model for prediction.
For example, using the csv training dataset in the zip file below and running this script:
results in the following error:
(The resulting tree model is also included in the zip file; it shows that missing values occasionally join the "yes" observations.)
d4p.zip
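To make the pattern concrete, here is a minimal sketch of how such splits could be spotted in a dumped tree. It assumes the node layout of XGBoost's JSON dump (`nodeid`, `split`, `split_condition`, `yes`, `no`, `missing`, `children`, `leaf`). The tree below is a hypothetical, hand-written example and is not the model from the zip file; the helper names `nodes_sending_missing_to_yes` and `predict` are made up for illustration.

```python
import math

# Hypothetical hand-written tree in the layout assumed for XGBoost's JSON dump.
# It is NOT taken from d4p.zip.
tree = {
    "nodeid": 0, "split": "f0", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 1,  # missing values follow the "yes" branch
    "children": [
        {"nodeid": 1, "leaf": -0.4},
        {"nodeid": 2, "split": "f1", "split_condition": 2.0,
         "yes": 3, "no": 4, "missing": 4,  # missing values follow the "no" branch
         "children": [
             {"nodeid": 3, "leaf": 0.1},
             {"nodeid": 4, "leaf": 0.7},
         ]},
    ],
}


def nodes_sending_missing_to_yes(node):
    """Return ids of split nodes whose missing-value branch is the "yes" child."""
    if "leaf" in node:
        return []
    found = [node["nodeid"]] if node["missing"] == node["yes"] else []
    for child in node["children"]:
        found.extend(nodes_sending_missing_to_yes(child))
    return found


def predict(node, row):
    """Walk one tree; row maps feature name to a float, with NaN meaning missing."""
    while "leaf" not in node:
        value = row.get(node["split"], math.nan)
        if math.isnan(value):
            target = node["missing"]      # learned default direction
        elif value < node["split_condition"]:
            target = node["yes"]          # "less than the breakpoint" branch
        else:
            target = node["no"]
        node = next(c for c in node["children"] if c["nodeid"] == target)
    return node["leaf"]


print(nodes_sending_missing_to_yes(tree))
print(predict(tree, {"f0": math.nan, "f1": 3.0}))
```

In this toy tree only the root sends missing values down the "yes" branch, which is the kind of node that appears to trigger the load failure described above. For a real model, each string returned by `Booster.get_dump(dump_format="json")` would first need to be parsed with `json.loads` before being passed to the helper.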
Since missing values are not uncommon in our data, this limitation reduces daal4py's usefulness in our environment.
Are there any plans for daal4py to support such models in the near future?
Thanks!
EDIT: I'm running daal4py 2021.2.3 (from conda-forge) on Linux, but I don't believe this is a platform-specific issue.