[jvm-package] Add documentation warning for saving Spark trained model with non-default missing value in native format #4727

cpfarrell · 2019-08-02T00:32:41Z

When training XGBoost models on Spark, it is possible to set the value of "missing" as part of the model's parameters (described at https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#dealing-with-missing-values). If you save the model in the native format needed to load it in other bindings (via https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#interact-with-other-bindings-of-xgboost), this missing parameter is dropped (the general problem of parameters not being included in the model is discussed in #4104). If the native model is then loaded, either on another platform or even back into Spark, the absence of the missing parameter will cause predictions to be inaccurate. An especially confusing aspect is that in Python the missing parameter is a property of the DMatrix, and so of the dataset fed to XGBoost, whereas in Spark it is part of the model's parameters, and so a property of the model. It is therefore easy to forget to set the parameter correctly when constructing your DMatrix in Python, since it seems like it would already be baked into the model.
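To illustrate why the dropped value matters, here is a toy sketch in plain Python. It is not XGBoost code: the `Stump` class, the `encode` helper, and the sentinel `-999.0` are all hypothetical and only mimic the idea that a tree routes missing values along a learned default direction. In the real Python binding, the corresponding setting would be the `missing` argument of `xgboost.DMatrix`.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """Toy one-split tree with a learned default direction for missing values."""
    threshold: float
    left_value: float
    right_value: float
    missing_goes_left: bool

    def predict(self, x: float) -> float:
        # Missing inputs (NaN) follow the default direction learned in training.
        if math.isnan(x):
            return self.left_value if self.missing_goes_left else self.right_value
        return self.left_value if x < self.threshold else self.right_value


def encode(x: float, missing: float) -> float:
    """Map the sentinel to NaN, mimicking a `missing` setting applied to input data."""
    if math.isnan(missing):
        return x  # default: only NaN itself counts as missing
    return math.nan if x == missing else x


# Hypothetical model trained with missing = -999.0; missing rows were routed right.
SENTINEL = -999.0
stump = Stump(threshold=0.5, left_value=-1.0, right_value=1.0, missing_goes_left=False)

# Sentinel declared on the loading side: the row takes the default direction.
with_missing = stump.predict(encode(SENTINEL, missing=SENTINEL))

# Sentinel forgotten: -999.0 is compared with the threshold like any other number.
without_missing = stump.predict(encode(SENTINEL, missing=math.nan))

print(with_missing, without_missing)
```

The same row lands on opposite sides of the split depending on whether the loading side knows about the sentinel, which is the kind of silent prediction change described above.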

I imagine that including the value of the missing parameter with the model is likely blocked by #3980. In the meantime, would it be possible to add a note to the documentation section on dealing with missing values (https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#dealing-with-missing-values) saying that, when a model is saved in native format, care needs to be taken to set the missing parameter correctly on the other side?

hcho3 · 2020-12-16T21:57:10Z

Resolved by #4805

trivialfis changed the title ~~Add documentation warning for saving Spark trained model with non-default missing value in native format~~ [jvm-package] Add documentation warning for saving Spark trained model with non-default missing value in native format Aug 4, 2019

cpfarrell mentioned this issue Aug 30, 2019

[jvm-packages] Allow for bypassing spark missing value check #4805

Merged

hcho3 closed this as completed Dec 16, 2020
