
[jvm-package] Add documentation warning for saving Spark trained model with non-default missing value in native format #4727

Closed
cpfarrell opened this issue Aug 2, 2019 · 1 comment


@cpfarrell
Contributor

When training XGBoost models on Spark, you can set the value of "missing" as one of the model's parameters (described at https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#dealing-with-missing-values). If you then save the model in the native format that other bindings can load (via https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#interact-with-other-bindings-of-xgboost), this "missing" parameter is dropped (the broader issue of parameters not being stored with the model is discussed in #4104). When the native model is loaded again, whether on another platform or back into Spark, the "missing" parameter is no longer set, and this will cause predictions to be inaccurate.

An especially confusing aspect is that in Python "missing" is a property of the DMatrix, i.e. of the dataset fed to XGBoost, whereas in Spark it is one of the model's parameters, i.e. a property of the model. That makes it easy to forget to set the parameter correctly when constructing the DMatrix in Python, since it seems like it should already be baked into the model.
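To make the failure mode concrete, here is a toy sketch, not XGBoost's actual code, of a single-split model that sends values flagged as missing down a learned default branch. It assumes the model was trained on Spark with "missing" set to the sentinel -999.0 and that the learned default direction is right; the names `predict`, `SPLIT_THRESHOLD`, `DEFAULT_LEFT` and the leaf values are hypothetical.

```python
import math

# Toy, hypothetical single-split "tree" (not XGBoost's implementation):
# a value < SPLIT_THRESHOLD goes left, otherwise right; a value treated as
# missing follows the learned default direction instead of the comparison.
SPLIT_THRESHOLD = 0.5
LEFT_LEAF, RIGHT_LEAF = -1.0, 1.0
DEFAULT_LEFT = False  # assumed: training sent missing values to the right


def predict(x: float, missing: float = math.nan) -> float:
    """Route x through the toy split, treating `missing` as an absent value."""
    # NaN never compares equal to itself, so check it explicitly.
    is_missing = math.isnan(x) if math.isnan(missing) else x == missing
    if is_missing:
        return LEFT_LEAF if DEFAULT_LEFT else RIGHT_LEAF
    return LEFT_LEAF if x < SPLIT_THRESHOLD else RIGHT_LEAF


sentinel = -999.0
# Same "missing" value as at training time: the sentinel takes the default branch.
print(predict(sentinel, missing=sentinel))
# Default NaN "missing" (as after reloading the native model without setting it):
# the sentinel is compared as an ordinary number and lands in the other leaf.
print(predict(sentinel))
```

The first call returns the right leaf and the second the left leaf, even though the input is identical; nothing errors, the prediction just silently changes. In the Python package the corresponding setting is the `missing` argument of `xgboost.DMatrix`, which would need to be passed the same sentinel used during Spark training.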

I imagine that storing the value of the "missing" parameter along with the model is likely blocked by #3980, but would it be possible to add a note to the documentation page about dealing with missing values (https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#dealing-with-missing-values) saying that care needs to be taken to set the "missing" parameter correctly on the loading side when the model is saved in native format?

@trivialfis trivialfis changed the title Add documentation warning for saving Spark trained model with non-default missing value in native format [jvm-package] Add documentation warning for saving Spark trained model with non-default missing value in native format Aug 4, 2019
@hcho3
Collaborator

hcho3 commented Dec 16, 2020

Resolved by #4805

@hcho3 hcho3 closed this as completed Dec 16, 2020