
fix gpu predictor when dmatrix is mismatched with model #4613

Merged: 1 commit merged into dmlc:master on Jun 27, 2019

Conversation

@rongou (Contributor) commented on Jun 27, 2019:

This should fix the problem that #4594 is trying to guard against.

@RAMitchell @CodingCat

@RAMitchell (Member) commented:

This solves the crash but not the underlying issue, right? If there is a mismatch between the model features and the test matrix, the results would still be considered incorrect.

@rongou (Contributor, Author) commented on Jun 27, 2019:

As I mentioned in the other PR, the existing CLI doesn't care about the missing last feature:

$ ../../xgboost mushroom.conf tree_method=gpu_hist
[10:31:18] 6541x126 matrix with 143902 entries loaded from agaricus.txt.train
[10:31:18] 1583x125 matrix with 34217 entries loaded from agaricus.txt.test
[10:31:19] [0]	test-error:0.015793	train-error:0.014524
[10:31:19] [1]	test-error:0.000000	train-error:0.001223

Note that the train and test matrices have different numbers of features (I manually removed the last feature from the test set).

Algorithmically, the number of features doesn't have to match exactly between the model and the test dataset, right? If the test dataset really is missing the last feature, and the DMatrix MetaInfo is inferred from the dataset, it's just a mismatch in metadata; the predictions should still be correct. Right now the gpu predictor relies on the test data (dmat->Info().num_col_) to construct the model, but it should rely on the model itself (model.param.num_feature). Does that make sense?
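A minimal sketch of what I mean (simplified, illustrative names; not the actual gpu predictor code): size the dense row buffer from model.param.num_feature and ignore any feature index the model has never seen:

#include <cstddef>
#include <limits>
#include <vector>

// Simplified stand-in for xgboost::Entry: one sparse cell of a row.
struct Entry { std::size_t index; float fvalue; };

// Build the dense feature buffer for one row. Sizing it from the model's
// feature count (model.param.num_feature) instead of dmat->Info().num_col_
// means a test matrix narrower than the model no longer reads out of bounds.
std::vector<float> BuildDenseRow(const std::vector<Entry>& row_entries,
                                 std::size_t model_num_feature) {
  std::vector<float> row(model_num_feature,
                         std::numeric_limits<float>::quiet_NaN());
  for (const auto& e : row_entries) {
    if (e.index < model_num_feature) {  // skip features the model never saw
      row[e.index] = e.fvalue;
    }
  }
  return row;
}

With the old sizing, a test matrix missing the last feature would index past the end of the buffer; with model-based sizing, the missing trailing feature simply stays NaN (missing), which the tree traversal already handles.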

@RAMitchell (Member) commented:

What happens if you remove the first feature and the indices of all the other features get bumped down one?

@CodingCat (Member) commented on Jun 27, 2019:

> What happens if you remove the first feature and the indices of all the other features get bumped down one?

That's true, but if we want to handle every corner of this scenario in XGBoost, we essentially need to synchronize not only the number of features but also their order, and not only for prediction but also for training (which looks a bit complex at first glance). See the toy example below.
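To illustrate the shift (made-up LIBSVM rows, not from this thread):

original row:                        1 0:0.5 1:1.2 2:3.0
feature 0 dropped and re-indexed:    1 0:1.2 1:3.0

The model's split on feature 1 now reads the value that used to be feature 2, so nothing crashes but the predictions are silently wrong.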

@RAMitchell (Member) commented:

Yes, there should also be some validation happening in the language bindings, checking that the test features match the model features using feature names if they are available. I believe this already happens in Python. Do you store feature names or indices inside your model in Java land?

@CodingCat (Member) commented on Jun 27, 2019:

> Yes, there should also be some validation happening in the language bindings, checking that the test features match the model features using feature names if they are available. I believe this already happens in Python. Do you store feature names or indices inside your model in Java land?

No, it only fetches from the native layer... The distributed scenario is a bit tricky: say we have 2 workers and 3 features, and the second feature only appears on the first worker.

I think the current code will train the first worker with 3 features and the second worker with 2 features, and then sync the model across workers. But the model on the second worker is trained with the third feature shifted to the second position (after 0.81)... (this needs more verification)
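A made-up sketch of that worker split (LIBSVM shards, feature 1 absent on worker 2 and re-indexed away):

worker 1 shard:  1 0:0.5 1:1.2 2:3.0   -> trains with 3 features
worker 2 shard:  0 0:0.7 1:2.1         -> trains with 2 features; old feature 2 now called feature 1

Worker 2's "feature 1" actually holds what worker 1 calls feature 2, so the synced model mixes the two meanings.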

@RAMitchell (Member) commented:

OK, sounds like it might take some time for a fix. We should probably merge this for now to avoid a hard crash.

@RAMitchell RAMitchell merged commit 63ec956 into dmlc:master Jun 27, 2019
@rongou rongou deleted the fix-gpu-predictor branch July 18, 2019 21:13
lock bot locked as resolved and limited conversation to collaborators on Oct 16, 2019