
fix gpu predictor when dmatrix is mismatched with model #4613

Merged: 1 commit merged into dmlc:master on Jun 27, 2019

Conversation

@rongou (Contributor) commented on Jun 27, 2019:

This should fix the problem that #4594 is trying to guard against.

@RAMitchell @CodingCat

@RAMitchell (Member) commented:

This solves the crash but not the underlying issue, right? If there is a mismatch between the model features and the test matrix, the results would still be considered incorrect.

@rongou (Contributor, Author) commented on Jun 27, 2019:

As I mentioned in the other PR, the existing CLI doesn't care about the missing last feature:

$ ../../xgboost mushroom.conf tree_method=gpu_hist
[10:31:18] 6541x126 matrix with 143902 entries loaded from agaricus.txt.train
[10:31:18] 1583x125 matrix with 34217 entries loaded from agaricus.txt.test
[10:31:19] [0]	test-error:0.015793	train-error:0.014524
[10:31:19] [1]	test-error:0.000000	train-error:0.001223

Note that the train and test matrices have different numbers of features (I manually removed the last feature from the test set).

Algorithmically, the number of features doesn't have to match exactly between the model and the test dataset, right? If the test dataset really is missing the last feature, and the DMatrix MetaInfo is inferred from the dataset, it's just a mismatch in metadata; the predictions should still be correct. Right now the gpu predictor relies on the test data (dmat->Info().num_col_) to construct the model, but it should rely on the model itself (model.param.num_feature). Does that make sense?
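A minimal sketch of what I mean (simplified, illustrative names; not the actual gpu predictor code): size the dense row buffer from model.param.num_feature and ignore any feature index the model has never seen:

#include <cstddef>
#include <limits>
#include <vector>

// Simplified stand-in for xgboost::Entry: one sparse cell of a row.
struct Entry { std::size_t index; float fvalue; };

// Build the dense feature buffer for one row. Sizing it from the model's
// feature count (model.param.num_feature) instead of dmat->Info().num_col_
// means a test matrix narrower than the model no longer reads out of bounds.
std::vector<float> BuildDenseRow(const std::vector<Entry>& row_entries,
                                 std::size_t model_num_feature) {
  std::vector<float> row(model_num_feature,
                         std::numeric_limits<float>::quiet_NaN());
  for (const auto& e : row_entries) {
    if (e.index < model_num_feature) {  // skip features the model never saw
      row[e.index] = e.fvalue;
    }
  }
  return row;
}

With the old sizing, a test matrix missing the last feature would index past the end of the buffer; with model-based sizing, the missing trailing feature simply stays NaN (missing), which the tree traversal already handles.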

@RAMitchell (Member) commented:

What happens if you remove the first feature and the indices of all the other features get bumped down one?

@CodingCat (Member) commented on Jun 27, 2019:

> What happens if you remove the first feature and the indices of all the other features get bumped down one?

That's true, but if we want to handle every corner of this scenario in XGBoost, we essentially need to synchronize not only the number of features but also their order, and not only for prediction but also for training (which looks a bit complex at first glance). See the toy example below.
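To illustrate the shift (made-up LIBSVM rows, not from this thread):

original row:                        1 0:0.5 1:1.2 2:3.0
feature 0 dropped and re-indexed:    1 0:1.2 1:3.0

The model's split on feature 1 now reads the value that used to be feature 2, so nothing crashes but the predictions are silently wrong.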

@RAMitchell (Member) commented:

Yes, there should also be some validation happening in the language bindings, checking that the test features match the model features using feature names if they are available. I believe this already happens in Python. Do you store feature names or indices inside your model in Java land?

@CodingCat (Member) commented on Jun 27, 2019:

> Yes, there should also be some validation happening in the language bindings, checking that the test features match the model features using feature names if they are available. I believe this already happens in Python. Do you store feature names or indices inside your model in Java land?

No, it only fetches from the native layer... The distributed scenario is a bit tricky: say we have 2 workers and 3 features, and the second feature only appears on the first worker.

I think the current code will train the first worker with 3 features and the second worker with 2 features, and then sync the model across workers. But the model on the second worker is trained with the third feature shifted to the second position (after 0.81)... (this needs more verification)
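A made-up sketch of that worker split (LIBSVM shards, feature 1 absent on worker 2 and re-indexed away):

worker 1 shard:  1 0:0.5 1:1.2 2:3.0   -> trains with 3 features
worker 2 shard:  0 0:0.7 1:2.1         -> trains with 2 features; old feature 2 now called feature 1

Worker 2's "feature 1" actually holds what worker 1 calls feature 2, so the synced model mixes the two meanings.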

@RAMitchell (Member) commented:

OK, sounds like it might take some time for a fix. We should probably merge this for now to avoid a hard crash.

@RAMitchell RAMitchell merged commit 63ec956 into dmlc:master Jun 27, 2019
@rongou rongou deleted the fix-gpu-predictor branch July 18, 2019 21:13
lock bot locked as resolved and limited conversation to collaborators on Oct 16, 2019