
XGBRegressor converter is maybe broken #371

Closed
victornoel opened this issue Feb 26, 2020 · 6 comments

Comments

@victornoel

It seems there are problems with the XGBRegressor converter.

I've opened an issue at sklearn-onnx but they redirected me to here: onnx/sklearn-onnx#321

Here is a repro: https://gist.github.com/victornoel/06d6231f6276719ddba53cb381dfd468

I cast an int column to str because in my original dataset we have categories made of numbers, so I thought this could be related…

As we can see, the differences are huge: the predictions should be between 0 and 1, yet the differences can be as large as 1 in my tests!

>>> import test
>>> test.test()
[0.9019426  0.91082716 0.91868186 0.94926846 1.0898147 ]
min(Y)-max(Y): 0 1
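For anyone reproducing this, the gap is easy to quantify once you have the two prediction arrays (from the scikit-learn pipeline and from onnxruntime). The helper name and the toy values below are mine, not from the gist; this is just a minimal sketch of the comparison:

```python
import numpy as np

def max_prediction_gap(skl_preds, onnx_preds):
    """Return the largest absolute difference between two prediction arrays."""
    skl_preds = np.asarray(skl_preds, dtype=np.float64).ravel()
    onnx_preds = np.asarray(onnx_preds, dtype=np.float64).ravel()
    return float(np.max(np.abs(skl_preds - onnx_preds)))

# Toy illustration with made-up values of the same magnitude as the
# mismatch reported above (predictions that should agree but do not).
skl = np.array([0.0, 0.125, 0.25])
onnx = np.array([0.875, 1.0, 0.75])
print(max_prediction_gap(skl, onnx))  # 0.875
```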
@victornoel victornoel changed the title XGBRegressor is maybe broken XGBRegressor converter is maybe broken Feb 26, 2020
@victornoel
Author

For the record, I tested the latest git versions of sklearn-onnx, onnxconverter-common, and onnxmltools, and the problem is still there…

@xadupre
Collaborator

xadupre commented Mar 3, 2020

I was able to replicate. I'll have a look tomorrow.

@xadupre
Collaborator

xadupre commented Mar 4, 2020

One issue comes from OneHotEncoder. By default, scikit-learn produces sparse output, but sparse tensors are not yet supported in onnxruntime even though they are defined in ONNX. Because onnxruntime uses a dense representation, missing values differ: scikit-learn sees nan, while the dense matrices in onnxruntime contain 0. I suggest using OneHotEncoder(sparse=False) for the time being, until sparse tensors are supported in onnxruntime and fully supported in ONNX.
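The root of the mismatch can be illustrated with scipy alone: entries that a sparse matrix simply does not store come back as 0.0 once the matrix is densified, so any consumer that distinguishes "missing" from "zero" (as XGBoost does) sees a different input. A minimal sketch (note that in later scikit-learn releases the OneHotEncoder keyword was renamed from sparse to sparse_output):

```python
import numpy as np
from scipy import sparse

# A one-hot-style matrix where only the "hot" entries are stored:
# row 0 is hot in column 2, row 1 is hot in column 0.
hot = sparse.csr_matrix(([1.0, 1.0], ([0, 1], [2, 0])), shape=(2, 3))

print(hot.nnz)         # 2 stored entries; the other 4 are *absent*, not 0
dense = hot.toarray()  # densifying fills every absent entry with 0.0
print(dense)
```

A sparse-aware consumer can treat the four absent entries as missing, while after densification they are indistinguishable from genuine zeros, which is consistent with the nan-vs-0 behaviour described above.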

@victornoel
Author

victornoel commented Mar 4, 2020

@xadupre thank you for the investigation, using sparse=False does indeed fix the problem for me!

I have some questions:

  • would it make sense for sklearn-onnx to fail for a OneHotEncoder that was built with sparse=True, to avoid any problems until this is supported?
  • what are the drawbacks of sparse? It feels like training a model takes much longer (about 10x) with sparse=False
  • is there an issue to track supporting sparse in onnxruntime?
  • you say "one issue", are there more? 😄

@xadupre
Collaborator

xadupre commented Mar 4, 2020

I'm hesitating. sklearn-onnx is tested with onnxruntime, but that does not mean onnxruntime is the only runtime that should run ONNX graphs, so failing for OneHotEncoder is not my favourite option. For the OneHotEncoder, there is usually no drawback to sparse: it saves a lot of memory and is faster, since many features are null and are not part of the computation. So, in this case, sparse is definitely better. There is an issue on onnx (onnx/onnx#2008), but none in onnxruntime. Feel free to add one. I don't have a better option right now.
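The memory argument is easy to quantify with a back-of-the-envelope sketch (the sizes below are my own illustration, not from the thread): for n rows one-hot-encoded into k categories, a dense float64 array stores n*k values, while a CSR matrix stores roughly one value plus one index per row.

```python
import numpy as np
from scipy import sparse

n, k = 100_000, 1_000  # rows, number of categories
rows = np.arange(n)
cols = np.random.default_rng(0).integers(0, k, size=n)
onehot = sparse.csr_matrix((np.ones(n), (rows, cols)), shape=(n, k))

dense_bytes = n * k * 8                      # float64 dense array, not allocated
sparse_bytes = (onehot.data.nbytes
                + onehot.indices.nbytes
                + onehot.indptr.nbytes)      # actual CSR storage
print(dense_bytes // sparse_bytes)           # dense is hundreds of times larger
```

This is why sparse output is the scikit-learn default, and why dropping to dense only makes sense as a workaround while the runtime lacks sparse support.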

@victornoel
Author

@xadupre thank you, I created an issue there: microsoft/onnxruntime#3144

Let me close the current issue then.
