XGBRegressor converter is maybe broken #371
Comments
For the record, I tested the latest git versions of sklearn-onnx, onnxconverter-common and onnxmltools, and the problem is still there.
I was able to replicate. I'll have a look tomorrow.
One issue comes from OneHotEncoder. By default, scikit-learn produces sparse output, but sparse tensors are not supported yet in onnxruntime even though they are defined in ONNX. Because onnxruntime uses a dense representation, missing values end up different in scikit-learn (nan) and in the dense matrices in onnxruntime (0). I suggest using OneHotEncoder(sparse=False) for the time being, until sparse is supported in onnxruntime and fully supported in ONNX.
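For illustration, here is a minimal sketch (made-up data, not code from this thread) of the sparse-versus-dense difference and the suggested workaround:

```python
# Minimal sketch of the representation difference described above.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([["a"], ["b"], ["c"]])

# Default behaviour: a scipy sparse matrix. Sparse tensors are defined in
# ONNX but not supported by onnxruntime yet, which is where the mismatch
# between scikit-learn and the converted graph comes from.
enc_sparse = OneHotEncoder()
print(type(enc_sparse.fit_transform(X)))  # scipy sparse matrix

# Suggested workaround: force a dense ndarray so both sides agree.
# (Note: in scikit-learn >= 1.2 the keyword is `sparse_output=False`.)
enc_dense = OneHotEncoder(sparse=False)
print(type(enc_dense.fit_transform(X)))   # numpy.ndarray
```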
@xadupre thank you for the investigation. I have some questions:
I'm hesitating. sklearn-onnx is tested with onnxruntime, but that does not mean onnxruntime is the only runtime that should run ONNX graphs, so failing for OneHotEncoder is not my favourite option. For the OneHotEncoder, sparse output usually has no drawback: it saves a lot of memory and it is faster, since many features are null and are not part of the computation. So, in this case, sparse is definitely better. There is an issue on ONNX (onnx/onnx#2008), none in onnxruntime. Feel free to add one. I don't have a better option right now.
@xadupre thank you, I created an issue there: microsoft/onnxruntime#3144. Let me close the current issue then.
It seems there are problems with the XGBRegressor converter. I've opened an issue at sklearn-onnx but they redirected me here: onnx/sklearn-onnx#321
Here is a repro: https://gist.github.com/victornoel/06d6231f6276719ddba53cb381dfd468
I cast an int column to str because, in my original dataset, we have categories made of numbers, so I thought this could be related…
As we can see, the differences are huge: the predictions should be between 0 and 1, yet the differences can be as large as 1 in my tests!
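For reference, below is a minimal, self-contained sketch of the kind of comparison the gist performs, here on a standalone XGBRegressor converted with onnxmltools. The data, hyper-parameters, and input name are made up for illustration; the actual repro goes through a scikit-learn pipeline with a OneHotEncoder.

```python
# Hypothetical check along the lines of the linked gist: train an
# XGBRegressor, convert it with onnxmltools, and compare its predictions
# against onnxruntime. Data and parameters are illustrative only.
import numpy as np
import onnxruntime as rt
from onnxmltools.convert import convert_xgboost
from onnxmltools.convert.common.data_types import FloatTensorType
from xgboost import XGBRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 5).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.float32)  # target in [0, 1], as in the report

model = XGBRegressor(n_estimators=20, max_depth=3)
model.fit(X, y)

onx = convert_xgboost(
    model, initial_types=[("input", FloatTensorType([None, X.shape[1]]))])

sess = rt.InferenceSession(
    onx.SerializeToString(), providers=["CPUExecutionProvider"])
pred_onnx = sess.run(None, {"input": X})[0].ravel()
pred_xgb = model.predict(X)

# With a healthy converter the maximum absolute difference stays tiny;
# the thread reports differences as large as 1 with the full pipeline.
print(np.abs(pred_onnx - pred_xgb).max())
```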