Predictions differ #8

Hoeze · 2023-05-19T14:02:57Z

ebm2onnx version: 1.3.0
Python version: 3.8
Operating System: Arch Linux

Description

I would like to convert an interpret v0.2.7 model to ONNX to preserve it for future use.
However, the predictions I get from the ONNX model differ strongly from those of the original model:

>>> original_result
array([[0.98610258, 0.01389742],
       [0.99524099, 0.00475901],
       [0.99398961, 0.00601039],
       [0.99739259, 0.00260741]])
>>> onnx_result
[array([[0.9861026 , 0.01389742],
       [0.96940696, 0.03059301],
       [0.9602685 , 0.03973148],
       [0.98411477, 0.01588521]], dtype=float32)]
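
A quick way to quantify the mismatch is to compare the two arrays row by row. The sketch below reuses the values printed above; the tolerance `atol=1e-6` is an assumption chosen to absorb the float32 rounding of the ONNX output, not a value recommended by either library.

```python
import numpy as np

# Values copied from the outputs above.
original_result = np.array([
    [0.98610258, 0.01389742],
    [0.99524099, 0.00475901],
    [0.99398961, 0.00601039],
    [0.99739259, 0.00260741],
])
# session.run() returns a list with one array per model output.
onnx_result = [np.array([
    [0.9861026, 0.01389742],
    [0.96940696, 0.03059301],
    [0.9602685, 0.03973148],
    [0.98411477, 0.01588521],
], dtype=np.float32)]

onnx_proba = onnx_result[0]

# Assumed tolerance: loose enough for float32 rounding,
# tight enough to flag real differences.
close = np.isclose(original_result, onnx_proba, rtol=0.0, atol=1e-6)
mismatched_rows = np.flatnonzero(~close.all(axis=1))
max_abs_diff = float(np.abs(original_result - onnx_proba).max())

print("rows that differ:", mismatched_rows.tolist())
print("largest absolute difference:", max_abs_diff)
```

Only the first row agrees within the tolerance, which suggests that the discrepancy depends on the feature values of individual samples rather than on a global offset.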

What I Did

My conversion script:

#!/usr/bin/env python3
# requires "interpret==0.2.7" "interpret_core==0.2.7" "ebm2onnx==1.3"
import onnx
import onnxruntime
import ebm2onnx

import pickle
import json

import numpy as np
import pandas as pd

with open("AbSplice_DNA.pkl", "rb") as fd:
    absplice_dna_model = pickle.load(fd)

print(json.dumps(dict(zip(absplice_dna_model.feature_names, absplice_dna_model.feature_types)), indent=2))

test_df = pd.read_parquet("test.parquet")

onnx_model = ebm2onnx.to_onnx(
    absplice_dna_model,
    ebm2onnx.get_dtype_from_pandas(test_df),
    predict_proba=True
)
onnx.save_model(onnx_model, 'ebm_model.onnx')
session = onnxruntime.InferenceSession('ebm_model.onnx')

original_result = absplice_dna_model.predict_proba(test_df)
print(original_result)
onnx_result = session.run(None, {k: np.asarray(v) for k, v in test_df.items()})
print(onnx_result)

Further, you can find all necessary files to reproduce my issue in the attached zip file:
onnx_test.zip

Any help would be highly appreciated!

MainRo · 2023-05-31T15:34:12Z

~~Did you mean 3.1.0 instead of 1.3.0 for the ebm2onnx version?~~
OK, I see in the script that it is indeed 1.3.0.

Do you have the result of "ebm2onnx.get_dtype_from_pandas(test_df)" or can you share the test set?

Hoeze · 2023-05-31T15:56:03Z

Hi @MainRo, thanks for looking into it!
Yes, please see above: the reproducible example is in the onnx_test.zip file.

MainRo · 2023-06-27T08:08:02Z

I started to analyze the issue but have not yet found where it comes from. The scores associated with each term do not seem to be correct in the converted model.
Can you try to:

- retrain with the same environment but disable the interactions
- retrain with the latest version of interpretml (0.4.2) and ebm2onnx (3.1.1)

MainRo · 2023-06-27T08:50:32Z

@Hoeze forget my previous comment.

Can you check the type of the splice_site_is_expressed column when training the model? Especially, check that it is declared as an int and not a float.

This is a categorical column and the values in the dataframe are 0 or 1. The type of the column in the parquet file is integer, but I see that internally, EBM treats the values as floats before doing the categorical encoding.

The difference comes from this feature, most likely because it is converted to a float at some point.

When I change the type of this column to string and update the internal types of the EBM model, interpret and ONNX give similar values.
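
The dataframe side of this check can be sketched as follows. The small `test_df` here is a stand-in built for illustration, not the data from onnx_test.zip, and the snippet only covers the pandas column. How the internal feature types of the EBM model were updated is not shown in this thread, so that step is left out.

```python
import pandas as pd

# Hypothetical stand-in for the real test set; only the column name
# comes from the thread.
test_df = pd.DataFrame({"splice_site_is_expressed": [0, 1, 1, 0]})

col = test_df["splice_site_is_expressed"]
# Report the declared dtype and whether pandas considers it an integer type.
print(col.dtype, pd.api.types.is_integer_dtype(col))

# Cast the 0/1 integers to the strings "0"/"1" so that no float
# representation such as "1.0" can appear during categorical encoding.
test_df["splice_site_is_expressed"] = col.astype(str)
print(test_df["splice_site_is_expressed"].tolist())
```

The same check is worth running on the dataframe used for training, since a column that has already become float there would be cast to "0.0" and "1.0" instead.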

Hoeze · 2023-08-09T20:22:31Z

Thanks a lot @MainRo, now this makes a lot of sense.
"splice_site_is_expressed" in the EBM gets converted from int -> float -> string -> int...
E.g. splice_site_is_expressed == 1 (int) -> 1.0 (float) -> "1.0" (string) -> 0 (int) 🤦

I manually fixed this in the onnx models using onnx-modifier.
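
The conversion chain described above can be illustrated with a few lines of plain Python. This is a sketch of the general failure mode, not of the actual interpret or ebm2onnx internals: the mapping `category_to_index`, the fallback `UNKNOWN_INDEX`, and the helper `encode` are all hypothetical, and the thread does not say which side of the conversion holds which string representation.

```python
# Hypothetical category mapping, as it would look if the categories
# were stored as the strings "0" and "1".
category_to_index = {"0": 0, "1": 1}
UNKNOWN_INDEX = 0  # assumed fallback for values missing from the mapping


def encode(value):
    """Look up a raw value the way a string-keyed categorical encoder might."""
    return category_to_index.get(str(value), UNKNOWN_INDEX)


raw = 1                     # int, as stored in the parquet file
as_float = float(raw)       # 1.0
as_string = str(as_float)   # "1.0", which is not a known category

print(as_string)            # 1.0
print(encode(as_float))     # 0: "1.0" is unknown, so the fallback is used
print(encode(raw))          # 1: "1" matches the stored category
```

Because the string "1.0" never matches the category "1", a sample whose value is 1 ends up with the score of the wrong bin, which is consistent with only some rows being affected.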

Hoeze mentioned this issue May 19, 2023 in "initial ONNX support", gagneurlab/absplice#13 (open)

MainRo added the "question" label (further information is requested) May 31, 2023

Hoeze closed this as completed Aug 9, 2023
