`interpret` value errors for ordinal and categorical models #723

GStechschulte · 2023-09-19T17:27:39Z

`interpret` does not work with models (ordinal and categorical regression) whose prediction for each observation is a vector of some quantity, e.g., probabilities, with one value per category / class. When the model's predictions are assigned to the summary dataframe, a ValueError is raised because the summary dataframe expects exactly one prediction per row.

For example:

import bambi as bmb
import pandas as pd

length = [
    1.3, 1.32, 1.32, 1.4, 1.42, 1.42, 1.47, 1.47, 1.5, 1.52, 1.63, 1.65, 1.65, 1.65, 1.65,
    1.68, 1.7, 1.73, 1.78, 1.78, 1.8, 1.85, 1.93, 1.93, 1.98, 2.03, 2.03, 2.31, 2.36, 2.46,
    3.25, 3.28, 3.33, 3.56, 3.58, 3.66, 3.68, 3.71, 3.89, 1.24, 1.3, 1.45, 1.45, 1.55, 1.6,
    1.6, 1.65, 1.78, 1.78, 1.8, 1.88, 2.16, 2.26, 2.31, 2.36, 2.39, 2.41, 2.44, 2.56, 2.67,
    2.72, 2.79, 2.84
]
choice = [
    "I", "F", "F", "F", "I", "F", "I", "F", "I", "I", "I", "O", "O", "I", "F", "F",
    "I", "O", "F", "O", "F", "F", "I", "F", "I", "F", "F", "F", "F", "F", "O", "O",
    "F", "F", "F", "F", "O", "F", "F", "I", "I", "I", "O", "I", "I", "I", "F", "I",
    "O", "I", "I", "F", "F", "F", "F", "F", "F", "F", "O", "F", "I", "F", "F"
]

sex = ["Male"] * 32 + ["Female"] * 31
data = pd.DataFrame({"choice": choice, "length": length, "sex": sex})
data["choice"] = pd.Categorical(
    data["choice"].map({"I": "Invertebrates", "F": "Fish", "O": "Other"}),
    ["Other", "Invertebrates", "Fish"],
    ordered=True
)

model = bmb.Model("choice ~ length + sex", data, family="categorical")
idata = model.fit(
    draws=1000, target_accept=0.95, random_seed=1234, chains=4
)

bmb.interpret.predictions(
    model=model,
    idata=idata,
    covariates="length"
)

ValueError: Expected a 1D array, got an array with shape (50, 3)
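A ValueError of this kind can be reproduced without fitting a model. The sketch below is not the code inside `interpret`; it only assumes that a two-dimensional array of per-class predictions ends up being assigned to a single dataframe column. The names `summary` and `preds` and the placeholder probabilities are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical summary dataframe with one row per new observation.
summary = pd.DataFrame({"length": np.linspace(1.24, 3.89, 50)})

# Placeholder predictions: one probability per class for every row,
# so the array has shape (50, 3) instead of (50,).
preds = np.full((50, 3), 1 / 3)

try:
    # A single column can only hold one value per row.
    summary["estimate"] = preds
    raised = False
except ValueError as err:
    raised = True
    print(err)
```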

The ValueError is raised because the response has 3 classes: for each row of the new dataframe fed to the model, the model predicts a probability for every class, so the predictions have shape (50, 3) rather than (50,). We could convert the xarray object into a dataframe and left join it with the data used for prediction, so that each row of that data is matched correctly with its predictions and uncertainty intervals.
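A minimal sketch of the idea, using stand-in data rather than real model output. The dimension name `obs`, the variable name `estimate`, the small grid `new_data`, and the Dirichlet-sampled probabilities are all assumptions for illustration; they are not what Bambi produces internally.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the grid of new data passed to the model.
new_data = pd.DataFrame({
    "length": np.linspace(1.24, 3.89, 5),
    "sex": ["Male"] * 5,
})
levels = ["Other", "Invertebrates", "Fish"]

# Placeholder probabilities with shape (n_obs, n_levels); each row sums to 1.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(len(levels)), size=len(new_data))

# Assumed layout: one dimension for observations, one for response levels.
mean = xr.DataArray(
    probs,
    dims=("obs", "choice_dim"),
    coords={"obs": new_data.index.to_numpy(), "choice_dim": levels},
    name="estimate",
)

# Long format: one row per (observation, response level) pair.
long_preds = mean.to_dataframe().reset_index()

# Left join so every prediction row carries its covariate values.
summary = long_preds.merge(
    new_data, left_on="obs", right_index=True, how="left"
)[["length", "sex", "choice_dim", "estimate"]]

print(summary.head(6))
```

Joining on the observation index, rather than relying on row order, keeps each probability attached to the covariate values it was predicted for. Uncertainty interval bounds could be joined the same way as extra columns.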

In the example of bmb.interpret.predictions, I envision the summary dataframe looking something like this:

| length | sex | choice_dim | estimate |
| --- | --- | --- | --- |
| 1.240000 | Male | Other | 0.097136 |
| 1.240000 | Male | Invertebrates | 0.638881 |
| 1.240000 | Male | Fish | 0.263982 |
| ... | ... | ... | ... |
| 3.890000 | Male | Other | 0.316271 |
| 3.890000 | Male | Invertebrates | 0.002594 |
| 3.890000 | Male | Fish | 0.681135 |

where each row (sample) appears three times, once per response level, as a result of the left join. The user can now analyse the predicted probabilities for each class when length = 1.24 and sex = Male. This summary dataframe would then be passed to plot_predictions.
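As a rough illustration of working with such a long-format summary, the sketch below rebuilds only the six rows shown above (the elided rows are left out), selects the probabilities for one covariate combination, and checks that the class probabilities for each combination sum to approximately 1. The variable names are hypothetical.

```python
import pandas as pd

# Only the rows shown in the table above; the elided rows are not reproduced.
rows = [
    (1.24, "Male", "Other", 0.097136),
    (1.24, "Male", "Invertebrates", 0.638881),
    (1.24, "Male", "Fish", 0.263982),
    (3.89, "Male", "Other", 0.316271),
    (3.89, "Male", "Invertebrates", 0.002594),
    (3.89, "Male", "Fish", 0.681135),
]
summary = pd.DataFrame(rows, columns=["length", "sex", "choice_dim", "estimate"])

# Predicted probability of each class when length = 1.24 and sex = Male.
subset = summary[(summary["length"] == 1.24) & (summary["sex"] == "Male")]
print(subset[["choice_dim", "estimate"]])

# The class probabilities for each covariate combination should sum to ~1.
totals = summary.groupby(["length", "sex"])["estimate"].sum()
print(totals)
```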

GStechschulte mentioned this issue Sep 19, 2023

plot_cap not working for categorical regression #669

Closed

GStechschulte added the bug and good first issue labels Sep 19, 2023

GStechschulte mentioned this issue Sep 20, 2023

plot_cap default args. not working for categorical regression #673

Closed

3 tasks

GStechschulte mentioned this issue Sep 28, 2023

interpret support for model predictions with response levels #732

Merged

5 tasks

GStechschulte closed this as completed in #732 Oct 11, 2023
