interpret value errors for ordinal and categorical models #723

Closed
GStechschulte opened this issue Sep 19, 2023 · 0 comments · Fixed by #732
Labels: bug, good first issue

@GStechschulte (Collaborator):

interpret does not work with ordinal and categorical regression models, whose predictions are a vector per observation (e.g., one probability per category / class) rather than a single value. When these predictions are assigned to the summary dataframe, a ValueError is raised because the summary dataframe expects exactly one prediction per row.

For example:

import bambi as bmb
import pandas as pd

length = [
    1.3, 1.32, 1.32, 1.4, 1.42, 1.42, 1.47, 1.47, 1.5, 1.52, 1.63, 1.65, 1.65, 1.65, 1.65,
    1.68, 1.7, 1.73, 1.78, 1.78, 1.8, 1.85, 1.93, 1.93, 1.98, 2.03, 2.03, 2.31, 2.36, 2.46,
    3.25, 3.28, 3.33, 3.56, 3.58, 3.66, 3.68, 3.71, 3.89, 1.24, 1.3, 1.45, 1.45, 1.55, 1.6,
    1.6, 1.65, 1.78, 1.78, 1.8, 1.88, 2.16, 2.26, 2.31, 2.36, 2.39, 2.41, 2.44, 2.56, 2.67,
    2.72, 2.79, 2.84
]
choice = [
    "I", "F", "F", "F", "I", "F", "I", "F", "I", "I", "I", "O", "O", "I", "F", "F",
    "I", "O", "F", "O", "F", "F", "I", "F", "I", "F", "F", "F", "F", "F", "O", "O",
    "F", "F", "F", "F", "O", "F", "F", "I", "I", "I", "O", "I", "I", "I", "F", "I",
    "O", "I", "I", "F", "F", "F", "F", "F", "F", "F", "O", "F", "I", "F", "F"
]

sex = ["Male"] * 32 + ["Female"] * 31
data = pd.DataFrame({"choice": choice, "length": length, "sex": sex})
data["choice"] = pd.Categorical(
    data["choice"].map({"I": "Invertebrates", "F": "Fish", "O": "Other"}),
    ["Other", "Invertebrates", "Fish"],
    ordered=True
)

model = bmb.Model("choice ~ length + sex", data, family="categorical")
idata = model.fit(
    draws=1000, target_accept=0.95, random_seed=1234, chains=4
)

bmb.interpret.predictions(
    model=model,
    idata=idata,
    covariates="length"
)
ValueError: Expected a 1D array, got an array with shape (50, 3)
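
The failure can be reproduced without fitting a model. The snippet below is a minimal sketch, not Bambi's actual code: it assumes the summary dataframe has 50 rows and that the predictions arrive as a (50, 3) array, and the names `summary` and `probs` are placeholders chosen for illustration. Assigning the 2D array to a single column should raise a ValueError like the one above.

```python
import numpy as np
import pandas as pd

# Stand-in for the summary dataframe: one row per value of the covariate.
summary = pd.DataFrame({"length": np.linspace(1.24, 3.89, 50)})

# Stand-in for the model output: one probability per class for each row.
probs = np.full((50, 3), 1 / 3)

try:
    # A single column can only hold one value per row, so this fails.
    summary["estimate"] = probs
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")
```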

The ValueError is raised because the response has 3 classes: for each of the 50 rows in the new dataframe fed to the model, the model predicts one probability per class, so the predictions have shape (50, 3). We could convert the xarray object into a long-format dataframe and left join it onto the data used for prediction, so that each row of that data is matched correctly with its predictions and uncertainty intervals.

In the example of bmb.interpret.predictions, I envision the summary dataframe looking something like this:

| length | sex | choice_dim | estimate |
|---|---|---|---|
| 1.240000 | Male | Other | 0.097136 |
| 1.240000 | Male | Invertebrates | 0.638881 |
| 1.240000 | Male | Fish | 0.263982 |
| ... | ... | ... | ... |
| 3.890000 | Male | Other | 0.316271 |
| 3.890000 | Male | Invertebrates | 0.002594 |
| 3.890000 | Male | Fish | 0.681135 |

where each row (sample) of the prediction data now appears three times, once per class, as a result of the left join. The user can now analyse the predicted probabilities for each class when, for example, length = 1.24 and sex = Male. This summary dataframe would then be passed to plot_predictions.
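
The left join described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation that ended up in Bambi: the prediction grid `new_data`, the `obs` index name, and the random `probs` array are all placeholders, and the class dimension is assumed to be called `choice_dim` as in the table above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical prediction grid: one row per value of `length`.
new_data = pd.DataFrame({"length": [1.24, 2.0, 3.0, 3.89], "sex": "Male"})
new_data.index.name = "obs"  # assumed name for the observation dimension

classes = ["Other", "Invertebrates", "Fish"]

# Placeholder probabilities with shape (n_obs, n_classes); each row sums to 1.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(len(classes)), size=len(new_data))

# Wrap the predictions in a labelled DataArray, mimicking the model output.
estimate = xr.DataArray(
    probs,
    dims=("obs", "choice_dim"),
    coords={"obs": new_data.index.to_numpy(), "choice_dim": classes},
    name="estimate",
)

# Convert to long format: one row per (obs, class) pair.
long = estimate.to_dataframe().reset_index()

# Left join so every row of the prediction grid is repeated once per class.
summary = (
    new_data.reset_index()
    .merge(long, on="obs", how="left")
    .drop(columns="obs")
)
print(summary)
```

Joining on an explicit observation key, rather than relying on row order, keeps each estimate attached to the covariate values that produced it. Uncertainty intervals could be joined the same way if they share the same dimensions.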
