plot_cap default args. not working for categorical regression #673

Closed


@GStechschulte (Collaborator) commented May 20, 2023

This draft PR addresses issue #669 by adding logic to _plot_cap_numeric() (used by plot_cap) to index the N-dimensional y_hat_bounds array so that ax.fill_between() works for $K$ outcome classes.

Before plotting begins in _plot_cap_numeric(), a new variable y_hat_bounds_dim = y_hat_bounds.ndim records the number of dimensions. At plotting time, an if / else statement checks whether the number of dimensions is greater than 2: if it is, we loop over the outcome classes and call ax.fill_between() for each one; otherwise no loop is needed.

The added logic lets ax.fill_between() scale to $K$ classes, but it requires copying the if / else statement for each color and panel combination inside _plot_cap_numeric, which in my opinion is ugly and doesn't adhere to DRY. But it works.
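For reference, here is a minimal standalone sketch of the indexing idea using synthetic data. The (classes, lower/upper, points) layout assumed for y_hat_bounds and the (points, classes) layout for y_hat_mean are illustrative assumptions, not the exact arrays built inside _plot_cap_numeric:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 50)
n_classes = 3
y_hat_mean = np.column_stack([x * (k + 1) for k in range(n_classes)])  # (points, classes)
y_hat_bounds = np.stack(
    [np.stack([y_hat_mean[:, k] - 0.1, y_hat_mean[:, k] + 0.1]) for k in range(n_classes)]
)  # (classes, 2, points)

fig, ax = plt.subplots()
if y_hat_bounds.ndim > 2:
    # categorical response: one line and credible band per outcome class
    for k in range(y_hat_bounds.shape[0]):
        ax.plot(x, y_hat_mean[:, k], color=f"C{k}")
        ax.fill_between(x, y_hat_bounds[k, 0], y_hat_bounds[k, 1], alpha=0.4, color=f"C{k}")
else:
    # univariate response: a single band, no loop needed
    ax.plot(x, y_hat_mean, color="C0")
    ax.fill_between(x, y_hat_bounds[0], y_hat_bounds[1], alpha=0.4, color="C0")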

Below are a few examples. I also noticed that the class names do not appear in the legend, because the legend currently only looks at the covariates for unique names.

To Do:

  • run tests
  • run black
  • fix legend to display outcome variable class names
import arviz as az
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import bambi as bmb
from bambi.plots import plot_cap

length = [
    1.3, 1.32, 1.32, 1.4, 1.42, 1.42, 1.47, 1.47, 1.5, 1.52, 1.63, 1.65, 1.65, 1.65, 1.65,
    1.68, 1.7, 1.73, 1.78, 1.78, 1.8, 1.85, 1.93, 1.93, 1.98, 2.03, 2.03, 2.31, 2.36, 2.46,
    3.25, 3.28, 3.33, 3.56, 3.58, 3.66, 3.68, 3.71, 3.89, 1.24, 1.3, 1.45, 1.45, 1.55, 1.6,
    1.6, 1.65, 1.78, 1.78, 1.8, 1.88, 2.16, 2.26, 2.31, 2.36, 2.39, 2.41, 2.44, 2.56, 2.67,
    2.72, 2.79, 2.84
]
choice = [
    "I", "F", "F", "F", "I", "F", "I", "F", "I", "I", "I", "O", "O", "I", "F", "F",
    "I", "O", "F", "O", "F", "F", "I", "F", "I", "F", "F", "F", "F", "F", "O", "O",
    "F", "F", "F", "F", "O", "F", "F", "I", "I", "I", "O", "I", "I", "I", "F", "I",
    "O", "I", "I", "F", "F", "F", "F", "F", "F", "F", "O", "F", "I", "F", "F"
]

sex = ["Male"] * 32 + ["Female"] * 31
data = pd.DataFrame({"choice": choice, "length": length, "sex": sex})
data["choice"]  = pd.Categorical(
    data["choice"].map({"I": "Invertebrates", "F": "Fish", "O": "Other"}),
    ["Other", "Invertebrates", "Fish"],
    ordered=True
)

# Assumed model specification so the example runs end to end; the original
# comment did not include the model-fitting code (the model comes from issue #669).
model = bmb.Model("choice ~ length + sex", data, family="categorical")
idata = model.fit()

fig, ax = plot_cap(
    model=model,
    idata=idata,
    covariates="length",
    pps=False,
    legend=True, # not working with response classes
)
fig.set_size_inches(7, 3)

[screenshot: plot_cap output for the categorical model, one fitted line and band per outcome class]

Note that there is no legend for the class names, which makes it hard to distinguish the lines.

fig, ax = plot_cap(
    model=model,
    idata=idata,
    covariates={"horizontal": "length", "color": "sex", "panel": "sex"},
    fig_kwargs={"figsize": (16, 5), "sharey": True},
    pps=False
);

[screenshot: plot_cap output with color and panel both mapped to sex]

Again, there is no legend for the class names. The legend correctly identifies the sex, but it would be more informative if the panel title showed the sex and the legend listed the classes.

@GStechschulte (Collaborator, Author)

The legend can now show the class / outcome names when the data type of the response variable is categorical. If the response is categorical, the default legend=True bool is overwritten with a dict of the form {response_name: <class names>}. Inside the plotting function, if legend is not a bool, the dict's key and values are used as the legend title and labels.
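Roughly, the override works like this (a standalone sketch; response_name, class_names, and is_categorical_response are placeholders, not the actual variables in the code):

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# placeholders standing in for what plot_cap infers from the model
response_name = "choice"
class_names = ["Other", "Invertebrates", "Fish"]
is_categorical_response = True

legend = True  # the plot_cap default
if is_categorical_response:
    # overwrite the bool with {response_name: <class names>}
    legend = {response_name: class_names}

fig, ax = plt.subplots()
if not isinstance(legend, bool):
    # use the dict key as the legend title and the values as the labels
    (title, labels), = legend.items()
    handles = [Line2D([], [], color=f"C{i}") for i in range(len(labels))]
    ax.legend(handles, labels, title=title)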

Using the same code as above:

[screenshot: the same plot as above, now with a legend titled by the response name and listing the class names]

However, there is now a problem when both color and panel are mapped to sex: the color is fixed to C{i}, where i is determined from the unique values of color. To solve this, we would either need to override the color on each iteration of the loop or set a default color mapping based on the unique class names (which is still problematic, since the colors are assigned and plotted sequentially in the for loop). Any thoughts on this?

[screenshot: the color/panel-by-sex plot, where the per-class colors clash with the C{i} colors derived from the color covariate]
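For illustration, the second option mentioned above, a fixed color mapping keyed by the class names, could look roughly like this; all names here are placeholders:

import matplotlib.pyplot as plt

# placeholder class names; in practice these would come from the response
class_names = ["Other", "Invertebrates", "Fish"]
class_colors = {name: f"C{i}" for i, name in enumerate(class_names)}

fig, axes = plt.subplots(1, 2, figsize=(8, 3))  # e.g. one panel per sex
for ax in axes:
    for name in class_names:
        # look up the color by class name instead of relying on the loop index
        ax.plot([], [], color=class_colors[name], label=name)
axes[0].legend(title="choice")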

@GStechschulte marked this pull request as ready for review May 22, 2023 14:05
@@ -262,12 +266,17 @@ def plot_cap(
def _plot_cap_numeric(covariates, cap_data, y_hat_mean, y_hat_bounds, transforms, legend, axes):
main = covariates.get("horizontal")
transform_main = transforms.get(main, identity)
y_hat_bounds_dim = y_hat_bounds.ndim

Review comment from a Collaborator:

Nit: Can we call it y_ndim or y_hat_bounds_ndim?

@tomicapretto (Collaborator)

@GStechschulte thanks for the great PR, as always.

To your last point ("To solve this, we would also need to override the color each time in the loop or set a default colour mapping based on the unique class names (which is still problematic since the color is set and plotted sequentially in the for loop). Any thoughts on this?"):

One approach would be to allow users to map the dimension of the response to the color or panel layer of the plot. By default, when there are multiple response levels, the color is mapped to the level of the response, but it can be overridden, as you did in your example, and then it's up to the user to override things in a meaningful way. For example, I think it would make sense to map the response level to the panel when you want one panel per level.

The question then is how this is specified in the covariates argument. I think one possible choice could be "{response_name}_dim".

So in your example one would be able to do:

fig, ax = plot_cap(
    model=model,
    idata=idata,
    covariates={"horizontal": "length", "color": "sex", "panel": "choice_dim"},
    fig_kwargs={"figsize": (16, 5), "sharey": True},
    pps=False
);

or

fig, ax = plot_cap(
    model=model,
    idata=idata,
    covariates={"horizontal": "length", "color": "choice_dim", "panel": "sex"},
    fig_kwargs={"figsize": (16, 5), "sharey": True},
    pps=False
);

And by default the behavior would be as if you pass {"color": "choice_dim"}.

Do you think this makes sense? I'm not married to it of course.

@GStechschulte (Collaborator, Author)


I will circle back to this when the core functionality of comparisons, predictions, and slopes is completed. If anyone else wants to take this PR on, feel free.

@tomicapretto (Collaborator)

I agree with the approach. Let's do one at a time. Thanks for being explicit, by the way :)

@tomicapretto (Collaborator)

@GStechschulte after all the magic in #684, is this still needed?

@GStechschulte (Collaborator, Author) commented Jul 17, 2023

@tomicapretto I believe so. I did not change any of the code in plot_types.py in #684, which is where this bug now lives (previously it was directly in plot_cap.py).

@GStechschulte (Collaborator, Author)

Closing, as this PR will no longer resolve the issue. See the updated issue #723.

@GStechschulte deleted the plot-cap-categorical branch January 21, 2024 20:19