Plot comparisons #684

GStechschulte · 2023-06-12T16:36:52Z

This draft PR introduces plot_comparisons, a function for comparing a fitted model's predictions across different contrast values while holding all other covariates constant or at user-defined values. It is inspired by the marginaleffects R package.

At a high level, plot_comparisons lets the modeller define a contrast (contrast_predictor) and the covariate values to condition on (conditional). If the user does not pass specific values for either contrast_predictor or conditional, a default grid of values is computed. The comparison is expressed on the scale of the outcome, which lets the modeller "see through the eyes of the model". It is computed using all chains and draws of the posterior.
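To make the idea of a comparison concrete, here is a minimal sketch in plain Python. It does not use Bambi: the posterior draws, the logistic model, and the helper names (`predict`, `compare`) are all made up for illustration. It only shows the general recipe of predicting at two contrast values for every draw and taking the difference on the outcome scale.

```python
import math


def inv_logit(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical posterior draws of (intercept, PClass slope, Age slope).
# These numbers are invented for illustration only.
draws = [(2.0, -1.0, -0.03), (2.2, -1.1, -0.02), (1.8, -0.9, -0.04)]


def predict(draw, pclass, age):
    """Predicted survival probability for a single posterior draw."""
    intercept, b_pclass, b_age = draw
    return inv_logit(intercept + b_pclass * pclass + b_age * age)


def compare(draws, contrast, age):
    """Per-draw difference in predictions between two contrast values."""
    reference, comparison = contrast
    return [predict(d, comparison, age) - predict(d, reference, age) for d in draws]


# Compare 3rd class against 1st class while holding Age at 50.
diffs = compare(draws, contrast=(1, 3), age=50)
estimate = sum(diffs) / len(diffs)  # posterior mean of the comparison
print(f"estimate: {estimate:.3f}")
```

Because the difference is taken draw by draw, the result is a full distribution of comparisons, from which a mean and an interval can be summarised.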

Currently, plot_comparisons only supports comparing predictions for one pair of contrast values, e.g., how does the probability of survival change when a person moves from 1st to 3rd class, given Age = 50 and SexCode = [0, 1]?

fig, ax = plot_comparison(
    model=titanic_model,
    idata=titanic_idata,
    contrast_predictor={"PClass": [1, 3]},
    conditional={"Age": [50], "SexCode": [0, 1]}
)

In the above example, the user defined a value for each covariate. However, default values are computed if the user does not provide any for conditional:

fig, ax = plot_comparison(
    model=titanic_model,
    idata=titanic_idata,
    contrast_predictor={"PClass": [1, 3]},
    conditional=["Age", "SexCode"]
)
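As a rough picture of what computing defaults could look like, the sketch below builds a grid from the observed data. The rule used here (an evenly spaced grid for numeric columns, the unique levels for everything else) is an assumption chosen for illustration, and `default_values` is a hypothetical helper; neither is taken from Bambi's implementation.

```python
def default_values(column, num=5):
    """Hypothetical default rule, not Bambi's actual implementation."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in column
    )
    if is_numeric:
        # Evenly spaced grid between the observed minimum and maximum.
        low, high = min(column), max(column)
        step = (high - low) / (num - 1)
        return [low + i * step for i in range(num)]
    # Non-numeric: unique levels, in order of first appearance.
    return list(dict.fromkeys(column))


# Made-up data standing in for the covariates named in `conditional`.
data = {
    "Age": [22, 38, 50, 4, 70],
    "SexCode": ["female", "male", "male", "female", "male"],
}
grid = {name: default_values(values) for name, values in data.items()}
print(grid)
```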

In the next example, default values are computed for both a categorical contrast predictor and a numerical conditional covariate:

fig, ax, comparisons_df, contrast_df, idata = plot_comparison(
    model=fish_model,
    idata=fish_idata,
    contrast_predictor="livebait",
    conditional="persons"
)

These examples and further explanations can be found in the following notebook.

GStechschulte · 2023-06-15T05:48:42Z

To plot subplots, I have added an argument, subplot_kwargs, where the user can specify the main, group, and panel covariates. I believe this argument is needed because plot_comparisons lets the user pass their own values for the covariates in conditional as a dict. Allowing the same subplot control as plot_cap would require the user to pass a nested dictionary, plus extra code to "unnest" that dictionary and access the desired key-value pairs.

For example, suppose a modeller wants to consider only a given interval of bill_length_mm and each unique species:

import numpy as np

fig, ax = plot_comparison(
    model=penguin_model,
    idata=penguin_idata,
    contrast_predictor=["flipper_length_mm"],
    conditional={
        "bill_length_mm": np.arange(40, 50, 1),
        "species": ["Adelie", "Chinstrap", "Gentoo"]
    },
    subplot_kwargs={"main": "bill_length_mm", "group": "species", "panel": "species"},
    fig_kwargs={"figsize": (10, 3), "sharey": True},
    legend=False
)

If I followed the same convention as plot_cap, this would require something like:

fig, ax = plot_comparison(
    model=penguin_model,
    idata=penguin_idata,
    contrast_predictor=["flipper_length_mm"],
    conditional={
        "main": {"bill_length_mm": np.arange(40, 50, 1)}, 
        "group": {"species": ["Adelie", "Chinstrap", "Gentoo"]}, 
        "panel": {"species": ["Adelie", "Chinstrap", "Gentoo"]}
    }
)

This raises the question of whether we should add the same subplot_kwargs to plot_cap to stay consistent.
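To illustrate the extra "unnesting" step that the nested convention would need, here is a minimal sketch. The helper `unnest_conditional` is hypothetical and not part of Bambi; it simply splits a nested dict into the flat conditional mapping and the role-to-covariate mapping that subplot_kwargs expresses directly.

```python
def unnest_conditional(nested):
    """Hypothetical helper: split a nested {role: {covariate: values}} dict."""
    conditional = {}
    subplot_kwargs = {}
    for role, covariate in nested.items():  # role is "main", "group", or "panel"
        for name, values in covariate.items():
            conditional[name] = values
            subplot_kwargs[role] = name
    return conditional, subplot_kwargs


nested = {
    "main": {"bill_length_mm": list(range(40, 50))},
    "group": {"species": ["Adelie", "Chinstrap", "Gentoo"]},
    "panel": {"species": ["Adelie", "Chinstrap", "Gentoo"]},
}
conditional, subplot_kwargs = unnest_conditional(nested)
print(subplot_kwargs)
```

With a separate subplot_kwargs argument, the user writes the two flat dictionaries directly and no such helper is needed.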

GStechschulte · 2023-06-16T15:40:26Z

Commit 41d4565 adds the ability to compute multi-level contrast comparisons. This is achieved by first computing all pairwise combinations of the contrast values and then indexing the xr.DataArray with each pair.

For example, the contrast prog has three levels (General, Vocational, Academic). Each pairwise contrast is shown in the output below.

# To compare more than two levels, use the comparisons function directly
comparisons(
    model=model_interaction,
    idata=idata_interaction,
    contrast="prog",
    conditional="math"
)

math	term	contrast	estimate	hdi_3%	hdi_97%
1.000000	prog	['Academic', 'General']	5.705757	-2.780949	15.583213
1.000000	prog	['Academic', 'Vocational']	5.678390	-2.237369	15.903185
1.000000	prog	['General', 'Vocational']	5.651256	-2.166023	15.727247
...	...	...	...	...	...
99.000000	prog	['Academic', 'General']	-4.931466	-10.382592	-0.191101
99.000000	prog	['Academic', 'Vocational']	-4.909654	-10.369010	-0.098217
99.000000	prog	['General', 'Vocational']	-4.887939	-10.395197	-0.081602
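The pairing step can be sketched with the standard library. This is an illustration only: it uses plain lists instead of an xr.DataArray, the per-level predictions are made up, and the direction of the difference (second level minus first) is an assumption rather than something stated in this PR.

```python
from itertools import combinations

levels = ["General", "Vocational", "Academic"]

# All unordered pairs of levels, which gives three rows per value of math.
pairs = list(combinations(sorted(levels), 2))

# Hypothetical predictions per level for three posterior draws.
predictions = {
    "Academic": [52.0, 53.5, 51.0],
    "General": [47.0, 46.5, 48.0],
    "Vocational": [46.0, 47.5, 45.0],
}


def pairwise_differences(predictions, pairs):
    """Per-draw difference for each pair (assumed: second minus first)."""
    return {
        (a, b): [y - x for x, y in zip(predictions[a], predictions[b])]
        for a, b in pairs
    }


differences = pairwise_differences(predictions, pairs)
for pair, values in differences.items():
    print(pair, values)
```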

To do before moving to a normal PR:

shape handling for comparisons where a user passes > 1 level
allow plotting of subplots (panels)
docstrings and type hints
optional return of contrast_df (a dataframe containing descriptive statistics of the contrast comparison); users should call comparisons if they want a dataframe returned
refactor plot_cap code so it works
plot comparisons of other model parameters? (to be added in a later PR)
add and run tests, and black

GStechschulte · 2023-06-24T08:30:59Z

In addition to the requested changes by @tomicapretto, the latest commits in this PR added the following functionality:

average_by argument in comparisons and plot_comparisons
subplot_kwargs in plot_cap to follow the same design as plot_comparisons
organised test_plots.py into three classes: (1) TestCommon tests arguments common to plot_cap and plot_comparisons, mostly Matplotlib figure arguments; (2) TestCap tests arguments specific to plot_cap; and (3) TestComparisons tests arguments specific to plot_comparisons

Here is a brief example of average_by:

import bambi as bmb
import pandas as pd

# Load the fishing data and keep only the columns used by the model
fish_data = pd.read_stata("http://www.stata-press.com/data/r11/fish.dta")
cols = ["count", "livebait", "camper", "persons", "child"]
fish_data = fish_data[cols].copy()  # copy so the assignments below modify an independent frame
fish_data["livebait"] = fish_data["livebait"].astype("category")
fish_data["camper"] = fish_data["camper"].astype("category")

# Custom zero-inflated Poisson family with a Beta prior on psi
likelihood = bmb.Likelihood("ZeroInflatedPoisson", params=["mu", "psi"], parent="mu")
links = {"mu": "log", "psi": "logit"}
zip_family = bmb.Family("zip", likelihood, links)
priors = {"psi": bmb.Prior("Beta", alpha=3, beta=3)}

fish_model = bmb.Model(
    "count ~ livebait + camper + persons + child",
    fish_data,
    priors=priors,
    family=zip_family
)

fish_idata = fish_model.fit(draws=1000, target_accept=0.95, random_seed=1234, chains=4)

comparisons(
    model=fish_model,
    idata=fish_idata,
    contrast="camper",
    conditional=["livebait", "child", "persons"]
)

term	contrast	livebait	child	persons	estimate	hdi_0.03%	hdi_0.97%
camper	(0.0, 1.0)	0.0	0.0	1.0	0.185616	0.086099	0.291901
camper	(0.0, 1.0)	0.0	0.0	2.0	0.443242	0.212235	0.679702
camper	(0.0, 1.0)	0.0	0.0	4.0	2.542203	1.309796	3.867047
...	...	...	...	...	...	...	...
camper	(0.0, 1.0)	1.0	3.0	1.0	0.016145	0.006866	0.025888
camper	(0.0, 1.0)	1.0	3.0	2.0	0.038481	0.016707	0.060103
camper	(0.0, 1.0)	1.0	3.0	4.0	0.219881	0.112066	0.343412

A user can pass one or more covariates to average by. For example:

# marginalizes over child and persons
comparisons(
    model=fish_model,
    idata=fish_idata,
    contrast="camper",
    conditional=["livebait", "child", "persons"],
    average_by="livebait"
)

term	contrast	livebait	estimate	hdi_0.03%	hdi_0.97%
camper	(0.0, 1.0)	0.0	0.445599	0.223418	0.683028
camper	(0.0, 1.0)	1.0	2.442100	1.830418	3.058506

Passing livebait to average_by groups the comparisons by the levels of livebait, [0, 1], and marginalises over the other covariates, child and persons. The result is the average estimate and uncertainty of the camper contrast at each level of livebait. This can also be plotted:

plot_comparison(
    model=fish_model,
    idata=fish_idata,
    contrast="camper",
    conditional=["livebait", "child", "persons"],
    average_by="livebait"
)
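The averaging behind average_by can be pictured as a group-by over the unit-level comparisons. The sketch below is plain Python with made-up estimates, and `average_by` here is a hypothetical stand-in, not Bambi's function. It averages point estimates only; how the uncertainty intervals are propagated is not shown.

```python
from collections import defaultdict

# Hypothetical unit-level comparison estimates, one per covariate combination.
rows = [
    {"livebait": 0, "child": 0, "persons": 1, "estimate": 0.2},
    {"livebait": 0, "child": 0, "persons": 2, "estimate": 0.4},
    {"livebait": 1, "child": 0, "persons": 1, "estimate": 1.0},
    {"livebait": 1, "child": 0, "persons": 2, "estimate": 3.0},
]


def average_by(rows, by):
    """Average the estimates within each level of the `by` covariate."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row["estimate"])
    return {level: sum(values) / len(values) for level, values in groups.items()}


# Marginalises over child and persons, leaving one estimate per livebait level.
averaged = average_by(rows, "livebait")
print(averaged)
```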

I will be adding a notebook for the docs explaining the functionality and how to use plot_comparisons in the coming week.

tomicapretto

The PR is 95% done. It's super clean and you wrote very high-quality code. Thanks a lot for that!

Most of the changes I'm requesting are "cosmetic" or docstring related changes. The only "big" thing is the updates needed to make sure aliases on non-parent parameters still work. Don't hesitate to ask if you want help here (either to build an example/test or to implement it)

GStechschulte · 2023-07-06T13:46:18Z

> The PR is 95% done. It's super clean and you wrote very high-quality code. Thanks a lot for that!
>
> Most of the changes I'm requesting are "cosmetic" or docstring related changes. The only "big" thing is the updates needed to make sure aliases on non-parent parameters still work. Don't hesitate to ask if you want help here (either to build an example/test or to implement it)

This comment slipped past me. Thanks a lot for this, and for the code reviews 👍🏼

tomicapretto

Looks like this is good to go!

@GStechschulte let me know if I can merge :)

GStechschulte · 2023-07-09T07:47:06Z

> Looks like this is good to go!
>
> @GStechschulte let me know if I can merge :)

Looks like we can merge 👍🏼 Again, thanks for all the code reviews and insights. Much appreciated!

GStechschulte mentioned this pull request Jun 13, 2023

Modularize bmb.plots sub-package #674

Closed

7 tasks

GStechschulte marked this pull request as ready for review June 19, 2023 18:36

tomicapretto requested changes Jun 22, 2023

tomicapretto reviewed Jun 22, 2023

tomicapretto reviewed Jun 22, 2023

GStechschulte force-pushed the plot-comparisons branch from cf68fd0 to a647b27 Compare June 24, 2023 06:39

GStechschulte requested a review from tomicapretto June 24, 2023 06:41

GStechschulte added 19 commits June 26, 2023 18:03

plot_cap draft outline for docs example

07533f8

intro. to GLMs and Negative Binomial model

4780d36

added logistic regression and other model params. demo

dcd3ceb

deleted docs/notebooks/plot_cap.ipynb#

basic linear model demo

8274c53

comparisons learning from marginaleffects

b844396

comparison contrasts using make_cap_data code

227d78d

CreateData class added to __init__.py

2769969

CreateData class for all plotting functions

7753a2b

functions for computing and plotting comparisons

e1952e3

plot_comparisons demo on categorical data

9c3c78d

logic of main, group, panel for building contrasts df

95d1ec1

add make_group_panel_values and enforce_dtypes functions

e4ec36d

plot_comparisons demo

0371be9

cleanup demo notebook

096958b

cleanup demo notebook

c2533fe

move util functions to utils.py and renaming of modules and functions

0115451

re-run demo.

deef5f3

use dataclass for returning covariates instead of dict

4360e79

remove unused variables in plot_comparison

d3334ce

improved OOP with dataclasses, error handling, and added unit-level contrasts 'average_by=True'

40ecc50

GStechschulte marked this pull request as draft June 28, 2023 20:26

tomicapretto and others added 3 commits June 29, 2023 18:44

Allow predictions on new groups (bambinos#693)

57f1a75

ran black

dde435c

move isinstance logic to dataclass, improved error handling, and remove unused contrast_dtype func

c303748

GStechschulte marked this pull request as ready for review June 29, 2023 16:46

resolve pylint message codes

503d5c1

tomicapretto requested changes Jun 29, 2023

GStechschulte added 4 commits July 2, 2023 14:09

remove imports that users should not have access to

879b05f

fix/add docstrings

a6fcdcc

fix/add docstrings and f-string attributes to ResponseInfo class

ba1918c

fix/add docstrings

ebd9d4d

GStechschulte mentioned this pull request Jul 3, 2023

Plot comparisons notebook for docs #695

Merged

tomicapretto and others added 5 commits July 5, 2023 19:06

Prepare 0.12.0 release (bambinos#694)

646ed7e

dev version

9861601

bug fix for building contrast_df when len(contrast values) > 3

ee4baf7

raise ValueError if user tries to plot with > 2 contrast values

b074df6

pylinter error solved, make contrast_df column ordering consistent

614b511

GStechschulte added 4 commits July 8, 2023 10:33

logic added for subsetting InferenceData when non-parent param. is aliased

0b1b7a3

raise ValueError if only average_by=True when plotting comparisons

9ccd588

docstring describing indexing of last 3 columns

a616e60

added test for plot_cap non-parent parameter when there is an alias

84f32b3

GStechschulte requested a review from tomicapretto July 8, 2023 08:54

tomicapretto approved these changes Jul 8, 2023

tomicapretto merged commit 0fa0b6f into bambinos:main Jul 9, 2023

tomicapretto mentioned this pull request Jul 15, 2023

plot_cap default args. not working for categorical regression #673

Closed

3 tasks

GStechschulte mentioned this pull request Oct 10, 2023

interpret support for model predictions with response levels #732

Merged

5 tasks

GStechschulte deleted the plot-comparisons branch January 21, 2024 20:19
