Plotting conditional posteriors #41

Open
ozika opened this issue Jan 19, 2023 · 8 comments

@ozika

ozika commented Jan 19, 2023

Hi! Thanks for creating this great package :)

I think one important aspect of understanding models is the ability to explore conditional posteriors. In the tutorial you mention the kind="ice" option; however, it is unclear how this can be used to systematically understand the model posterior. In ArviZ, for example, I'd use plot_posterior() with the filter_vars argument to explore interactions. Is there a similar way in plot_dependence()? Or can one easily use ArviZ with the estimated InferenceData object?

I think this would be a very handy addition to the documentation. Once I understand it I'd be happy to write an example.

@aloctavodia
Member

Hi, thanks for getting in contact and offering help.

This is a good reference for ICE (and other methods): https://christophm.github.io/interpretable-ml-book/ice.html

plot_dependence has a var_idx argument that you can use to exclude variables by index. We may extend it to work with variable names, instead.

We can use ArviZ with the returned InferenceData in general (e.g. for sampling diagnostics), but for the PDP and ICE plots we need to compute new predictions, and for that we need the fitted trees, which are stored in the BART variable and not in the InferenceData.
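
For reference, a minimal sketch of the kind of workflow I mean, loosely following the bikes tutorial but with synthetic stand-in data (exact argument names can differ between pymc-bart versions, so treat it as illustrative rather than canonical):

```python
import numpy as np
import pandas as pd
import pymc as pm
import pymc_bart as pmb

# Synthetic stand-in for the bike-rental predictors used in the tutorial
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "hour": rng.integers(0, 24, n),
        "temperature": rng.normal(20, 5, n),
        "humidity": rng.uniform(0, 1, n),
        "workingday": rng.integers(0, 2, n),
    }
)
Y = rng.poisson(50, n).astype(float)

with pm.Model() as model:
    mu = pmb.BART("mu", X, Y, m=50)   # the fitted trees live in this variable
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=mu, sigma=sigma, observed=Y)
    idata = pm.sample()

# ArviZ works on idata as usual (trace plots, summaries, diagnostics), but the
# dependence plots need the trees, so we pass the BART variable itself.
# var_idx takes column indices; see the docstring for its exact semantics.
pmb.plot_dependence(mu, X=X, Y=Y, kind="ice", var_idx=[1, 2])
```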

@ozika
Author

ozika commented Jan 20, 2023

Thanks for your response!

I am not sure the var_idx filtering is what I mean. For example, one might want to plot temperature impact estimates separately for working days and weekends (to explore the bike_rentals ~ temperature*workingday interaction). Same with the example in the Interpretable ML book: it just shows posterior samples for hypothetical individuals, but we don't know the properties of those individuals, which, in my mind, is what one often wants to know.

I will have a go at using ArviZ and InferenceData and post here.

@aloctavodia
Member

In ArviZ, if you use var_names or filter_vars you are just selecting subsets of variables, but the results still depend on all of them. So for a model like y ~ a+b+c, if you select a you are just omitting b and c from the display.
For PDP we plot y vs a by averaging out the effects of all the remaining variables, in this case b and c. ICE is similar, but we keep the individual observations.
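
Roughly, as a generic NumPy sketch (the predict callable below is a hypothetical stand-in for whatever maps covariates to posterior-mean predictions; it is not a specific pymc-bart function):

```python
import numpy as np

def pdp_and_ice(predict, X, var, grid):
    """Generic PDP/ICE computation for the variable at column `var`.

    predict: hypothetical callable mapping an (n, p) array to (n,) predictions.
    X:       (n, p) array of observed covariates (a, b, c, ...).
    grid:    values of the variable of interest to sweep over.
    """
    ice = np.empty((len(grid), X.shape[0]))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, var] = value       # fix `a` at this grid value...
        ice[i] = predict(X_mod)     # ...keeping b and c at their observed values
    pdp = ice.mean(axis=1)          # PDP = average over the ICE curves
    return pdp, ice
```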

I think what you want is to fit y ~ a+b+c, but then approximate y* ~ a+b, i.e. as if we had never used c as part of the model. One possible approximation is to prune the trees, removing the branches that include the variable c. We currently do this to estimate variable importance, and maybe we can extend plot_dependence to exclude variables in the same way. But this would need some empirical testing on real examples, to check that we get reasonable results.

@ozika
Author

ozika commented Jan 23, 2023

I think variable selection would be useful to determine the predictive power of a variable as a whole, but what I am after is examining how predictors influence outcomes. In this example it would be y* ~ a+b+(c==0) vs y* ~ a+b+(c==1).
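
For concreteness, an untested sketch of what I have in mind, reusing the fitted BART variable mu and the predictors X and Y from the tutorial example (I may be misusing the plot_dependence arguments):

```python
import numpy as np
import pymc_bart as pmb

# mu, X (DataFrame) and Y come from the already fitted bike-rental model
temp_idx = list(X.columns).index("temperature")

for level in (0, 1):                       # weekend vs working day
    mask = (X["workingday"] == level).to_numpy()
    pmb.plot_dependence(
        mu,
        X=X[mask],                         # condition on workingday == level
        Y=np.asarray(Y)[mask],
        kind="ice",
        var_idx=[temp_idx],
    )
```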

I am realizing that the plot_posterior example was wrong; it only filters variables (it's been a while since I used PyMC). I think a better example would be this.

@aloctavodia
Member

These are a few approaches to estimating how predictors influence outcomes using PyMC-BART:

  • Fitting only one model:

    • PDP/ICE. PDP gives reasonable results when there is little interaction between variables. Why do you consider this not enough for your problem?
    • PDP/ICE, but pruning the trees to approximate excluding variables. This is something that could work in practice but it needs empirical validation. I would not use it for a real problem at this point.
  • Fitting more than one model

    • Estimate variable importance. Select the model with the fewest variables whose R2 is closest to that of the reference model. Use PDP/ICE on the reference model and on the smaller model (or the smallest models, if you want to try more than one). This is similar to the previous point but more conservative, as we only use the pruning to discard variables with little contribution, and then explore the rest of the variables in more detail by refitting the model; see the sketch right after this list.
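
A rough sketch of that workflow (the exact plot_variable_importance call may differ between pymc-bart versions, and the column selection below is purely hypothetical):

```python
import pymc_bart as pmb

# Rank the covariates of the reference (full) model; the plot shows how the
# R2 of pruned models approaches the R2 of the full model.
pmb.plot_variable_importance(idata, mu, X)

# Then refit using only the selected columns and compare PDP/ICE plots of the
# reference model and the smaller model.
# X_small = X[["temperature", "hour"]]   # hypothetical selection
# ... refit a BART model on X_small and call pmb.plot_dependence on it ...
```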

Another potential approach could be to do something like https://github.com/yannmclatchie/kulprit, but we don't have a theory for BART for that (variable importance is only loosely inspired by it).

@ozika
Author

ozika commented Jan 23, 2023

Thank you for your response (and patience!).

PDP/ICE. PDP gives reasonable results when there is little interaction between variables. Why do you consider this not enough for your problem?

Correct me if I am not getting it right: I think it's because (at least in my field) one often wants to understand the interactions, not select the variables. Following the example in the tutorial, I plotted it using the ICE method.

[Screenshot, 2023-01-23: ICE plot from the tutorial example]

Focusing on humidity, one can see that there is some variability (tree paths); however, it's not clear whether this is caused by the influence of hour, temperature, or workingday. Or is this a step that I am getting wrong?


Thank you for pointing out the kulprit package! I have actually been looking for exactly that in Python for a while :)

@aloctavodia
Member

Focusing on humidity, we can see that the pattern is essentially the same for all instances; it's just shifted up or down from the mean. This shows that there are no interactions (or at least we cannot detect them). In other words, no matter at which values we fix the rest of the variables, the effect of humidity on bike rentals seems to be the same: flat (or a slightly negative slope) at the beginning, followed by a slightly steeper (still negative) slope for humidity higher than ~0.6.

Understanding interactions is very relevant for us too. We have some ideas for making it more straightforward for users to do that, but unfortunately we are still at an early development stage and still need to test those ideas. Let me know if you are interested in trying them on your own datasets, and I will contact you when we have something ready.

@ozika
Author

ozika commented Jan 24, 2023

Thanks!

Sure, I'd definitely be happy to try things on some of my datasets :)
