Added rope to forestplot.py in forestplot() #448

Merged: 20 commits, Jan 15, 2019


GWeindel
Contributor

The new argument rope adds a shaded region of practical equivalence (ROPE) across all displayed variables in forestplot; the transparency of the shading is controlled by rope_alpha (default 0.5). Unlike in posteriorplot, rope can only be a tuple of length 2.
It's a small change, but it is very useful to me, so I assume it will be useful to others too.
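A minimal sketch of the single-ROPE behaviour described above, assuming the shading is drawn with matplotlib's Axes.axvspan (a later commit in this PR mentions "ymax argument in axvspan()"). The helper name rope_span_kwargs is hypothetical: it only validates the (low, high) tuple and builds keyword arguments, it does not draw anything.

```python
def rope_span_kwargs(rope, rope_alpha=0.5):
    """Build keyword arguments for a hypothetical axvspan call shading a ROPE.

    Assumes `rope` is a single (low, high) pair, as in the first version of
    this PR; the ordering check on the bounds is an added assumption.
    """
    # The first version only accepts one interval, i.e. a sequence of length 2
    if len(rope) != 2:
        raise ValueError("rope must be a tuple of length 2, got %r" % (rope,))
    low, high = rope
    if low > high:
        raise ValueError("rope lower bound must not exceed the upper bound")
    # xmin/xmax are data coordinates; alpha sets the transparency of the shade
    return {"xmin": low, "xmax": high, "alpha": rope_alpha}
```

With a real Axes object this could be used as ax.axvspan(**rope_span_kwargs((-.1, .1))), since axvspan takes xmin and xmax positionally or by name and forwards alpha to the underlying patch.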

@ahartikainen
Contributor

Sounds good. Can you post example?

@GWeindel
Contributor Author

@ahartikainen Sure. By the way, preparing it helped me spot an error in the ymax definition.
Below is a reproducible example, plus a link to a plot I use.

Reproducible example:

import pystan
import arviz as az

data = dict(N1=10, y1=2, N2=10, y2=6)

model = """//  Comparison of two groups with Binomial
data {
  int<lower=0> N1;
  int<lower=0> y1;
  int<lower=0> N2;
  int<lower=0> y2;
}
parameters {
  real<lower=0,upper=1> theta1;
  real<lower=0,upper=1> theta2;
}
model {
  theta1 ~ beta(1,1);
  theta2 ~ beta(1,1);
  y1 ~ binomial(N1,theta1);
  y2 ~ binomial(N2,theta2);
}
generated quantities {
  real oddsratio;
  oddsratio = (theta2/(1-theta2))/(theta1/(1-theta1));
}
"""

sm = pystan.StanModel(model_code=model)
fit = sm.sampling(data=data, iter=1000, chains=2)

az.plot_forest(fit, var_names=["theta1","theta2"], credible_interval=0.95, combined=True, rope=(-.1,.1),
               figsize=(7,7))

[image: rope_example]

@aloctavodia
Contributor

I like the idea, but it seems to me that the user should be allowed to set a rope per variable.

@canyon289
Member

canyon289 commented Dec 11, 2018 via email

@GWeindel
Contributor Author

Sure, I get it. I have no use for it myself, but others might, and consistency is definitely important. I will adapt the rope code of posteriorplot() to forestplot() when I have a bit of time, and perhaps generalize it to the ridgeplot option. Thanks for this highly useful package, by the way.

@aloctavodia
Contributor

Great! And thank you for this contribution.

GWeindel added a commit to GWeindel/arviz that referenced this pull request Dec 17, 2018
Initial commit arviz-devs#448 completely rewritten.
Rope can be a list of 2 or a dict like in posteriorplot() passed to the rope argument
Argument rope_values writes ROPE values in plot when multiple ropes are given.
@GWeindel
Contributor Author

I substantially rewrote my initial commit. rope now behaves the same way as in posteriorplot. I provide a reproducible example and a plot I could use.
[image: example_az]

Reproducible example:

import pystan
import arviz as az
import matplotlib.pyplot as plt

data = dict(N1=10, y1=2, N2=10, y2=6)

model = """//  Comparison of two groups with Binomial
data {
  int<lower=0> N1;
  int<lower=0> y1;
  int<lower=0> N2;
  int<lower=0> y2;
}
parameters {
  real<lower=0,upper=1> theta1;
  real<lower=0,upper=1> theta2;
}
model {
  theta1 ~ beta(1,1);
  theta2 ~ beta(1,1);
  y1 ~ binomial(N1,theta1);
  y2 ~ binomial(N2,theta2);
}
generated quantities {
  real oddsratio;
  oddsratio = (theta2/(1-theta2))/(theta1/(1-theta1));
}
"""

sm = pystan.StanModel(model_code=model)
fit = sm.sampling(data=data, iter=1000, chains=2)

# Per-variable ROPEs, using the same dict format as posteriorplot
rope = {'theta1': [{'rope': (-.1, .1)}],
        'theta2': [{'rope': (.2, .5)}],
        'oddsratio': [{'rope': (10, 20)}]}

# A single ROPE shared by all displayed variables
ropeList = (-.1, .1)

az.plot_forest(fit, var_names=["theta1", "theta2"], credible_interval=0.95, combined=True, rope=rope, rope_values=True)
plt.show()
az.plot_forest(fit, var_names=["theta1", "theta2"], credible_interval=0.95, combined=True, rope=ropeList, figsize=(7, 7))
plt.show()
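Both forms of rope used in the example above (a single (low, high) pair applied to every variable, or a dict mapping a variable name to a list of {'rope': (low, high)} entries) can be reduced to one per-variable lookup. The sketch below is hypothetical (normalize_rope is not an ArviZ function) and only illustrates the data shapes.

```python
def normalize_rope(rope, var_names):
    """Map each variable name to a list of (low, high) intervals.

    Hypothetical helper: accepts None, a single (low, high) pair applied to
    every variable, or a dict shaped like {'theta1': [{'rope': (-.1, .1)}]}.
    """
    if rope is None:
        return {name: [] for name in var_names}
    if isinstance(rope, dict):
        # Dict entries for variables that are not plotted are ignored, and
        # plotted variables without an entry get no ROPE
        return {
            name: [tuple(entry["rope"]) for entry in rope.get(name, [])]
            for name in var_names
        }
    if len(rope) != 2:
        raise ValueError("rope must be a (low, high) pair or a dict")
    return {name: [tuple(rope)] for name in var_names}
```

With the dict from the example and var_names=["theta1", "theta2"], the 'oddsratio' entry is simply dropped because that variable is not plotted.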

@GWeindel
Contributor Author

However, I'm not fluent with classes, so I may have introduced a lot of redundancy...

@GWeindel
Contributor Author

A faster, more complete reproducible example adapted from the gallery:

import numpy as np
import arviz as az
import matplotlib.pyplot as plt

# One chain of 50 draws per distribution
size = (1, 50)

data = {'normal': np.random.randn(*size),
        'gumbel': np.random.gumbel(size=size),
        'student t': np.random.standard_t(df=6, size=size),
        'exponential': np.random.exponential(size=size)}

rope = {'normal': [{"rope":(-.5,.5)}],
        'gumbel': [{"rope":(0,.5)}],
        'student t': [{"rope":(-.5,1)}],
        'exponential': [{"rope":(-.5,1)}]
       }

az.plot_forest(data, rope=(-1,1))
plt.show()
az.plot_forest(data, rope=rope)
plt.show()
az.plot_forest(data, rope=rope, rope_values=False)
plt.show()

@aloctavodia
Contributor

I like it. I think the ROPE should not be labeled here. I know it is labeled in the posterior plot, but in that plot the HPD is also labeled.

@GWeindel
Contributor Author

You're right, I was hesitating whether to keep it.

Member

@canyon289 canyon289 left a comment


When running `./scripts/lint.sh`, I'm getting the following error:

+ SRC_DIR=/home/tbayes/repos/arviz
+ echo 'Checking documentation...'
Checking documentation...
+ python -m pydocstyle --convention=numpy /home/tbayes/repos/arviz/arviz/
/home/tbayes/repos/arviz/arviz/plots/forestplot.py:246 in public method display_multiple_ropes:
    D102: Missing docstring in public method

@canyon289
Member

@GWeindel Thank you for the quick edits. I'll do another round of review either today or tomorrow.

@canyon289
Member

@GWeindel Thanks for the fixes. Unfortunately our CI checks are broken at this commit, so I'm running things manually as you make changes. Here are the errors I'm getting.

If you'd like, you can rebase your changes on top of the current arviz master to get the CI checks working again. If you need help with this, let me know.

Alternatively you can run

./scripts/lint.sh

and it will generate the output below

(arviz) tbayes@Users-MBP-2:~/repos/arviz$ ./scripts/lint.sh
++ pwd
+ SRC_DIR=/Users/tbayes/repos/arviz
+ echo 'Checking documentation...'
Checking documentation...
+ python -m pydocstyle --convention=numpy /Users/tbayes/repos/arviz/arviz/
/Users/tbayes/repos/arviz/arviz/plots/forestplot.py:246 in public method display_multiple_ropes:
    D210: No whitespaces allowed surrounding docstring text
/Users/tbayes/repos/arviz/arviz/plots/forestplot.py:246 in public method display_multiple_ropes:
    D400: First line should end with a period (not 'd')
/Users/tbayes/repos/arviz/arviz/plots/forestplot.py:246 in public method display_multiple_ropes:
    D401: First line should be in imperative mood ('Display', not 'Displays')
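A sketch of a numpy-convention docstring that would satisfy the pydocstyle checks reported above (D102 missing docstring, D210 whitespace around the text, D400 first line ending in a period, D401 imperative mood). The signature mirrors the call self.display_multiple_ropes(rope, ax, y, linewidth, ...) quoted later in this thread, written here as a plain function; the last parameter name var_name and the drawing logic are assumptions, not the code that was merged.

```python
def display_multiple_ropes(rope, ax, y, linewidth, var_name):
    """Display ROPE intervals for one variable at a given y position.

    Parameters
    ----------
    rope : dict
        Mapping of variable names to lists of ``{'rope': (low, high)}`` dicts.
    ax : matplotlib Axes-like object
        Object providing a ``plot`` method.
    y : float
        Vertical position at which the intervals are drawn.
    linewidth : float
        Base line width; ROPE segments are drawn twice as thick (assumption).
    var_name : str
        Variable whose ROPE intervals are drawn (hypothetical parameter name).
    """
    for entry in rope.get(var_name, []):
        low, high = entry["rope"]
        # Draw a thick, semi-transparent horizontal segment spanning the ROPE
        ax.plot((low, high), (y, y), linewidth=linewidth * 2, alpha=0.7)
```

The summary line starts with the imperative "Display", ends with a period, and has no leading or trailing whitespace inside the quotes, which is what D401, D400 and D210 check.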

@ahartikainen
Contributor

LGTM.

I fixed some annoying pylint errors.

@canyon289
Member

canyon289 commented Dec 31, 2018

Not LGTM yet.
We still need a test for the new multi-ROPE functionality. I can add one later today.

https://coveralls.io/builds/20848553/source?filename=arviz%2Fplots%2Fforestplot.py#L257
https://coveralls.io/builds/20848553/source?filename=arviz%2Fplots%2Fforestplot.py#L352

@canyon289
Member

Give me a couple more days, got caught up with other tasks

@canyon289 canyon289 changed the title Added rope to forestplot.py in forestplot() [WIP] Added rope to forestplot.py in forestplot() Jan 4, 2019
@canyon289
Member

canyon289 commented Jan 4, 2019

Trying to add tests, but I'm messing up the arguments somewhere. The following code raises an exception:

import arviz as az
import matplotlib.pyplot as plt

az.style.use('arviz-darkgrid')

centered_data = az.load_arviz_data('centered_eight')

rope = {'mu': [{'rope': (-.1, .1)}], 'tau': [{'rope': (.2, .5)}]}
_, axes = az.plot_forest(centered_data,
                         var_names=['mu', 'tau'],
                         rope=rope)
axes[0].set_title('Estimated theta for eight schools model')
plt.show()

will continue working on this

@canyon289
Member

canyon289 commented Jan 5, 2019

This implementation seems to break when each variable has multiple chains.

combined=True works:

import arviz as az
import matplotlib.pyplot as plt
az.style.use('arviz-darkgrid')
centered_data = az.load_arviz_data('centered_eight')
rope = {'mu': [{'rope': (-.1, .1)}], 'tau': [{'rope': (.2, .5)}]}
_, axes = az.plot_forest(centered_data,
                         var_names=['mu', 'tau'],
                         combined=True,
                         rope=rope)
axes[0].set_title('Estimated theta for eight schools model')
plt.show()

combined=False fails:

import arviz as az
import matplotlib.pyplot as plt
az.style.use('arviz-darkgrid')
centered_data = az.load_arviz_data('centered_eight')
rope = {'mu': [{'rope': (-.1, .1)}], 'tau': [{'rope': (.2, .5)}]}
_, axes = az.plot_forest(centered_data,
                         var_names=['mu', 'tau'],
                         combined=False,
                         rope=rope)
axes[0].set_title('Estimated theta for eight schools model')
plt.show()

I have a couple of suggested paths, but I'm happy to hear any others:

  1. Figure out how to make solution work with multiple chains
  2. Provide a warning that per var rope won't work if multiple chains

I won't have time to code solution 1 if we choose to go that route, but can do solution 2.
Depending on which version we go for, I can write the appropriate tests.
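Option 2 could look roughly like the sketch below. The function name, the warning message and the choice of UserWarning are assumptions; it only detects the problematic combination (a per-variable rope dict with combined=False) and warns instead of failing.

```python
import warnings


def check_rope_support(rope, combined):
    """Warn when a per-variable ROPE is requested for uncombined chains.

    Hypothetical helper. Returns True if the ROPE can be drawn, else False.
    """
    if isinstance(rope, dict) and not combined:
        warnings.warn(
            "Per-variable ROPEs are not supported when combined=False; "
            "the ROPE will not be drawn.",
            UserWarning,
        )
        return False
    # A single (low, high) tuple, or combined chains, is handled as before
    return True
```

A single tuple is still accepted with combined=False, since only the per-variable dict form was reported to fail.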

@GWeindel
Contributor Author

GWeindel commented Jan 8, 2019

I won't have the time (and perhaps sufficient skills) for solution 1. Hence I would suggest providing a warning.

@ahartikainen
Contributor

ahartikainen commented Jan 8, 2019

So what fails is

... label[ticks == y][0]

This would mean that we don't have a label for every y. Maybe test whether y is found?

Edit: change line 331

self.display_multiple_ropes(rope, ax, y, linewidth, label[ticks == y][0])

to

if (ticks == y).any():
    self.display_multiple_ropes(rope, ax, y, linewidth, label[ticks == y][0]) 

@canyon289
Copy link
Member

What happens is that ArviZ tries to match each label with its line. When the chains are combined, the label position matches the y position in the NumPy array, so exactly one value is found and we take it out of the array at index zero.

With multiple chains, the ticks and the y positions are off ever so slightly (.175, if I remember correctly). The equality comparison matches nothing, the array is empty, and hence the IndexError.

I hope this not-so-great description yields some insight. I can provide more detail if needed.
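The failure described above can be reproduced without ArviZ. This sketch emulates NumPy boolean masking (label[ticks == y]) with plain lists; the tick positions and the 0.175 per-chain offset are illustrative values, not taken from forestplot.py. label_at_guarded mirrors the if (ticks == y).any(): check proposed earlier in the thread.

```python
def label_at(labels, ticks, y):
    """Emulate label[ticks == y][0] with plain lists."""
    matches = [lab for lab, tick in zip(labels, ticks) if tick == y]
    return matches[0]  # IndexError when no tick equals y exactly


def label_at_guarded(labels, ticks, y):
    """Return the label whose tick equals y, or None if nothing matches."""
    matches = [lab for lab, tick in zip(labels, ticks) if tick == y]
    return matches[0] if matches else None


labels, ticks = ["mu", "tau"], [0.0, 1.0]

# combined=True: one line per variable, drawn exactly at its tick
combined_y = 1.0
# combined=False: one line per chain, offset around the tick (illustrative)
chain_ys = [1.0 - 0.175, 1.0 + 0.175]
```

label_at(labels, ticks, combined_y) finds "tau", while label_at(labels, ticks, chain_ys[0]) raises IndexError because no tick equals 0.825. The guarded version avoids the crash but simply skips the ROPE for that line.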

@ahartikainen
Contributor

Ok

We need a y_group variable holding the mean of the y locations per parameter (or per value to be plotted). VarHandler probably needs to calculate these beforehand, because all the yielding makes it impossible to do later?
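A sketch of the y_group idea: collect every y position drawn for a parameter (one per chain) and use their mean as the single position for the label and the ROPE. The input layout (a dict of parameter name to per-chain y positions) is an assumption for illustration, not VarHandler's actual data structure.

```python
from statistics import mean


def group_positions(y_by_var):
    """Return the mean y location per parameter (the proposed y_group)."""
    return {name: mean(ys) for name, ys in y_by_var.items()}


# Hypothetical per-chain positions, offset symmetrically around each tick
y_by_var = {"mu": [-0.175, 0.175], "tau": [0.825, 1.175]}
y_group = group_positions(y_by_var)
```

With symmetric offsets the group means land back on the ticks (0.0 and 1.0 here, up to floating-point rounding), so a label lookup can use y_group instead of the individual chain positions. A tolerance-based comparison is safer than exact equality for that lookup.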

ahartikainen pushed a commit to GWeindel/arviz that referenced this pull request Jan 14, 2019
@ahartikainen
Contributor

Combined True

[image: forest plot with combined=True]

Combined False

[image: forest plot with combined=False]

@canyon289
Member

Now we just need to write a couple more tests and we're good!

GWeindel and others added 19 commits January 14, 2019 21:51
taking len(values) instead of values[-1] for ymax argument in axvspan()
Fixed bug in ticks and title when specifying multiple ROPES
- Removed unused variables
- Moved the label/ticks collection to the case where a dict > 2 is provided
- Added doc string to display_multiple_ropes()
- Removed unused Variable xt_labelsize from ridgeplot() arguments
@canyon289 canyon289 dismissed their stale review January 15, 2019 05:53

Additional changes made afterwards

@canyon289 canyon289 changed the title [WIP] Added rope to forestplot.py in forestplot() Added rope to forestplot.py in forestplot() Jan 15, 2019
@canyon289 canyon289 merged commit 836d9b9 into arviz-devs:master Jan 15, 2019
@canyon289
Member

Thanks for the contribution @GWeindel

@GWeindel
Contributor Author

Great! Thanks for the help with the multiple chains and the tests!

@GWeindel GWeindel deleted the patch-1 branch January 15, 2019 07:14
Labels
None yet
Projects
None yet

4 participants