Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed Posterior plot errors with boolean array. #1707

Merged
merged 11 commits into from
Mar 23, 2022

Conversation

utkarsh-maheshwari
Copy link
Contributor

@utkarsh-maheshwari utkarsh-maheshwari commented May 27, 2021

Description

Plotted bar instead of hist for a boolean value.
Also, should HDI also be visible? I have assumed no.

Fixes #1694

Checklist

  • Follows official PR format
  • Includes a sample plot to visually illustrate the changes (only for plot-related functions)
  • New features are properly documented (with an example if appropriate)?
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use if dtype kind == "f" -> elif dtype kind == "i" else (dtype == "b") instead of converting to int and storing is_bool. We may also want to update a bit more the apperance of the plot? @sethaxen? i.e. showing the y axis in the boolean case, or hiding the mean and annotating both bars with count (xx%) on top of them, things like that

@utkarsh-maheshwari
Copy link
Contributor Author

Yes, it looks incomplete without the y axis. Don't we need it in the case of integers?

Achieved this.
image

through this:

import numpy as np
import arviz as az

data = np.random.choice(a=[False, True], size=(4, 100))
az.plot_posterior(data)

@utkarsh-maheshwari
Copy link
Contributor Author

@OriolAbril Can the dtype of values be other than f, i or b?

@sethaxen
Copy link
Member

@OriolAbril Can the dtype of values be other than f, i or b?

That's a good question. In principle a PPL could produces samples of strings or objects, though all I am aware of currently would require first assigning integers to the strings/objects and then produce integer samplers. Do we anywhere in arviz assert that the dtype is amount these types?

We may also want to update a bit more the apperance of the plot? @sethaxen? i.e. showing the y axis in the boolean case, or hiding the mean and annotating both bars with count (xx%) on top of them, things like that

Yes, I think it makes sense to label the bars with count and percent instead mean, and if this is done, an axis is not necessary.
I like the gap between the bins.

@ahartikainen
Copy link
Contributor

True/False is an edge case from category dtype with 2 categories.

If we have a flag for category, then we could do this kind of (bar) plot for multiple categories too.

@utkarsh-maheshwari
Copy link
Contributor Author

If we have a flag for category, then we could do this kind of (bar) plot for multiple categories too.

So, does that mean
if dtype == " f " -> kde
elif dtype == " i " -> hist
else -> bar

@utkarsh-maheshwari utkarsh-maheshwari changed the title Fixed Posterior plot errors with boolean array. Addresses issue #1694 Fixed Posterior plot errors with boolean array. May 29, 2021
@OriolAbril
Copy link
Member

So, does that mean
if dtype == " f " -> kde
elif dtype == " i " -> hist
else -> bar

Sorry about the delay, I think this is a good starting point, we can then improve/refactor the else case to handle the category case too, but before that we have to decide on an api, have this info in inferencedata or as an argument or both which will probably take some time.

@codecov
Copy link

codecov bot commented Jun 2, 2021

Codecov Report

Merging #1707 (f467530) into main (479f2a7) will decrease coverage by 11.25%.
The diff coverage is 100.00%.

❗ Current head f467530 differs from pull request most recent head b799812. Consider uploading reports for the commit b799812 to get more accurate results

@@             Coverage Diff             @@
##             main    #1707       +/-   ##
===========================================
- Coverage   90.81%   79.55%   -11.26%     
===========================================
  Files         114      114               
  Lines       12332    12348       +16     
===========================================
- Hits        11199     9824     -1375     
- Misses       1133     2524     +1391     
Impacted Files Coverage Δ
arviz/plots/backends/bokeh/posteriorplot.py 98.34% <100.00%> (+0.14%) ⬆️
arviz/plots/backends/matplotlib/posteriorplot.py 98.36% <100.00%> (+0.08%) ⬆️
arviz/data/io_emcee.py 12.39% <0.00%> (-85.96%) ⬇️
arviz/data/io_pystan.py 14.12% <0.00%> (-81.98%) ⬇️
arviz/data/io_cmdstanpy.py 15.62% <0.00%> (-78.13%) ⬇️
arviz/data/io_pymc3_3x.py 18.18% <0.00%> (-72.06%) ⬇️
arviz/data/io_pyro.py 27.81% <0.00%> (-68.43%) ⬇️
arviz/data/io_numpyro.py 26.71% <0.00%> (-67.18%) ⬇️
arviz/data/io_cmdstan.py 48.48% <0.00%> (-41.13%) ⬇️
arviz/data/io_pymc3.py 66.66% <0.00%> (-6.67%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 479f2a7...b799812. Read the comment docs.

@utkarsh-maheshwari
Copy link
Contributor Author

Added lines #L332 - L336 were not covered by tests
How to resolve this error?

@ahartikainen
Copy link
Contributor

We need to add test that has boolean data. (check current tests and see if you can add boolean data as input for the test for this plot)

@utkarsh-maheshwari
Copy link
Contributor Author

I think I need to create a function test_plot_posterior_boolean below

def test_plot_posterior(models, kwargs):
.
to test with a boolean data. Right?

@ahartikainen
Copy link
Contributor

yes

@utkarsh-maheshwari
Copy link
Contributor Author

I think now there are some errors not related to my code changes.

@@ -319,18 +319,23 @@ def format_axes():
rug=False,
show=False,
)
else:
elif values.dtype.kind == "i":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the initial condition is if kind=kde and dtype=f so this condition is too restrictive, I think it should be if dtype=i or (dtype=f and kind=hist)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea right. I should have done it that way. Thanks for pointing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test exists because we all do this which is also why writing tests is important everywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
One more observation. Now, that there is an added kind of plot bar for boolean values. So maybe we should set
kind = 'bar' somewhere and should accept kind = 'bar'.

@utkarsh-maheshwari
Copy link
Contributor Author

utkarsh-maheshwari commented Jun 12, 2021

I have no idea why tests are failing. It works fine when I write the same test in notebook.
Is there a better way to debug tests?

@OriolAbril
Copy link
Member

Is there a better way to debug tests?

Were tests passing locally? Whenever I get stuck on one or a couple tests I run them locally, and only those so I can iterate fast. pytest has a lot of features to select which subset of tests to run: https://docs.pytest.org/en/6.2.x/reference.html#command-line-flags. To begin with you have to call pytest dir_or_file where dir_or_file can be a path to a folder, a path to a file or a glob pattern, you can also use special syntax after that with colons to select which class/test function to run but I find that very complicated and always use the -k flag to filter tests that fulfill a condition, in your case it could be something like -k "posterior and boolean" to run only the specific test you added

def test_plot_posterior_boolean():
data = np.random.choice(a=[False, True], size=(4, 100))
axes = plot_posterior(data)
plt.draw()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
plt.draw()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not needed for tests (or is it?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answer to this question may look weird at first but
On notebook, assert axes.get_xticklabels() runs fine when run in next cell but returns empty list when run on the same cell containing plt.plot().
Found a solution here

(My earlier comment got duplicated. When I deleted one, both got deleted 🤦🏼 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then this can be removed (we don't run tests in notebooks)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests were failing when I removed this because then get_xticklabels() returns empty list.

Interactive users when want to run axes.get_xticklabels() can do this by running it in different cell

But non interactive users have to call plt.draw() before using axes.get_xticklabels()

More clear explanation :

in interactive mode in ipython, a draw() is automatically triggered after each function that has a plotting action. In non-interactive environments, the draw action is executed only at the end--this saves a lot of run time.

But maybe, we can call plt.draw() inside our posterior plot function and remove it from here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, tests are really run from command line

python -m pytest path/to/tests

So keeping them clean is ok.

@utkarsh-maheshwari
Copy link
Contributor Author

@OriolAbril This is much better approach. Thanks!

@utkarsh-maheshwari
Copy link
Contributor Author

Unable to understand why a codecov shows a lot of code becomes uncovered through these changes. It shows whole matplotlib posterior plot function is becomes uncovered!

@OriolAbril
Copy link
Member

Unable to understand why a codecov shows a lot of code becomes uncovered through these changes. It shows whole matplotlib posterior plot function is becomes uncovered!

If the azure job fails, nothing is uploaded to codecov, thus if all jobs fail, the coverage is 0

data = np.random.choice(a=[False, True], size=(4, 100))
axes = plot_posterior(data)
assert axes
assert axes.get_xticklabels()[0].get_text() != 0
Copy link
Contributor Author

@utkarsh-maheshwari utkarsh-maheshwari Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not the best way to see if xticklabels are correct. It should be like assert axes.get_xticklabels()[0].get_text() == 'False'.
When I tried it locally, tests work well only if I provide show=True to the plot function call but shows an assertion error when show=False (default).
AssertionError: assert ' ' == 'False'
Azure jobs fails with same error even if show=True is pushed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it could be solved using some non-interactive/directly writing to file backend? I don't reallly know what is happening, but it would not be surprising that Azure had some extra checks preventing guis, pop ups or other interactive objects

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already tried using plt.draw() in the matplotlib backend file, didn't work! I guess we need to call plt.draw() in the test function before assert or maybe leave it like that if its an "okayish" way to test if not good.

@ahartikainen ahartikainen merged commit aeb25cb into arviz-devs:main Mar 23, 2022
@ahartikainen
Copy link
Contributor

Thanks for the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Posterior plot errors with boolean array
5 participants