Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve issue #992 : Integrate point_estimate with rcParam #994

Merged
merged 15 commits into from
Jan 17, 2020

Conversation

percygautam
Copy link
Contributor

@percygautam percygautam commented Jan 13, 2020

Fixes #992 . The following changes are made:

  • Added "auto" as default-type for point_estimate param for densityplot.py and posteriorplot.py

  • Added the docstring for changes.

  • Passes the pytest.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I have just realized that the "mode" option is not yet available in plot_density (see its matplotlib file) it is only available in plot_posterior (see its matplotlib file).

If you feel up for the challenge I can guide you to add the "mode" option to plot_density, otherwise we can merge this and file a new issue.

Plot point estimate per variable. Values should be 'mean', 'median' or None.
Defaults to 'mean'.
Plot point estimate per variable. Values should be 'mean', 'median', 'mode' or None.
Defaults to 'auto'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would explain that auto falls back to the default set in rcParams, otherwise it may be confusing to new users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense. I'll add it right away.

@@ -59,7 +60,7 @@ def plot_posterior(
round_to : int, optional
Controls formatting of floats. Defaults to 2 or the integer part, whichever is bigger.
point_estimate: str
Must be in ('mode', 'mean', 'median', None)
Must be in ('mode', 'mean', 'median', None). Defaults to 'auto'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@@ -657,7 +657,7 @@ def test_plot_rank(models, kwargs):
{"rope": {"mu": [{"rope": (-2, 2)}], "theta": [{"school": "Choate", "rope": (2, 4)}]}},
{"point_estimate": "mode"},
{"point_estimate": "median"},
{"point_estimate": False},
{"point_estimate": "mean"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to keep this case testing the no point estimate behaviour, which is achieved with None. "mean" is the default in rcParams, so it is actually tested many times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes None makes more sense as rest three are tested many times. Previously, the tests are failing in pytest. I was getting the following ValueError:
ValueError: Point estimate should be 'mean', 'median', 'mode' or None, not False
Can you explain this behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make it None.

@@ -883,7 +883,7 @@ def test_plot_ppc_ax(models, kind):
{"rope": {"mu": [{"rope": (-2, 2)}], "theta": [{"school": "Choate", "rope": (2, 4)}]}},
{"point_estimate": "mode"},
{"point_estimate": "median"},
{"point_estimate": False},
{"point_estimate": "mean"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment, None would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@percygautam
Copy link
Contributor Author

percygautam commented Jan 13, 2020

@OriolAbril I read the matplotlib files of plot_density and plot_posterior. I would like to add mode option to plot_density in this PR itself. We already got so many issues, I'd hate it to make another one if it can be solved here. Please guide me through it!

@OriolAbril
Copy link
Member

Great! The currently currently has some duplication when calculating the point estimates. The skeleton of the code is basically some ifs clauses and inside the calculation of the point estimate using a vec/values variable (that is a 1d array).

It would be great to create a function in plot_utils similar to this:

def calculate_point_estimate(point_estimate, values, bw):
    # point_estimate validation and raise error here
    # if mean
        calculations
    # if median
        ...
    return point_value

The code inside the function would be more or less the same as the current code in posteriorplot. Then, when drawing the point estimate in plot_posterior and plot_density (both bokeh and matplotlib, so 4 files) the code would be simplified to calling this function (instead of the ifs) and then plotting (which I think won't need to be modified).

Also, both the check for good values plus error raising and getting the default from rcParams can also be moved to this new internal function.

@percygautam
Copy link
Contributor Author

@OriolAbril I have done the changes you asked. Some tests were failing. Kindly review the work. I'll do black formatting as soon as tests passes

@ahartikainen
Copy link
Contributor

Can you move _fast_kde functionality to plot_utils. There is an error in import order.

@percygautam
Copy link
Contributor Author

@ahartikainen Okay I'll do that but a fourier transform method in plot utilities doesn't make sense. Isn't there any workaround to the problem.

@ahartikainen
Copy link
Contributor

I think it can go there until we update our KDE code and move it to another location (outside plotting).

@percygautam
Copy link
Contributor Author

@ahartikainen @OriolAbril , I have done the changes asked.

@percygautam
Copy link
Contributor Author

@OriolAbril @ahartikainen The _fast_kde function is shoeing to be protected and can't be used outside directory plots.

@@ -12,9 +12,8 @@
from scipy.optimize import minimize
import xarray as xr

from ..plots.plot_utils import *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the import explicit from ..plots.plot_utils import _fast_kde, get_bins

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OriolAbril I already tried that, it is showing import error and the tests are not running

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* will surely not import _fast_kde because * only imports public functions. Using explicit import should work though, just like it worked with from ..plots.kdeplot import _fast_kde which is also on another folder). Could there have been a typo? I currently cannot test it nor work on this locally, this is as much as I can do for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OriolAbril I'll explicit import the function, so you can review!

nitishp25 and others added 3 commits January 14, 2020 21:13
* correct bfmi denominator

* update test_diagnostics.py

* fixed denominator and docs
@percygautam
Copy link
Contributor Author

@OriolAbril @ahartikainen There was some the dependency issue which was causing the import error. All set now!

@percygautam
Copy link
Contributor Author

@OriolAbril @ahartikainen There were linting issues. Should be okay now.

@percygautam
Copy link
Contributor Author

@OriolAbril I am getting error with pylint No name 'gaussian' in module 'scipy.signal'.
I guess this is problem with pylint with gausian not registered.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tinkered a little with the code.

Regarding the gaussian issue, it clearly should not happen, as scipy.signal.gaussian is still valid, there have been some movements towards its deprecation though, so now looks like a good moment to make the move and follow the warning in the docstring:

.. warning:: scipy.signal.gaussian is deprecated,
                 use scipy.signal.windows.gaussian instead.

Regarding the issue with _fast_kde import, I have commented a possible solution. It seemed to work locally.

@@ -185,11 +189,8 @@ def _d_helper(
ax.diamond(xmin, 0, line_color="black", fill_color=color, size=markersize)
ax.diamond(xmax, 0, line_color="black", fill_color=color, size=markersize)

est = calculate_point_estimate(point_estimate, vec, bw)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move that inside the if, there is no need to calculate the point value if it will not be plotted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. Will do.

@@ -187,21 +187,9 @@ def display_rope(max_data):
ax.text(x=vals, y=[max_data * 0.2, max_data * 0.2], text=list(map(str, vals)), **text_props)

def display_point_estimate(max_data):
point_value = calculate_point_estimate(point_estimate, values, bw)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment, here it means after the if instead of inside though

@@ -155,11 +160,8 @@ def _d_helper(
ax.plot(xmin, 0, hpd_markers, color=color, markeredgecolor="k", markersize=markersize)
ax.plot(xmax, 0, hpd_markers, color=color, markeredgecolor="k", markersize=markersize)

est = calculate_point_estimate(point_estimate, vec, bw)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inside the if

@@ -179,21 +179,9 @@ def display_rope():
ax.text(vals[1], plot_height * 0.2, vals[1], weight="semibold", **text_props)

def display_point_estimate():
point_value = calculate_point_estimate(point_estimate, values, bw)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the if

from ..data import convert_to_inference_data, convert_to_dataset, InferenceData, CoordSpec, DimSpec
from ..plots.kdeplot import _fast_kde
from ..plots.plot_utils import get_bins
from .diagnostics import _multichain_statistics, _mc_error, ess, _circular_standard_deviation
from .stats_utils import (
make_ufunc as _make_ufunc,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below on line 27, histogram is imported from stats_utils in a weird way, it can be moved to this import.

I think this triggers some kind of circular import and is the root of the import problems

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OriolAbril I noticed this problem earlier too and has removed the import problem with b2f600d commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the code should use the histogram function, not rewrite it here from scratch.

@percygautam
Copy link
Contributor Author

@OriolAbril Where do we need to add warning in the docstring? I have added it in the function _fast_kde as it is using gaussian function. The other changes you asked are done.

@OriolAbril
Copy link
Member

The warning is in the docstring of scipy.signal.gaussian, I was thinking we could follow its advise and start using scipy.signal.windows.gaussian instead.

Also, now tests seem to pass and ArviZ can be imported, but we should move the from ..stats.stats_utils import histogram to the imports right above:

from .stats_utils import (
    make_ufunc as _make_ufunc,
    wrap_xarray_ufunc as _wrap_xarray_ufunc,
    logsumexp as _logsumexp,
    ELPDData,
    stats_variance_2d as svar,
    histogram  # here is its proper place
)

@percygautam
Copy link
Contributor Author

@OriolAbril Thanks for the clarification. I didn't understand the problem earlier. I did the changes asked.

@@ -9,7 +9,7 @@
from .forestplot import plot_forest
from .hpdplot import plot_hpd
from .jointplot import plot_joint
from .kdeplot import plot_kde, _fast_kde, _fast_kde_2d
from .kdeplot import plot_kde, _fast_kde_2d
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also move _fast_kde_2d next to _fast_kde, sorry for not saying it explicitly before

@percygautam
Copy link
Contributor Author

@ahartikainen Should I do these changes in this PR itself or create another one after this is merged. Transferring _fast_kde_2d to plot_utils.py will again cause dependency issues as they did earlier with _fast_kde.

@OriolAbril
Copy link
Member

After modifying the import of histogram from stats_utils should be safe to move fast_kde_2d, we can do it here.

@percygautam
Copy link
Contributor Author

@OriolAbril I am moving all derived private functions _cov_1d, _cov and _dot in _fast_kde_2d to plot_utils.py, as it would make sense. Is it okay?

@OriolAbril
Copy link
Member

LGTM

@percygautam percygautam changed the title [WIP] Solve issue #992 : Integrate point_estimate with rcParam Solve issue #992 : Integrate point_estimate with rcParam Jan 17, 2020
Copy link
Contributor

@ahartikainen ahartikainen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ahartikainen ahartikainen merged commit 4cfe800 into arviz-devs:master Jan 17, 2020
@percygautam percygautam deleted the point_estimate branch January 20, 2020 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

integrate point_estimate rcParam
4 participants