Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scipy.stats.gaussian_kde contour plots #50

Merged
merged 17 commits into from
Aug 13, 2019
Merged

Conversation

williamjameshandley
Copy link
Collaborator

@williamjameshandley williamjameshandley commented Aug 10, 2019

Description

  • This is a relatively large addition, offering an alternative kde to fastkde using scipy.stats.gaussian_kde
  • fastkde is therefore now an optional requirement, with CI updated accordingly).
  • It is a necessary step in order to implement boundary corrections Boundary corrections #35

The main innovation here is to extensively use matplotlib's triangulation functionality to first dynamically bin, with bins defined by a random subsample. This means that you put bins where they are needed, and typically only 1000 points are required. Triangulation is also used to plot the contours, so once again only 1000 samples are required. This means that despite the fact that gaussian_kde is slower than fastkde, it is still snappy enough to be used dynamically for large plots.

The final result is reasonable:

Figure_1

Although it clearly needs the boundary data correction to remove the edge effects.

One additional change that is mixed in here is to revert ax.scatter to ax.plot in scatter_plot_2d, since I have observed scatter to make axis limits misbehave in non-trivial ways when applied to practical examples where data have very different scales. This has been documented and will be fixed in matplotlib v3.2, but until that is merged, we should revert to ax.plot. Whilst ax.scatter is more consistent with the name of the function, moving away from that actually neatens up a lot of the colouring issues that we have seen before (#32 #21 #45 #19 #31)

Fixes #4
Replaces #25

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works

@williamjameshandley williamjameshandley self-assigned this Aug 10, 2019
@williamjameshandley williamjameshandley added this to the 1.2 milestone Aug 10, 2019
@williamjameshandley williamjameshandley added the enhancement New feature or request label Aug 10, 2019
@codecov
Copy link

codecov bot commented Aug 10, 2019

Codecov Report

Merging #50 into master will decrease coverage by 6.06%.
The diff coverage is 87.5%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #50      +/-   ##
=========================================
- Coverage   92.67%   86.6%   -6.07%     
=========================================
  Files          14      14              
  Lines        1078    1187     +109     
=========================================
+ Hits          999    1028      +29     
- Misses         79     159      +80
Impacted Files Coverage Δ
anesthetic/fastkde.py 100% <100%> (ø)
anesthetic/samples.py 94.7% <77.77%> (-2.62%) ⬇️
anesthetic/plot.py 86.07% <86.15%> (-5.5%) ⬇️
anesthetic/utils.py 93.52% <90.38%> (-1.39%) ⬇️
anesthetic/read/montepythonreader.py 38.15% <0%> (-61.85%) ⬇️
anesthetic/read/samplereader.py 75% <0%> (-4.17%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd7bcec...ad70036. Read the comment docs.

@handley-lab handley-lab deleted a comment from codecov bot Aug 10, 2019
@codecov
Copy link

codecov bot commented Aug 10, 2019

Codecov Report

Merging #50 into master will increase coverage by 0.05%.
The diff coverage is 94.36%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #50      +/-   ##
=========================================
+ Coverage   92.65%   92.7%   +0.05%     
=========================================
  Files          14      14              
  Lines        1075    1192     +117     
=========================================
+ Hits          996    1105     +109     
- Misses         79      87       +8
Impacted Files Coverage Δ
anesthetic/samples.py 97.39% <100%> (+0.06%) ⬆️
anesthetic/kde.py 100% <100%> (ø) ⬆️
anesthetic/utils.py 93.71% <91.22%> (-1.21%) ⬇️
anesthetic/plot.py 92.01% <95.38%> (+0.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64a4bd8...9493acb. Read the comment docs.

Copy link
Collaborator

@lukashergt lukashergt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming kde.py to fastkde.py causes an import error even when fastkde is installed because of overloading the term fastkde.

@williamjameshandley
Copy link
Collaborator Author

Renaming kde.py to fastkde.py causes an import error even when fastkde is installed because of overloading the term fastkde.

Well spotted. I think I should have fixed this now. There weren't actually that many places that anesthetic.fastkde was being imported, so it's a relatively easy change.

lukashergt
lukashergt previously approved these changes Aug 11, 2019
anesthetic/plot.py Outdated Show resolved Hide resolved
if sum(w != 0) < n:
i = numpy.arange(len(w))
else:
i = numpy.random.choice(len(w), size=n, replace=False, p=w/w.sum())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should replace really be False here?
How does that affect the weighting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent question. It all depends on what you do with the weights after. For concreteness, consider this code:

x = numpy.array([1,2,3])
w = numpy.array([1e10,1,1])
print(numpy.random.choice(x, size=2, replace=False, p = w/w.sum()))

You are right that replace=False would definitely be wrong if you were using it to 'de-weight' a set of samples (as in the MCMCSamples.compress) function). However, if you keep the weights, then this acts as an alternative form of compression . You can actually think of the strategy performed in MCMCSamples.compress as doing this, but with the optimal size chosen by the channel capacity.

In actual fact, since we are only using this to bin the weights into ~O(1000) samples, it doesn't even matter that i is chosen as a compression strategy. For example, if you wanted to ensure better coverage in the tails, the p in the above line need not be proportional to the weights. Points chosen in the tails would aggregate to have very low weight, but that may be what you want in practice if you wanted to go beyond two-sigma contours.

anesthetic/utils.py Outdated Show resolved Hide resolved
anesthetic/samples.py Outdated Show resolved Hide resolved
anesthetic/plot.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tricontour plots
2 participants