scipy.stats.gaussian_kde contour plots #50
Codecov Report
@@            Coverage Diff            @@
##           master      #50     +/-   ##
=========================================
- Coverage   92.67%    86.6%   -6.07%
=========================================
  Files          14       14
  Lines        1078     1187    +109
=========================================
+ Hits          999     1028     +29
- Misses         79      159     +80
Codecov Report
@@            Coverage Diff            @@
##           master      #50     +/-   ##
=========================================
+ Coverage   92.65%    92.7%   +0.05%
=========================================
  Files          14       14
  Lines        1075     1192    +117
=========================================
+ Hits          996     1105    +109
- Misses         79       87      +8
Renaming kde.py to fastkde.py causes an import error, even when fastkde is installed, because it overloads the name fastkde.
Well spotted. I think I have fixed this now: there weren't actually many places where anesthetic.fastkde was imported, so it was a relatively easy change.
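Since this PR makes fastkde an optional requirement, the import of the third-party package is typically guarded. A minimal sketch, assuming the package can be imported as `from fastkde import fastKDE` (an assumption about its layout); `HAVE_FASTKDE` and `kde_backend` are hypothetical names, not anesthetic's API:

```python
try:
    # Optional third-party dependency (import path is an assumption).
    from fastkde import fastKDE
    HAVE_FASTKDE = True
except ImportError:
    fastKDE = None
    HAVE_FASTKDE = False


def kde_backend(prefer_fastkde=True):
    """Name of the KDE backend to use (hypothetical helper)."""
    if prefer_fastkde and HAVE_FASTKDE:
        return "fastkde"
    # scipy.stats.gaussian_kde is always available as the fallback.
    return "scipy"


print(kde_backend())
```

Keeping a guard like this in a module that is not itself called fastkde sidesteps the kind of name overloading raised in the review.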
if sum(w != 0) < n:
    i = numpy.arange(len(w))
else:
    i = numpy.random.choice(len(w), size=n, replace=False, p=w/w.sum())
Should replace really be False here? How does that affect the weighting?
Excellent question. It all depends on what you do with the weights afterwards. For concreteness, consider this code:
import numpy
x = numpy.array([1, 2, 3])
w = numpy.array([1e10, 1, 1])
print(numpy.random.choice(x, size=2, replace=False, p=w/w.sum()))
You are right that replace=False would definitely be wrong if you were using it to 'de-weight' a set of samples (as in the MCMCSamples.compress function). However, if you keep the weights, then this acts as an alternative form of compression. You can actually think of the strategy performed in MCMCSamples.compress as doing this, but with the optimal size chosen by the channel capacity.
In fact, since we are only using this to bin the weights into O(1000) samples, it does not even matter whether i is chosen as a compression strategy. For example, if you wanted better coverage in the tails, the p in the above line need not be proportional to the weights. Points chosen in the tails would aggregate to have very low weight, but that may be what you want in practice if you want to go beyond two-sigma contours.
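A minimal sketch of the binning idea, under stated assumptions: the subsample is drawn as in the snippet under review (including the guard for too few non-zero weights), but each original point's weight is then assigned to its nearest subsample point using `scipy.spatial.cKDTree`, which stands in for the triangulation-based binning in the PR. `subsample_indices` and `bin_weights` are hypothetical helpers. Because the weights are re-accumulated, the total weight is conserved whatever p is used, including one that deliberately oversamples the tails.

```python
import numpy as np
from scipy.spatial import cKDTree


def subsample_indices(w, n, p=None, rng=None):
    """Choose up to n distinct indices, with p defaulting to w (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    p = w / w.sum() if p is None else p / p.sum()
    if np.count_nonzero(p) < n:
        # Too few candidates for a draw without replacement: keep them all.
        return np.arange(len(w))
    return rng.choice(len(w), size=n, replace=False, p=p)


def bin_weights(data, w, centres):
    """Accumulate each sample's weight onto its nearest centre (hypothetical helper)."""
    _, nearest = cKDTree(centres).query(data)
    return np.bincount(nearest, weights=w, minlength=len(centres))


rng = np.random.default_rng(1)
data = rng.normal(size=(50_000, 2))
w = rng.exponential(size=len(data))

# Weight-proportional subsample, as in the PR snippet.
i = subsample_indices(w, 1000, rng=rng)
binned = bin_weights(data, w, data[i])

# Tail-favouring alternative: p need not be proportional to w.
r2 = (data**2).sum(axis=1)
j = subsample_indices(w, 1000, p=w * np.exp(r2 / 4), rng=rng)
binned_tails = bin_weights(data, w, data[j])

print(binned.sum(), binned_tails.sum(), w.sum())
```

In the tail-favouring case, the extra centres placed at large radius each collect only a little weight, which matches the behaviour described in the comment.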
Description
This PR provides an alternative to fastkde using scipy.stats.gaussian_kde (fastkde is therefore now an optional requirement, with CI updated accordingly).
The main innovation here is to make extensive use of matplotlib's triangulation functionality to first bin dynamically, with bins defined by a random subsample. This puts bins where they are needed, and typically only 1000 points are required. Triangulation is also used to plot the contours, so once again only 1000 samples are required. This means that, although gaussian_kde is slower than fastkde, it is still snappy enough to be used dynamically for large plots.
The final result is reasonable, although it clearly needs a boundary correction to remove the edge effects.
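A minimal sketch of the contouring side, assuming SciPy 1.2 or later (for the `weights` argument of `scipy.stats.gaussian_kde`) and matplotlib's `matplotlib.tri` module. It is not the PR's code: it skips the binning step, fits the KDE to the full weighted sample and evaluates it only at a ~1000-point weight-proportional subsample, then contours directly on the Delaunay triangulation of those points. The helper `mass_levels` is hypothetical, and treating a without-replacement subsample as draws from the weighted distribution is an approximation.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.tri as mtri
import numpy as np
from scipy.stats import gaussian_kde


def mass_levels(pdf, masses=(0.95, 0.68)):
    """Hypothetical helper: density levels enclosing the given probability masses.

    If the points are (approximately) drawn from the distribution, the fraction
    with density above t estimates the mass inside the t contour, so the level
    for mass m is the (1 - m) quantile of the density values.
    """
    return [float(np.quantile(pdf, 1 - m)) for m in masses]


rng = np.random.default_rng(2)
data = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=5000)
w = rng.uniform(0.5, 1.5, size=len(data))

# Random, weight-proportional subsample: these become the triangulation vertices.
n = 1000
i = rng.choice(len(w), size=n, replace=False, p=w / w.sum())
x, y = data[i].T

# Weighted Gaussian KDE, evaluated only at the n subsample points.
kde = gaussian_kde(data.T, weights=w)
pdf = kde(np.vstack([x, y]))

levels = mass_levels(pdf)  # increasing: [95% level, 68% level]

# Contour directly on the Delaunay triangulation of the subsample.
tri = mtri.Triangulation(x, y)
fig, ax = plt.subplots()
ax.tricontourf(tri, pdf, levels=levels + [float(pdf.max())], alpha=0.5)
ax.tricontour(tri, pdf, levels=levels, colors="k")
```

Because the triangle vertices are concentrated where the weight is, resolution follows the mass, and points near the edge of the sample are where the edge effects mentioned above would show up.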
One additional change mixed in here is to revert ax.scatter to ax.plot in scatter_plot_2d, since I have observed scatter making axis limits misbehave in non-trivial ways in practical examples where the data have very different scales. This has been documented and will be fixed in matplotlib v3.2, but until that is released we should revert to ax.plot. Whilst ax.scatter is more consistent with the name of the function, moving away from it actually neatens up a lot of the colouring issues that we have seen before (#32 #21 #45 #19 #31).
Fixes #4
Replaces #25
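For comparison with the ax.scatter to ax.plot revert described above, a minimal sketch of scatter-style plotting through ax.plot. It assumes a recent matplotlib and reproduces neither scatter_plot_2d nor the ax.scatter limit bug; it only shows that `ax.plot(..., linestyle="none")` draws markers as a single `Line2D` whose data limits feed the usual autoscaling, here with coordinates on very different scales:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
# Two coordinates on very different scales.
x = rng.normal(0.0, 1e-6, size=500)
y = rng.normal(1e4, 1e3, size=500)

fig, ax = plt.subplots()
# Markers only, no connecting line: ax.plot returns a list of Line2D objects.
(points,) = ax.plot(x, y, marker="o", linestyle="none", markersize=2, color="C0")

# Autoscaled view limits should contain every point.
xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()
print((xmin, xmax), (ymin, ymax))
```

Colours passed this way follow the same color keyword and property cycle as other line plots, which is one reason the plot-based version can be simpler to keep consistent.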
Checklist:
- flake8 anesthetic tests
- pydocstyle --convention=numpy anesthetic
- python -m pytest