[query] add less biased unsmoothed pdf to ggplot #13608

patrick-schultz · 2023-09-12T12:33:27Z

… and make the default

Examples
Histogram without setting min/max. Requires two passes over the data and has low resolution over the interesting part of the distribution.

Histogram with manual min/max. Most accurate, but different choices of number of bins cause different artifacts. Requires knowledge of distribution beforehand.

Smoothed `approx_cdf`-based pdf (old default). Smoothing causes large distortion.

New unsmoothed `approx_cdf`-based pdf. Single pass, works well for any distribution, needs no tuning or foreknowledge of the distribution.
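
For readers unfamiliar with reading a density off a CDF summary, here is a minimal sketch of the general idea. It is not the algorithm added in this PR. It assumes a hypothetical summary of strictly increasing `values` with matching cumulative `ranks`, and takes plain finite differences with no smoothing; the function name and inputs are made up for illustration.

```python
def piecewise_constant_pdf(values, ranks):
    """Finite-difference density between consecutive CDF sample points.

    Assumes (for illustration only) that `values` is strictly increasing
    and `ranks[i]` is the cumulative sample count at `values[i]`.
    """
    if len(values) != len(ranks) or len(values) < 2:
        raise ValueError("need at least two (value, rank) pairs")
    total = ranks[-1] - ranks[0]  # samples covered by the summary
    densities = []
    for i in range(len(values) - 1):
        width = values[i + 1] - values[i]
        mass = (ranks[i + 1] - ranks[i]) / total  # probability in this interval
        densities.append(mass / width)  # constant density over the interval
    return densities


values = [0.0, 1.0, 2.0, 4.0]
ranks = [0, 10, 30, 40]
densities = piecewise_constant_pdf(values, ranks)
```

Each density times its interval width gives that interval's probability mass, so the masses sum to 1 over the covered range.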

hail/python/hail/ggplot/geoms.py

ready for another look

ehigham

Thanks for making such an effort to make this more readable. I have a couple of small suggestions then we can merge this.

ehigham · 2023-09-18T17:09:21Z

hail/python/hail/expr/functions.py

+            p = update_grid_size(p)
+        return p
+
+    def compute_single_error(s, failure_prob=failure_prob):


Do you intend to pull these inner functions out, or can we just get `failure_prob` from the enclosing scope (i.e., remove the parameter)?

The parameter is there because it gets called with different values. This function computes the error bound on an estimated rank for a single value, with a given probability of exceeding the bound. We sometimes want an error bound that applies to all possible values simultaneously, with a given probability of any rank estimate exceeding the bound. Computing that involves computing an error bound for a single value with an appropriately smaller failure probability.
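
The relationship between the two kinds of bound can be sketched with a union bound. This is a rough illustration, not the code in `functions.py`: the Hoeffding-style formula is an assumed stand-in for the real single-value error, and the names `single_value_error` and `simultaneous_error` are hypothetical.

```python
import math


def single_value_error(num_samples, failure_prob):
    """Error bound for one estimated rank fraction.

    Assumption: a Hoeffding-style bound for an empirical CDF at one point,
    exceeded with probability at most `failure_prob`.
    """
    return math.sqrt(math.log(2.0 / failure_prob) / (2.0 * num_samples))


def simultaneous_error(num_samples, num_values, failure_prob):
    """Error bound holding for `num_values` values at once.

    Union bound: if each value individually fails with probability at most
    failure_prob / num_values, then any of them fails with probability at
    most failure_prob.
    """
    return single_value_error(num_samples, failure_prob / num_values)


single = single_value_error(10_000, 0.05)
simultaneous = simultaneous_error(10_000, 100, 0.05)
```

The simultaneous bound is wider than the single-value one, which is why the same helper needs to be callable with different failure probabilities.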

ehigham · 2023-09-18T17:12:51Z

hail/python/hail/ggplot/geoms.py

+        xi, yi = point_on_bound(i, upper)
+        return (yi - fy) / (xi - fx)
+
+    def update_min_max_slopes():


I think it would be clearer to return a tuple and unpack that at the call site rather than use side effects:

min_slope, max_slope = min_max_slopes()
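
A minimal sketch of the suggested style, for comparison. The signature and the explicit point arguments are hypothetical; the function in the PR takes no arguments and reads its inputs from the enclosing scope.

```python
def min_max_slopes(fixed, lower_point, upper_point):
    """Return (min_slope, max_slope) instead of assigning enclosing variables.

    All three arguments are (x, y) pairs; the names are illustrative only.
    """
    fx, fy = fixed
    lx, ly = lower_point
    ux, uy = upper_point
    min_slope = (ly - fy) / (lx - fx)
    max_slope = (uy - fy) / (ux - fx)
    return min_slope, max_slope


# The caller unpacks the result, so the data flow is visible at the call site.
min_slope, max_slope = min_max_slopes((0.0, 0.0), (2.0, 1.0), (2.0, 3.0))
```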

ehigham · 2023-09-18T17:13:36Z

hail/python/hail/ggplot/geoms.py

+        max_slope = slope_from_fixed(ui, upper=True)
+
+    def fix_point_on_result(i, upper):
+        nonlocal fx, fy, new_y, keep


same here. I guess I'm not a fan of this pattern.

Normally I'm not either, but here it felt like the best way to abstract out repeated steps of the algorithm. Maybe it would be clearer if this were a class, and the mutable state fields on the class?

I found this more readable, chunking up common updates to the state of the algorithm rather than repeating more low level changes, but readability is subjective and I'm happy to inline these if you think that's clearer.

As discussed, there's no need to change this one. You raised a good point that if this were a class and these variables were attributes of the class, then perhaps I wouldn't object, which is certainly true. I don't think it's worth rewriting this as a class. I think this is fine as you're indexing into and mutating state variables; just please define the variables before you use them in this function so it's clear what you're referencing.
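
The two options discussed in this thread can be sketched on a toy problem. This is an illustration only, unrelated to the actual algorithm in `geoms.py`; `running_extremes` and `RunningExtremes` are hypothetical names.

```python
import math


def running_extremes(samples):
    """Closure style: state is defined before the helper that mutates it."""
    lowest = math.inf
    highest = -math.inf

    def update(x):
        # Both names are already bound above, so the reader can see
        # exactly which variables this helper modifies.
        nonlocal lowest, highest
        lowest = min(lowest, x)
        highest = max(highest, x)

    for x in samples:
        update(x)
    return lowest, highest


class RunningExtremes:
    """Class style: the same mutable state held as attributes."""

    def __init__(self):
        self.lowest = math.inf
        self.highest = -math.inf

    def update(self, x):
        self.lowest = min(self.lowest, x)
        self.highest = max(self.highest, x)


result = running_extremes([3, 1, 4])
tracker = RunningExtremes()
for x in [3, 1, 4]:
    tracker.update(x)
```

Both versions group the state updates into one named step; the closure version stays shorter, while the class makes the shared state explicit.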

hail/python/hail/ggplot/geoms.py

done

[query] add less biased unsmoothed pdf to ggplot

16665bd

… and make the default

patrick-schultz assigned ehigham Sep 12, 2023

delete unused numpy import

772c6a5

ehigham previously requested changes Sep 14, 2023

patrick-schultz added 6 commits September 14, 2023 11:06

add description to _max_entropy_cdf

d66b9fe

wip

b139bb0

simplifying

0c70509

more cleanup

c115c8d

use moved error func in old plotting

85a91c2

refactor max_entropy_cdf a bit more

756a72f

patrick-schultz requested a review from ehigham September 18, 2023 15:52

fix

5a21501

ehigham previously requested changes Sep 18, 2023

patrick-schultz added 2 commits September 18, 2023 13:46

fix

40ca5cd

address comments

10f92c8

ehigham approved these changes Sep 20, 2023

fix

6978d99

danking merged commit dae37d7 into hail-is:main Sep 21, 2023

patrick-schultz deleted the unsmoothed-density-plot branch January 2, 2025 13:48
