Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram events & bin hover label improvements #2113

Merged
merged 18 commits into from
Oct 24, 2017
Merged

Conversation

alexcjohnson
Copy link
Collaborator

Implements #2071 (histogram events get the input points selected) and #2086 (better display of data ranges for histogram bins) for both histogram and the 2d histogram types (histogram2d and histogram2dcontour)

// don't trust floating point equality - fraction of bin size to call
// "on the line" and ensure that they go the right way specified by
// linelow
var roundingError = 1e-9;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised that this hadn't come up before (other than the test below that was actually wrong before), and obviously it's a bit questionable exactly what small fraction to put here... but if we just use straight < vs <= etc it's pretty easy to trick findBin on the edges.

if(d.xLabelVal !== undefined) {
d.xLabel = ('xLabel' in d) ? d.xLabel : Axes.hoverLabelText(d.xa, d.xLabelVal);
d.xLabel = ('xLabel' in d) ? d.xLabel : getDimText(d, 'x');
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etpinard I made getDimText in parallel with you adding d.(x|y)Label and Axes.hoverLabelText - I suppose in principle we could use d.(x|y)Label for this too instead of adding d.(x|y)LabelVal(0|1), maybe that would actually be simpler... let me take a look.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, better to use d.(x|x)Label -> 51ad8c2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better in 51ad8c2

Thanks!

cdi.p1 = roundFn(binEdges[i + 1], true);

// pts and p0/p1 don't seem to make much sense for cumulative distributions
if(!cumulativeSpec.enabled) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A questionable decision... this gets at some of the same issues that led to cumulative.currentbin but in principle a CDF shows the sum of all the data prior to a specific point, not the data within that bin. perhaps I should shift p0 and p1 depending on currentbin so that the hover label at least gets the description "all the data prior to X" precisely correct, even though it'll be a different value than the center of the bar...

And then what about pts? You could say it should be included and again should be "all the data prior to X" but then it would be meaningless to select a single bar and not all the bars before it. Alternatively pts could contain "all the data that was added in this bar" which might be more useful in terms of selecting a single bar, but it's not really what that bar means.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that the hover label at least gets the description "all the data prior to X"

Yeah. I think something along those lines would be useful down the road.

You could say it should be included and again should be "all the data prior to X" but then it would be meaningless to select a single bar and not all the bars before it. Alternatively pts could contain "all the data that was added in this bar"

Sounds to me like both all-data-prior-to-X and data-added-in-this-bar point lists could be useful, so we'll might have to emit both in the future. Leaving this discussion for another time is 👌 with me. I doubt that most plotly.js users even know about cumulative histograms.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I'll make a new issue to discuss how this should work for CDFs.


// too small of a right gap: stop disambiguating data at the edge
_test(0, 0.0009, [0, 1, 2], false, [0, 1, 1, 2]);
_test(0.1, 0.009, [115, 125, 135], false, [115, 125, 125, 135]);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etpinard @chriddyp as discussed offline this morning - compare to lines 699 and 701 above that are just on the other side of the cutoff I implemented in ad0e08f - seem reasonable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be clear, what these tests are indicating is that the hover labels for these bins will be 115 - 124.99 and 125 - 134.99 on the disambiguated side of the cutoff, and 115 - 125 and 125 - 135 on the too-many-digits-to-disambiguate side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

[cn1_00, Lib.dateTime2ms('2009-01-01', cn), cn1_10, Lib.dateTime2ms('2019-01-01', cn)]);
// occasionally we give extra precision for world dates (month when it should be year
// or day when it should be month). That's better than doing the opposite... not going
// to fix now, too many edge cases, better not to complicate the logic for them all.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm I came up with for figuring out how many digits to show is basically:

  • find the minimum distance from data points to bin edges from either side - this is just two numbers for the whole distribution
  • for two consecutive bins, find the largest digit that changes in both the left and right gaps (with a little shifting to account for rounding errors)

This algorithm is a little complicated to state, but it's quick to calculate and it's robust as far as I can see, on numeric, category, and gregorian date axes (though even in that case there is a weird edge case that confuses it). However, the larger variability of month and year lengths in world calendars causes it to sometimes think a digit doesn't change when in the actual data it does, and increasing the tolerance to catch this breaks some much more common cases.

I suspect we could fix this by looking at all bin edges independent of each other, but that's far more computationally intensive, and has some other potential weird pitfalls. I don't think providing a bit of extra precision in the label for a few world calendar cases justifies all that extra complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ algo. Thanks for stating it in details.

Can you comment on what modifications (if any) your algo will need when we implement variable-sized bins #360?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with dates is that the actual time increments - the "digits" we're looking to see if they have changed - have different sizes in terms of milliseconds. The bin sizes being different should not be a problem, though people could do things to thwart the algorithm, like having the first two bin edges on fairly round values but some subsequent edges at not-so-round values. The idea that two edges are enough to determine the biggest digit that always changes comes from the observation that if one bin is extra-round, its neighbor will not be. So to do this robustly with nonuniform bins, we could either:

  • Do this same algorithm on all bin edges. That's different from looking at all bin edges independently as I suggested above for world calendars, though that would solve this problem too - what I'm suggesting here is to calculate just one left-edge-gap and one right-edge-gap as we do now, but then test those gaps against all bin edges instead of the first two.
  • As a last resort, make a new attribute for folks to set their own rounding.

One common case mentioned in #360 is equally-sized bins on a log axis. There, the whole assumption of a single digit to round to is wrong, and we'll have to do something completely different or just give up on the idea of disambiguating the edges, and let the regular axis-formatting-for-hover system handle it unchanged.

@@ -1221,7 +1221,7 @@ axes.hoverLabelText = function(ax, val) {
var tx = axes.tickText(ax, ax.c2l(logOffScale ? -val : val), 'hover').text;

if(logOffScale) {
return val === 0 ? '0' : '-' + tx;
return val === 0 ? '0' : MINUS_SIGN + tx;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

been around for a while, using regular dash instead of unicode minus. tested in 3aa03f7#diff-846cf5b534aa0d22bdd1da2b43ac3cbaR517

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about that. Thanks 👌

_hover(gd, 250, 200);
assertHoverLabelContent({
nums: '3',
axis: '3.3'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ another potentially controversial decision: whenever every bin contains a single unique value, label exactly that value, not a range, even if those single values are not at the same places within the bins. But as soon as you have any bin with two distinct values in it, we use the range logic.

Note: for histogram2d I applied this independently to each axis, not to each brick. So if one x brick has only the point (2, 3) and one above it has only the point (2.1, 4), this will make the x bins show ranges, because that one x bin has multiple values, even though each brick has unique values in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice touch here. I was worried that category histograms would get e.g, apples - lemons bin hover labels. 👌

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can happen - but only if the bins actually contain apples and lemons.

Plotly.newPlot(gd, [{
  x: ['apples','apples','apples','lemons','lemons','peaches','peaches','pears'],
  type: 'histogram',
  xbins: {start: -0.5, end: 3.5, size: 2}
}],{
  width: 400,
  height: 400
})

screen shot 2017-10-24 at 11 43 19 am

It's possible to put as many categories as you want into one bin, and as with numbers we'll only show the first and last ones in the label. Never happens automatically since it's in general confusing, but if you want to do it we won't stop you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if you want to do it we won't stop you!

Cool. Would you mind adding a test case for this situation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Would you mind adding a test case for this situation?

0331a16

@etpinard etpinard added this to the v1.32.0 milestone Oct 24, 2017
Copy link
Contributor

@etpinard etpinard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I made one blocking comment (test plotly_hover event data).

Once merged, we should open a new issue or leave #2086 open to discuss what to do with cumulative histograms and variable-sized bins #360

if(d.xLabelVal !== undefined) {
d.xLabel = ('xLabel' in d) ? d.xLabel : Axes.hoverLabelText(d.xa, d.xLabelVal);
d.xLabel = ('xLabel' in d) ? d.xLabel : getDimText(d, 'x');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better in 51ad8c2

Thanks!


var barHover = require('../bar/hover');

module.exports = function hoverPoints(pointData, xval, yval, hovermode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very very nice.

@@ -1221,7 +1221,7 @@ axes.hoverLabelText = function(ax, val) {
var tx = axes.tickText(ax, ax.c2l(logOffScale ? -val : val), 'hover').text;

if(logOffScale) {
return val === 0 ? '0' : '-' + tx;
return val === 0 ? '0' : MINUS_SIGN + tx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about that. Thanks 👌

.then(function() {
_hover(gd, 250, 200);
assertHoverLabelContent({
nums: '\u22125', // unicode minus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for 🔒 ing this down!

}
else {
cdi.p0 = roundFn(binEdges[i]);
cdi.p1 = roundFn(binEdges[i + 1], true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ ing that you were able to put all thing new rounding logic in calc.

// 15-day gap - still days
_test(0, 15 * day, [jan17, jan18, jan19], 'gregorian',
[jan17, jan18 - day, jan18, jan19 - day]);
// 28-day gap STILL gets days - in principle this might happen with data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for writing this down

[cn1_00, Lib.dateTime2ms('2009-01-01', cn), cn1_10, Lib.dateTime2ms('2019-01-01', cn)]);
// occasionally we give extra precision for world dates (month when it should be year
// or day when it should be month). That's better than doing the opposite... not going
// to fix now, too many edge cases, better not to complicate the logic for them all.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ algo. Thanks for stating it in details.

Can you comment on what modifications (if any) your algo will need when we implement variable-sized bins #360?

_hover(gd, 250, 200);
assertHoverLabelContent({
nums: '3',
axis: '3.3'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice touch here. I was worried that category histograms would get e.g, apples - lemons bin hover labels. 👌

})
.then(function() {
_hover(gd, 250, 200);
assertHoverLabelContent({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would mind checking that plotly_hover does emit the correct event data?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -> tested in 2696c1c

cdi.p1 = roundFn(binEdges[i + 1], true);

// pts and p0/p1 don't seem to make much sense for cumulative distributions
if(!cumulativeSpec.enabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that the hover label at least gets the description "all the data prior to X"

Yeah. I think something along those lines would be useful down the road.

You could say it should be included and again should be "all the data prior to X" but then it would be meaningless to select a single bar and not all the bars before it. Alternatively pts could contain "all the data that was added in this bar"

Sounds to me like both all-data-prior-to-X and data-added-in-this-bar point lists could be useful, so we'll might have to emit both in the future. Leaving this discussion for another time is 👌 with me. I doubt that most plotly.js users even know about cumulative histograms.

@etpinard
Copy link
Contributor

Nicely done 💃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants