wrong range in hover info for basic histogram #5848

Open
bklingen opened this issue Jul 21, 2021 · 5 comments
Labels
bug something broken P3 not needed for current cycle

Comments

@bklingen

The hover information on the bin range shown for this basic histogram is rather misleading:

image

I would have expected a range of "50 - 100" shown for the second bin. (The same mislabeling occurs for all other bins.)

Codepen:

https://codepen.io/bklingenberg/pen/zYwdaOq?editors=0011

(This may be related to discussions in #2113.)

@nicolaskruchten nicolaskruchten added the bug something broken label Jul 21, 2021
@alexcjohnson
Collaborator

This is all in service of greater clarity at the bin edges. To be precise, what's happening here is two things:

  • We detected that the data values are all integers, so we shifted the bin edges down 0.5 to ensure that NO values are exactly at a bin edge. You can see this if you zoom in: the bins actually go -0.5 -> 49.5, 49.5 -> 99.5, 99.5 -> 149.5, etc.
  • But listing exactly those values in the hover label would be confusing: what are half-integer values doing in a label for integer data? So we look at the data again and ask: what's the closest any value gets to the left or right edge of a bin? In this case it's 0.5 from the left edge and 9.5 from the right, and based on the bin width of 50 we can always represent these values with a zero at the end, so that's what we do - 0-40, 50-90, 100-140 etc. If you add a value that's just a little closer to the right edge of a bin - say change one of the 140s to 141 - you'll see the labels change to 0-49, 50-99, 100-149 etc, since we can no longer round to a bigger digit.
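
The two steps above can be sketched roughly as follows. This is not the plotly.js source: `sketchBinLabels`, its parameters, the candidate step list, and the sample data are all assumptions made for illustration, and the sketch only models the all-integer case with values at or above the first bin edge.

```javascript
// Hypothetical helper illustrating the described behaviour; NOT plotly.js code.
// values: data points, start/size: requested bin start and width,
// binCount: how many bin labels to produce.
function sketchBinLabels(values, start, size, binCount) {
  const rawLabels = (lo, hi) =>
    Array.from({ length: binCount }, (_, i) => `${lo + i * size}-${hi + i * size}`);

  // This sketch only models integer data; otherwise report the raw edges.
  if (!values.every(Number.isInteger)) return rawLabels(start, start + size);

  // Step 1: shift edges down by 0.5 so no integer value sits exactly on an edge.
  const leftEdge = start - 0.5;
  const rightEdge = leftEdge + size;

  // Step 2: find how close any value gets to the left/right edge of its bin.
  let minLeftGap = Infinity;
  let minRightGap = Infinity;
  for (const v of values) {
    const offset = (v - leftEdge) % size; // distance from the bin's left edge
    minLeftGap = Math.min(minLeftGap, offset);
    minRightGap = Math.min(minRightGap, size - offset);
  }

  // Try the coarsest rounding step that still keeps every value inside its label.
  const eps = 1e-9;
  for (const step of [100, 10, 1]) {
    if (size % step !== 0) continue; // labels must repeat cleanly in every bin
    const lo = Math.ceil(leftEdge / step) * step;
    const hi = Math.floor(rightEdge / step) * step;
    if (lo - leftEdge <= minLeftGap + eps && rightEdge - hi <= minRightGap + eps) {
      return rawLabels(lo, hi);
    }
  }
  return rawLabels(leftEdge, rightEdge);
}

// Made-up data: all multiples of 10.
console.log(sketchBinLabels([0, 10, 40, 50, 90, 100, 140], 0, 50, 3));
// → [ '0-40', '50-90', '100-140' ]

// Changing 140 to 141 forces the finer rounding step.
console.log(sketchBinLabels([0, 10, 40, 50, 90, 100, 141], 0, 50, 3));
// → [ '0-49', '50-99', '100-149' ]
```

The point of the sketch is that the label depends on the data, not only on the bin edges, which is the coupling discussed later in the thread.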

What we really DON'T want to do is have labels 50-100 and 100-150, because then it's ambiguous in which bin we put a value of exactly 100. But you could perhaps argue that the bin shift should match the range shrinkage - ie because we shifted the bins exactly 0.5 here we should also shrink the ranges we report by exactly 0.5 on each side, to 50-99, or if we want to keep 50-90 we should shift the bins by 5.

@nicolaskruchten
Contributor

I could also see us listing the bounds of the data in each group, or using [50-100) style notation or even >=50 & <100

@bklingen
Author

bklingen commented Jul 22, 2021

Thanks for commenting! I actually think labels 50-100 and 100-150 are not too ambiguous, and to me seem better than the current 50-90, 100-140, 150-190, etc. which indicate gaps. I interpret the hover information as giving me the range of the bin (and the count of observations falling in that bin). This is especially valuable when the x-axis tickmarks are not set to coincide with the boundaries of the bins, as is often the case. Then, I really would like to know the lower and upper limit of the bin I'm hovering over.

I don't think giving the range of the observations (e.g., 50-90) falling in the bin of 50-100 is as useful. It is the lower and upper bound of the bin that is the interesting information.

To me, the optimal solution is [50,100) for the default half-open intervals that plotly forms. (Or 50 - 99, where the precision can be set with hoverformat. I.e., hoverformat = ".2f" would yield 50.00-99.99.)
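
For concreteness, the label styles proposed in this thread could be produced along these lines. The helper names are hypothetical and nothing here is an existing plotly.js option; `decimals` merely stands in for the precision that a `hoverformat` such as ".2f" would imply.

```javascript
// Hypothetical formatters for a bin covering lo <= x < hi; NOT a plotly.js API.

// Half-open interval notation, e.g. "[50, 100)".
function halfOpenLabel(lo, hi) {
  return `[${lo}, ${hi})`;
}

// Inequality notation, e.g. ">=50 & <100".
function inequalityLabel(lo, hi) {
  return `>=${lo} & <${hi}`;
}

// Inclusive range at a fixed precision: the upper bound is pulled in by the
// smallest increment that the chosen number of decimals can display.
function inclusiveLabel(lo, hi, decimals) {
  const step = 10 ** -decimals;
  return `${lo.toFixed(decimals)}-${(hi - step).toFixed(decimals)}`;
}

console.log(halfOpenLabel(50, 100));     // → [50, 100)
console.log(inequalityLabel(50, 100));   // → >=50 & <100
console.log(inclusiveLabel(50, 100, 0)); // → 50-99
console.log(inclusiveLabel(50, 100, 2)); // → 50.00-99.99
```

All three depend only on the bin bounds, so the label stays the same whatever data happens to fall in the bin.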

@nicolaskruchten
Contributor

Right, so having thought about this more, as Alex says, the bin bounds aren't actually 50-100 here, they're 49.5-99.5 (due to the smart behaviour around integers). We can maybe debate later if this is a good idea, but at the very least we should frame our discussion around the actual bounds :) I think then that for this specific chart the hover should be [49.5, 99.5]

I should note that forcing the bins to 50-100 (verified by zoom!) with xbins: {start: 0, size: 50} still gives a hover label of 50-90 or 50-99 depending on the largest data value in that range and whether or not they're all integers, which seems like excessive coupling, and is also misleading.
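
A minimal trace that forces the bins as described might look like the sketch below. The data values and the `chart` element id are made up for illustration; only `xbins: {start: 0, size: 50}` comes from the comment above.

```javascript
// Histogram trace with explicit bins starting at 0 and 50 wide.
// The x values are invented sample data (integers, multiples of 10).
const trace = {
  type: "histogram",
  x: [0, 10, 40, 50, 90, 100, 140],
  xbins: { start: 0, size: 50 },
};

// In a page with plotly.js loaded and an element with id "chart" (assumed):
// Plotly.newPlot("chart", [trace]);
```

Zooming in on the rendered chart is the way to confirm where the bin edges fall, as noted above.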

> But listing exactly those values in the hover label would be confusing: what are half-integer values doing in a label for integer data?

I don't think that's confusing, personally, I think it's clarifying :)

@bklingen
Author

bklingen commented Nov 1, 2021

Hi,

I'm just trying to bring this issue/bug up again, as having correct hover information on the length of the bins would be a great enhancement for histograms.

@gvwilson gvwilson self-assigned this Jun 26, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson added the P3 not needed for current cycle label Aug 9, 2024
4 participants