Bin range in column description #525

Closed
reza1615 opened this issue Jun 28, 2021 · 6 comments

Comments

@reza1615

Please add the bin range as a second x-axis on the histogram.
For example, I need to know the survival ratio for each age range.
image

@aschonfeld
Collaborator

So this is really hard to add, since the bins could be different for each target value. When I create the target histogram I break the data up into groups (one for each target) and then compute the histogram buckets for each group. The buckets could therefore be different for each group, which makes the ranges very hard to display, since there could be a lot of targets (and thus a lot of bin ranges).
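
To illustrate the problem described above, here is a minimal sketch that computes histogram edges separately for each target group. It is not the project's actual code: the data, the `ages_by_target` name, and the `per_group_edges` helper are all made up for this example, and NumPy's `np.histogram` stands in for whatever the real implementation uses.

```python
import numpy as np

# Hypothetical data: ages for two target groups (not taken from the issue).
ages_by_target = {
    "A": np.array([0.0, 1.0, 2.0, 4.0]),
    "B": np.array([10.0, 20.0, 30.0, 50.0]),
}


def per_group_edges(groups, bins=2):
    """Compute histogram bin edges separately for each target group."""
    edges = {}
    for target, values in groups.items():
        # np.histogram derives the edges from the min/max of *this* group only.
        _, group_edges = np.histogram(values, bins=bins)
        edges[target] = group_edges
    return edges


edges = per_group_edges(ages_by_target)
for target, group_edges in edges.items():
    print(target, group_edges)
```

Because each group spans a different range of values, bucket 0 for "A" and bucket 0 for "B" cover different intervals, so a single x-axis label per bucket cannot describe both.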

If you believe the target histograms are being calculated incorrectly then that's a different story. I could fix that logic and then display the ranges.

@reza1615
Author

reza1615 commented Jul 5, 2021

You already have the bins; you just need to get their ranges. Right now this chart is binned based on Age.

@aschonfeld
Collaborator

aschonfeld commented Jul 5, 2021

Once again, it's not that simple:

  • For a non-targeted histogram it's easy (and currently displayed) because we are only displaying one group of bins
  • For a targeted histogram it's much harder since we have different bins for each target group

| target | bucket | bin    | count |
|--------|--------|--------|-------|
| A      | 0      | [0, 2) | 1     |
| A      | 1      | [2, 3) | 2     |
| B      | 0      | [0, 1) | 2     |
| B      | 2      | [1, 3) | 1     |

So if you were to show the bin for bucket 0, you would need to show both [0, 2) and [0, 1). That doesn't sound too tough, but this example only has 2 target groups (A, B).

Side note: if I'm currently calculating the buckets incorrectly, then that is a different story. That would mean there should be one set of bins shared among all the target groups, which I could display easily.

@reza1615
Author

reza1615 commented Jul 5, 2021

For example, take the Titanic data.
First we bin the age into 20 buckets, and then within each bucket we check for survival or not, so the x-axis label should be the bin range.
In my opinion, you should bin first and then check the target condition inside each bin.
It would be:

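| bucket | target | bin range | items in the bin and target | total items in the bin |
|--------|--------|-----------|-----------------------------|------------------------|
| 0      | A      | [0, 2)    | 1                           | 3                      |
| 0      | B      | [0, 2)    | 2                           | 3                      |
| 1      | B      | [2, 3)    | 1                           | 6                      |
| 1      | A      | [2, 3)    | 5                           | 6                      |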

My request is to have the bin range as the x-axis label.
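
A rough sketch of the "bin first, then check the target inside each bin" idea, using pandas. The data values, column names, and bin edges below are made up for illustration, and this is only one possible way to build such a table, not necessarily how the library does it.

```python
import pandas as pd

# Hypothetical data (not real Titanic values).
df = pd.DataFrame({
    "age":    [0, 1, 1, 2, 3, 3],
    "target": ["A", "B", "B", "B", "A", "A"],
})

# One shared set of edges, chosen by hand for this example.
bin_edges = [0, 2, 4]
bin_labels = [f"[{lo}, {hi})" for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]

# Step 1: bin the whole column once, with left-closed intervals.
df["bin_range"] = pd.cut(df["age"], bins=bin_edges, right=False, labels=bin_labels)

# Step 2: count the rows for each (bin, target) pair.
counts = (
    df.groupby(["bin_range", "target"], observed=True)
    .size()
    .rename("count")
    .reset_index()
)

# Step 3: attach the total number of rows in each bin.
counts["total"] = counts.groupby("bin_range", observed=True)["count"].transform("sum")

print(counts)
```

Since every target shares the same `bin_range` values, those labels can be used directly on the x-axis, and `count / total` gives the share of each target within a bin.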

@aschonfeld
Collaborator

Ok, so this follows my last comment where we perform the binning before we group by target (in your example “survival”) which would give us one set of bins which is definitely doable.

The original code provided did not do this, which is why I was originally saying this was not doable.
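
For comparison, a minimal NumPy sketch of computing the bins before grouping by target. The data and variable names are assumptions for illustration only; the actual change in the referenced commits may look quite different.

```python
import numpy as np

# Hypothetical data (not taken from the issue).
ages = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 4.0])
targets = np.array(["A", "B", "B", "B", "A", "A"])

# Compute one set of edges from the full column, before any grouping.
edges = np.histogram_bin_edges(ages, bins=2)

# Reuse the same edges for every target group so the buckets line up.
counts = {
    str(t): np.histogram(ages[targets == t], bins=edges)[0]
    for t in np.unique(targets)
}

# Build x-axis labels from the shared edges.
labels = [f"[{lo:g}, {hi:g})" for lo, hi in zip(edges[:-1], edges[1:])]
# np.histogram treats the last bin as closed on the right.
labels[-1] = labels[-1][:-1] + "]"

print(labels)
for target, group_counts in counts.items():
    print(target, group_counts)
```

With shared edges, each target's counts array has the same length and the same meaning per position, so a single list of bin-range labels describes every group.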

aschonfeld added a commit that referenced this issue Jul 5, 2021
* Reworked targeted histograms to calculate bins before grouping by target
* #525: bin range on x-axis
* #526: targeted histogram tooltip
aschonfeld added a commit that referenced this issue Jul 10, 2021
* Reworked targeted histograms to calculate bins before grouping by target
* #525: bin range on x-axis
* #526: targeted histogram tooltip
@aschonfeld
Collaborator

Added in v1.52.0.
