How to properly apply y limits on the entire plot in a stacked bar plot? #978

Closed
OSuwaidi opened this issue Jan 4, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@OSuwaidi

OSuwaidi commented Jan 4, 2024

I have this data, which represents the sales decomposition from high-level (left) to granular (right); hence, the values that share the same index should add up to "total_revenue":

import numpy as np
import polars as pl

# Note: np.array turns every cell into a string here; the schema gives each column its intended type
data = np.array([['total_revenue', 137576.4, 0],
                 ['non-offer_revenue', 136261.41, 1],
                 ['offer_revenue', 1314.99, 1],
                 ['non-offer_revenue', 136261.41, 2],
                 ['baseline_revenue', 24.81, 2],
                 ['incremental_sales', 1290.18, 2]])
df = pl.DataFrame(data, schema={'sales_type': str, 'sales_value': float, 'index': int})
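As a quick check of that invariant, here is a stdlib-only sketch that sums sales_value per index. The rows are copied into plain tuples rather than a NumPy/Polars frame, purely so the snippet runs without extra dependencies:

```python
from collections import defaultdict

# The six rows from the data array above: (sales_type, sales_value, index)
rows = [
    ("total_revenue", 137576.4, 0),
    ("non-offer_revenue", 136261.41, 1),
    ("offer_revenue", 1314.99, 1),
    ("non-offer_revenue", 136261.41, 2),
    ("baseline_revenue", 24.81, 2),
    ("incremental_sales", 1290.18, 2),
]


def totals_by_index(rows):
    """Sum sales_value within each index, i.e. the full height of each stacked bar."""
    totals = defaultdict(float)
    for _type, value, idx in rows:
        totals[idx] += value
    return dict(totals)


totals = totals_by_index(rows)
# Round to cents to hide floating-point noise before printing.
print({idx: round(total, 2) for idx, total in totals.items()})
# → {0: 137576.4, 1: 137576.4, 2: 137576.4}
```

Every index sums to the same total (up to floating-point rounding), which is why all three stacked bars reach the same height.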

However, due to the nature of the data, there is a drastic difference between the sales_value values. Hence, when visualizing it via a stacked bar plot as follows:

from lets_plot import *
LetsPlot.setup_html()

(ggplot(df)
 + geom_bar(aes(x='index', fill='sales_type', y='sales_value'), stat='identity',
            tooltips=layer_tooltips(['sales_value']).title('^fill'),
            labels=layer_labels(['sales_value']))
 + theme(title=element_text(face='bold'), axis_title='blank', axis_ticks='blank', axis_text_x='blank')
 + labs(title='Offer Sales Decomposition', fill='Sales Type:')
)

Some of the stacked bars on top are far too small compared to the bars below: [screenshot: stacked bar plot]

I tried to "refactor" this by raising the lower y limit to "zoom in" on the small stacked bars at the top:
+ ylim(120000, 138000), but that applied the limit to each bar segment individually, so the segments with small y values disappeared instead!

I tried adding + scale_y_continuous(breaks={f'{i}': i for i in range(120000, 138001, 1000)}) but that didn't shift the y scale up either.

I can't apply a log scale on the y values either because then the stacked values won't add up to the same total amount.

Any workaround for this?

@alshan alshan added the bug Something isn't working label Jan 4, 2024
@alshan alshan added this to the 2024Q1 milestone Jan 4, 2024
@alshan
Collaborator

alshan commented Jan 4, 2024

Hi, yes, it seems scale limits aren't working for the y-axis in bar charts. See related issue: JetBrains/lets-plot-kotlin#219

However, stacked bars expose a discrepancy between users' expectations and how scale limits work.
By definition, scale limits drop data points lying beyond the limits (i.e. all revenues below 120,000 in your example), which is rather unexpected for users.
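A rough model of that dropping behaviour in plain Python, assuming each row's own y value is tested against the limits before the segments are stacked. `apply_scale_limits` is a hypothetical helper for illustration, not part of lets-plot:

```python
# (sales_type, sales_value, index) rows from the original example
rows = [
    ("total_revenue", 137576.4, 0),
    ("non-offer_revenue", 136261.41, 1),
    ("offer_revenue", 1314.99, 1),
    ("non-offer_revenue", 136261.41, 2),
    ("baseline_revenue", 24.81, 2),
    ("incremental_sales", 1290.18, 2),
]


def apply_scale_limits(rows, lo, hi):
    """Hypothetical model of y-scale limits: any row whose own y value lies
    outside [lo, hi] is discarded BEFORE stacking, so small segments vanish."""
    return [row for row in rows if lo <= row[1] <= hi]


kept = apply_scale_limits(rows, 120000, 138000)
for sales_type, value, idx in kept:
    print(idx, sales_type, value)
```

Only the three large rows survive, so offer_revenue, baseline_revenue and incremental_sales drop out of the stacks, which matches what happened with + ylim(120000, 138000).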

As a workaround try applying coordinate system limits: + coord_cartesian(ylim=(120000, 138000))

@OSuwaidi
Author

OSuwaidi commented Jan 5, 2024

Hey @alshan !

Thanks for your prompt response, the workaround worked exactly as expected!

Though, what's the difference between adding + ylim(ymin, ymax) and + coord_cartesian(ylim=(ymin, ymax))? How do they differ under the hood? Is it that coord_cartesian() changes the viewport to the specified y-axis limits, while ylim() acts more like a threshold on the individual layers of a plot?

@alshan
Collaborator

alshan commented Jan 7, 2024

That's right, "coord limits" work as a visual zooming.

With "scale limits", the key difference is that setting these limits discards all data outside the range.

Here is the relevant chapter: Zooming into a plot with coord_cartesian().
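To contrast the two, here is a plain-Python sketch of coordinate-limit zooming. It assumes segments stack bottom-up in row order within each index; `stack_segments` and `clip_to_view` are hypothetical helpers that only illustrate the idea, not lets-plot internals:

```python
# (sales_type, sales_value, index) rows from the original example
rows = [
    ("total_revenue", 137576.4, 0),
    ("non-offer_revenue", 136261.41, 1),
    ("offer_revenue", 1314.99, 1),
    ("non-offer_revenue", 136261.41, 2),
    ("baseline_revenue", 24.81, 2),
    ("incremental_sales", 1290.18, 2),
]


def stack_segments(rows):
    """Stack all rows per index in row order -> (type, index, ymin, ymax)."""
    tops = {}
    segments = []
    for sales_type, value, idx in rows:
        ymin = tops.get(idx, 0.0)
        ymax = ymin + value
        tops[idx] = ymax
        segments.append((sales_type, idx, ymin, ymax))
    return segments


def clip_to_view(segments, lo, hi):
    """Hypothetical model of coord_cartesian(ylim=(lo, hi)): no data is dropped;
    each segment is only cut to the visible window (None if fully outside)."""
    clipped = []
    for sales_type, idx, ymin, ymax in segments:
        if ymax < lo or ymin > hi:
            clipped.append((sales_type, idx, None))
        else:
            clipped.append((sales_type, idx, (max(ymin, lo), min(ymax, hi))))
    return clipped


view = clip_to_view(stack_segments(rows), 120000, 138000)
for segment in view:
    print(segment)
```

Unlike the scale-limit case, every segment keeps a visible range in the 120,000 to 138,000 window; the large bottom segments are merely cut off at the lower edge, while the small top segments stay fully visible.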

@OSuwaidi
Author

OSuwaidi commented Jan 8, 2024

@alshan Hey!

One more related issue: when attaching numerical labels to the stacked bars via labels=layer_labels(['sales_value']), it works fine, but after changing the viewport limits via coord_cartesian(ylim=(min, max)), the labels on the lowest bars are no longer visible because they sink to the very bottom, while the labels of the highest bars are pushed to the very top, which hurts the readability of the plot.

I tried adding the labels manually via geom_text(), but that requires adding a custom "height" column to the dataframe (in the case of this stacked bar plot) and some tinkering. Plus, I believe its vjust parameter doesn't work as expected: the docs say "vjust : vertical text alignment. Possible values: ... or number between 0 (‘bottom’) and 1 (‘top’)." However, the only numerical values that actually take effect are 0 and 1; numbers in between are simply ignored.

While the labels property is very useful and convenient, we have no control over the label's placement/positioning.

Adding an equivalent of the CSS vertical-align property would be very helpful, and maybe the nudge_y parameter from geom_text() as well!

@alshan
Collaborator

alshan commented Jan 8, 2024

Hi, regarding annotation labels on bars - congrats, you nailed a bug :) : #981

As for geom_text(), the key thing on a stacked bar chart is to use the "group" aesthetic so that the labels stack.
Consider this example: https://nbviewer.org/github/JetBrains/lets-plot-docs/blob/master/source/examples/cookbook/position_stack.ipynb
Note geom_label(aes(..., group="year"), ... in Out [14], [15].

> However, the only numerical values that actually take effect are 0 and 1; numbers in between are simply ignored.

Could you provide a minimal example? In the demo above, the value 0.5 works as expected (see Out [15]).

> While the labels property is very useful and convenient, we have no control over the label's placement/positioning.

> Adding an equivalent of the CSS vertical-align property would be very helpful, and maybe the nudge_y parameter from geom_text() as well!

Yes, we are planning to expand the annotations API in this direction. Hopefully sooner rather than later.

@OSuwaidi
Author

OSuwaidi commented Jan 9, 2024

Hey @alshan !

Regarding the use of the group aesthetic, to be honest, I still don't quite understand how it works; I understand it's supposed to logically separate and group the layers on a plot based on the group value, but I don't understand why you wouldn't just use color or fill instead.

Anyway, when I add it to the example from the first post:

(ggplot(df) 
 + geom_bar(aes(x='index', fill='sales_type', y='sales_value'), stat='identity',
            tooltips=layer_tooltips(['sales_value']).title('^fill'),
            )
+ theme(title=element_text(face='bold'), axis_title='blank', axis_ticks='blank', axis_text_x='blank')
+ labs(title='Offer Sales Decomposition', fill='Sales Type:')
+ geom_text(
            aes(label='sales_value', group='sales_type'),
            label_format='{0.2f}k',
            position='stack',
            )
)

It doesn't seem to do anything (even when I add vjust to the position).

The way I'm doing it right now is by adding a "height" column to my dataframe (using the df from above):
df = df.with_columns(pl.cum_sum('sales_value').over('index').alias('height'))
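For reference, the same "height" column (a running total of sales_value within each index, i.e. the top edge of each stacked segment, assuming segments stack in row order) can be sketched without Polars. This version relies on the rows already being sorted by index, because itertools.groupby only groups consecutive rows, whereas Polars' .over('index') does not need that:

```python
from itertools import accumulate, groupby

# (sales_type, sales_value, index), already ordered by index as in the example
rows = [
    ("total_revenue", 137576.4, 0),
    ("non-offer_revenue", 136261.41, 1),
    ("offer_revenue", 1314.99, 1),
    ("non-offer_revenue", 136261.41, 2),
    ("baseline_revenue", 24.81, 2),
    ("incremental_sales", 1290.18, 2),
]

heights = []
for _idx, group in groupby(rows, key=lambda row: row[2]):
    # Running total within one index = the top edge of each segment in that bar
    heights.extend(accumulate(value for _type, value, _i in group))

print([round(h, 2) for h in heights])
```

Because each height is the top edge of its segment, a label placed at y='height' is anchored at the top of the segment it describes.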

I then use that height in geom_text aesthetic to "properly" position the text labels as such:

+ coord_cartesian(ylim=(136100, 137600))
+ geom_text(
            aes(x='index', label='sales_value', y='height'),
            label_format='{0.2f}k',
            )
)

And regarding the use of vjust, I was referring to the vjust property within the geom_text() function itself:

+ geom_text(
            aes(x='index', label='sales_value', y='height'),
            label_format='{0.2f}k',
            vjust=1,
            )

Used that way, it doesn't work as expected:

  1. a value of 0 (corresponding to 'bottom') does not behave as expected compared to a value of 1 ('top');
  2. it only recognizes 0 or 1; values in between fall back to the default. However, using position=position_stack(vjust=.5) does work.

Thanks a lot!

@alshan
Collaborator

alshan commented Jan 10, 2024

> but I don't understand why you wouldn't just use color or fill instead.

You are right: as long as you map the color aesthetic in the geom_text() layer (color or fill in geom_label()) to a discrete variable, it will also create groups.
However, if you don't want to map the text color to a variable, then you have to use the "group" aesthetic.

In your code snippet:

+ geom_text(
            aes(label='sales_value', group='sales_type'),
            label_format='{0.2f}k',
            position='stack',
            )

You forgot to add label coordinates: aes(..., x='index', y='sales_value')
Alternatively, you can move this mapping from "bar" to the root:

ggplot(df, mapping=aes(x='index', y='sales_value'))

so that both layers could share it.

> And regarding the use of vjust, I was referring to the vjust property within the geom_text() function itself:

As far as I can see, you don't have position="stack" here, so vjust has no effect.

> The way I'm doing it right now is by adding a "height" column to my dataframe

Clever trick :) Hopefully you won't need it.

@OSuwaidi
Author

OSuwaidi commented Feb 11, 2024

Hey @alshan ,

Thanks for your help and comments as usual!

Sorry, I know this discussion is quite old, but I would like to point out that in this case, where there are drastic differences between stacked bar values (heights), the current best way to display each bar's value (given that labels=layer_labels(['sales_value']) doesn't work with coord_cartesian() yet) is a mix of coord_cartesian(), aes(group=...), and position="stack". Here is the complete code (with df defined the same as in the first comment):

(ggplot(df)
 + geom_bar(
            aes(x='index', fill='sales_type', y='sales_value'), stat='identity',
            tooltips=layer_tooltips(['sales_value']).title('^fill')
            )
 + geom_text(
            aes(label='sales_value', group='sales_type', y='sales_value', x='index'),
            label_format='{0.2f}k',
            position="stack",
            vjust='top',
            size=6
            )
 + coord_cartesian(ylim=(136100, 137650))
 + theme(title=element_text(face='bold'), axis_title='blank', axis_ticks='blank', axis_text_x='blank')
 + labs(title='Offer Sales Decomposition', fill='Sales Type:')
 )

Which produces the following plot: [plot image not preserved]

However, this is still suboptimal.

Would it be possible for labels=layer_labels() on bar plots to work the same way as in geom_pie(), where the annotations for small slices are displayed outside the geom itself, such as here?

(P.S. when adding nudge_y to the code snippet above (no matter the value), some label placements disappear)

@alshan
Collaborator

alshan commented Feb 12, 2024

Hi @OSuwaidi, you can try to fix this overlapping [screenshot] using vjust in position_stack:

geom_text(position=position_stack(vjust=0.9))
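Assuming position_stack(vjust=...) follows the ggplot2 convention that 0 aligns a label with the bottom of its segment, 0.5 with the middle and 1 with the top (so the label sits at ymin + vjust * (ymax - ymin)), a hypothetical helper can show where labels land for the index-2 segments, taken bottom to top in row order:

```python
def stacked_label_y(values, vjust=0.5):
    """Hypothetical model of position_stack(vjust=...) for a single x position:
    stack the values bottom-up and put each label vjust of the way up its segment."""
    positions = []
    base = 0.0
    for v in values:
        positions.append(base + vjust * v)
        base += v
    return positions


# The three index-2 segments from the example, bottom to top (assumed order).
index2 = [136261.41, 24.81, 1290.18]
for vjust in (0.5, 0.9):
    print(vjust, [round(y, 2) for y in stacked_label_y(index2, vjust)])
```

Moving from 0.5 to 0.9 pushes each label toward the top of its own segment; whether two labels still collide then depends on the segment heights and on the coord_cartesian() window.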

> labels=layer_labels(['sales_value']) doesn't work with coord_cartesian() yet

This is already fixed and will work in the next release.

> Would it be possible for labels=layer_labels() on bar plots to work the same way as in geom_pie(), where the annotations for small slices are displayed outside the geom itself, such as here?

Maybe. We haven't yet figured out how to handle this situation on bar charts.

> (P.S. when adding nudge_y to the code snippet above (no matter the value), some label placements disappear)

Thanks, this is likely a bug.

alshan added a commit that referenced this issue Feb 21, 2024
@alshan
Collaborator

alshan commented Mar 8, 2024

Hi @OSuwaidi , the issue with "scale limits" was fixed in v4.3.0. Note however that "coord system limits" is still a better way to zoom charts.

Also, since v4.3.0, you don't have to set the upper limit here: + coord_cartesian(ylim=(136100, 137650))
Try just: + coord_cartesian(ylim=(136100, None)).

@alshan alshan closed this as completed Mar 8, 2024
@OSuwaidi
Author

OSuwaidi commented Mar 9, 2024

Hey @alshan 👋🏼!

Big, great changes in v4.3.0!

I have yet to try out and test all of them. coord_cartesian(ylim=()) works beautifully with layer_labels() now. And while I like the idea of coord_cartesian(ylim=(136100, None)), which avoids explicitly defining an upper limit (especially important in dynamic settings), in my experience it tends to add too much white space above the bar plot, and the amount is quite unpredictable.

Great changes overall!

@alshan
Collaborator

alshan commented Mar 11, 2024

> in my experience it tends to add too much white space above the bar plot, and the amount is quite unpredictable.

The space above is the effect of the scale "expand". By default, a continuous y-scale has a multiplicative expand of 0.05 (i.e. scale domain * 0.05).
You can try removing the multiplicative expand and setting an additive one instead.
For example, scale_y_continuous(expand=[0, 1000]) will leave $1,000 worth of space above the bars.
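A back-of-the-envelope sketch of the difference, assuming the expansion is computed from the full y-scale domain of the bars, [0, 137576.4] (bars start at zero), and that expand=[mult, add] extends the upper end by mult * (domain width) + add. `expanded_upper` is a hypothetical helper, not a lets-plot function:

```python
def expanded_upper(domain_min, domain_max, mult=0.05, add=0.0):
    """Hypothetical model of continuous-scale expand: the upper end grows by
    mult * (domain width) plus a constant add."""
    width = domain_max - domain_min
    return domain_max + mult * width + add


top = 137576.4  # height of the tallest stacked bar
default_upper = expanded_upper(0.0, top)                     # default: mult=0.05, add=0
additive_upper = expanded_upper(0.0, top, mult=0, add=1000)  # like expand=[0, 1000]

# Headroom added above the tallest bar in each case
print(round(default_upper - top, 2), round(additive_upper - top, 2))
```

Under these assumptions the default adds about 6,879 units of headroom (5% of 137,576.4), which is large relative to a zoom window starting at 136,100, while expand=[0, 1000] adds exactly 1,000.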
