Multiple Axes #373
Comments
I understand. btw where did you find this horrendous plot? What was your google keyword? |
That is quite a masterpiece. Brightened my morning. Thanks for sharing! |
What would be the right way to implement this in Gadfly? I'm probably not the most capable person to do this, but since I'd like to be able to use the feature, I guess I'm up ;) I have a very limited understanding (read: only slightly more than your cat) of how the inner workings of Gadfly are structured, so I'd probably require quite a lot of guidance to help with this - thus, feel free to ignore my offer and wait for someone more capable... (And yes, that really was a hideous plot. I promise not to produce anything remotely similar if/when this feature lands!) |
Probably the right way to do it is to allow per-layer scales and coordinates. Currently scales are applied only at the plot level, so while layers can have their own statistics and geometry, they must share one scale. Conceptually it's simple, but it may involve rearranging a lot of stuff. My other concern is that per-layer scales can easily lead to garbage plots (like the one I posted). It's not Gadfly's job to prevent bad plots per se, but I'd like to find a way to make sure this isn't done accidentally. E.g. I could imagine someone innocently writing plot(layer(..., Scale.x_continuous), layer(..., Scale.x_continuous)) and being surprised by the result. |
I was thinking about this more lately. I've seen some plots with multiple axes that I actually quite like. Consider this one from "Capital in the Twenty-First Century" (forgive the shitty cell-phone photo). This works because the two axes are just different ways of measuring the same thing, and there is a linear relationship between the two. It doesn't change the interpretation of the plot geometry. It also improves the plot, making it intuitive to both Americans and Europeans. In contrast, the multiple-axes plots I hate are the ones in which the plot geometry no longer has a single interpretation. Here's something fairly typical. I think of plotting as converting data into a language native to human brains: visual patterns. These plots are bad, or at least sub-optimal, because they introduce a lot of visual patterns that are meaningless. The "defective" line crosses the "cost" and "output" lines several times. That looks interesting, but has absolutely no meaning, since they measure different things and their relative positions on the y-axis are arbitrary. Yet the fact that France's minimum wage surpassed the US's in '84 looks interesting and is interesting. There's this great bit in Howard Wainer's introduction to the 2010 edition of "Semiology of Graphics" (which is sort of a precursor/inspiration to "The Grammar of Graphics").
The right thing to do here is neither to ban multiple-axes plots nor to begrudgingly implement them, but to articulate the dichotomy between meaningful multiple-axes plots and garbage ones, then make the former easy to draw and the latter possible only with ample neck wringing. So, here's a proposal: Plots will always have a native unit. Additional axes can be added, but only by specifying a conversion from the original units. So the first example in Gadfly would be drawn like:
eur_to_usd = 1.32
plot(x=year, y=minimum_wage_in_euros, color=country,
     Geom.point, Geom.line, Guide.y_axis(euros -> eur_to_usd * euros))
This is pretty easy to implement. It will also still allow (what I think are) garbage multiple-axes plots, but forces the user to endure the psychological trauma of passing a meaningless conversion function. TLDR: On second thought, I should add multiple axes plots. |
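To make the proposal above concrete, here is a minimal sketch in plain Julia, without Gadfly. It assumes the secondary axis has no scale of its own: its labels are just the native-unit tick positions passed through the user's conversion function. The helper name `secondary_ticks` and the tick values are hypothetical and are not part of Gadfly's API.

```julia
# Sketch only: `secondary_ticks` is a hypothetical helper, not a Gadfly function.
# The secondary axis reuses the positions of the primary (native-unit) ticks and
# only relabels them through a conversion, so the plot geometry is unchanged.

eur_to_usd = 1.32  # exchange rate used in the proposal above

# Pair each native-unit tick position with its label in the secondary unit.
function secondary_ticks(primary_ticks, conversion)
    return [(position = t, label = string(round(conversion(t), digits = 2)))
            for t in primary_ticks]
end

# Illustrative tick positions in euros (made up for this sketch).
euro_ticks = [2.0, 4.0, 6.0, 8.0, 10.0]

for tick in secondary_ticks(euro_ticks, euros -> eur_to_usd * euros)
    println("y = ", tick.position, " EUR is labeled ", tick.label, " USD")
end
```

Because the second axis can only be reached through a conversion from the native unit, a plot that mixes unrelated quantities would need a made-up conversion function, which is exactly the friction the proposal is after.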
Your API example lacks one important aspect of these plots: the raw data for the minimum wage plot would likely be available in € for the French wages and in $ for the American ones. It would therefore be a lot easier to do this if one specified which data set goes on which axis, too, as well as a conversion factor between them. Maybe something like
Now it's well-defined which axis (left or right, y1 or y2) corresponds to which data, which also makes it clearer how the function mapping one to the other should be specified. (If we switch out the US data for England, would the order be preserved? If so, why? If not, the anonymous function euros->usd is just black magic...) |
I generally agree that multiple scales (axes) are visually problematic. And I would be against adding scales to layers, but I would propose a different approach. In some cases it still may be useful to present two "categories" on one axis, but just two, because a plot has two vertical and two horizontal edges. For example, when data comes from stress testing of a web application, there is one dataset corresponding to the number of virtual users (VU) visiting the application and another one corresponding to the response time of the application. It usually happens that the shape (or envelope) of the response times follows the number of virtual users. It is useful to display the number of VUs together with the response times. Obviously, VU count and time have different scales and units. For that case I would recommend introducing the notion of secondary (minor) axes (both x and y) that have their own Scale (Scale.x2_continuous, ...) and aesthetics - e.g.
Labels and guides would be configured separately for the x and x2 axes. I believe that this approach is more Gadfly-like. |
Double axes (and even more) are often used with time series. iq2luc gives a pretty good example. At the moment, I use vstack, but it is not easy to compare 2 or more plots precisely. I hope this feature will be available soon... |
Hi, just ran into this too and would also like to be able to do dual axis plots. Perhaps the comment at least helps build support to add this...
For the case we are looking at now, it is sensible to have "non-scalable" values plotted on the same x-axis but on left and right y-axes, i.e. some population time series together with environmental temperature. |
I would like to give another example of a case against the limited notion that multiple axes are only "good" if there is a linear relationship between the two (or more) y-measurements. I would like to create a plot that has both the CPU clock and temperature. The relationship between them is not guaranteed to be linear, even when neither is capped at its maximum value, and surely the most interesting thing to observe is whether, once the temperature reaches a plateau, the clock stays stable or starts to degrade because of thermal throttling. |
Wow, I'm surprised that this isn't implemented. I'm even more surprised by the hostile attitude many people seem to have towards this. I can make an ugly, misleading plot with one axis just as easily as I can make a nice one with 2 axes. That's an hour of my life I'll never get back. Later, Gadfly. |
I'm going to create an issue for this and leave it open, since this comes up periodically, and I want to have a conspicuous response.
I don't like plots with multiple axes. They're almost always terrible. Seriously, just do a google image search for “multiple axes”. You'll see some of the worst, most incomprehensible plots ever drawn. Stuff like this:
That said, there are a handful of arguably legitimate uses, for example labeling a temperature scale in Fahrenheit and Celsius. So I'm not opposed to adding this feature, but there are about a thousand more important things I need to do to make Gadfly great, and for me, this is very near the bottom of the list.
If someone wants to implement this, I can describe how to do it and review their PR, but I'm probably not going to do it myself within the next few years.