Skip to content

Nothing plotted with manual_scale when a named vector is used as the input. #4087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ds-jim opened this issue Jun 24, 2020 · 11 comments · Fixed by #4093
Closed

Nothing plotted with manual_scale when a named vector is used as the input. #4087

ds-jim opened this issue Jun 24, 2020 · 11 comments · Fixed by #4093

Comments

@ds-jim
Copy link

ds-jim commented Jun 24, 2020

When building a manual_scale ggplot shows no data when the number of colours <= the number in the scale when a named vector is used. Strangely the colours work when the number of colours > the number in the scale. This seems to be fixed by using an unnamed vector.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(scales)
library(ggplot2)
library(tibble)


named_vec <- c("teracota" = "#d18975", 
                 "green" = "#8fd175",
                 "purple" = "#3f2d54")

scale_colour_named <- function(...)
{ discrete_scale("colour", "test_pal",
                 manual_pal(named_vec)) }

scale_colour_unnamed <- function(...)
{ discrete_scale("colour", "test_pal",
                 manual_pal(unname(named_vec))) }


  #Works when n > than pallet
USArrests %>%
  rownames_to_column("State") %>%
  slice(1:4) %>%
  ggplot(aes(x = State, y = Murder, col = State)) +
  geom_point(size = 10, show.legend = FALSE) +
  scale_colour_named()
#> Warning: This manual palette can handle a maximum of 3 values. You have supplied
#> 4.
#> Warning: Removed 1 rows containing missing values (geom_point).

  #No points plotted
USArrests %>%
  rownames_to_column("State") %>%
  slice(1:3) %>%
  ggplot(aes(x = State, y = Murder, col = State)) +
  geom_point(size = 10, show.legend = FALSE) +
  scale_colour_named()
#> Warning: Removed 3 rows containing missing values (geom_point).

  #However, it works without using a named vector
USArrests %>%
  rownames_to_column("State") %>%
  slice(1:3) %>%
  ggplot(aes(x = State, y = Murder, col = State)) +
  geom_point(size = 10, show.legend = FALSE) +
  scale_colour_unnamed()

Created on 2020-06-24 by the reprex package (v0.3.0)

@karawoo
Copy link
Member

karawoo commented Jun 24, 2020

The relevant lines are here:

ggplot2/R/scale-.r

Lines 809 to 819 in 7f88a56

if (is_named(pal)) {
# if pal is named, limit the pal by the names first,
# then limit the values by the pal
idx_nomatch <- is.na(match(names(pal), limits))
pal[idx_nomatch] <- NA
pal_match <- pal[match(as.character(x), names(pal))]
pal_match <- unname(pal_match)
} else {
# if pal is not named, limit the values directly
pal_match <- pal[match(as.character(x), limits)]
}

With 3 colors, we enter the first branch of the if statement because all 3 colors have names. The names don't match the limits "Alabama" "Alaska" "Arizona", so they get replaced with NA.

However when we request 4 colors, is_named() returns FALSE because the fourth's name is NA, so we enter the else branch where we don't care about the names.

library("scales")
library("rlang")

named_vec <- c(
  "teracota" = "#d18975",
  "green" = "#8fd175",
  "purple" = "#3f2d54"
)

(pal_3 <- manual_pal(named_vec)(3))
#>  teracota     green    purple 
#> "#d18975" "#8fd175" "#3f2d54"
(pal_4 <- manual_pal(named_vec)(4))
#> Warning: This manual palette can handle a maximum of 3 values. You have supplied
#> 4.
#>  teracota     green    purple      <NA> 
#> "#d18975" "#8fd175" "#3f2d54"        NA
is_named(pal_3)
#> [1] TRUE
is_named(pal_4)
#> [1] FALSE

Created on 2020-06-24 by the reprex package (v0.3.0.9001)

It looks like some of this was added by @yutannihilation in #3685 to fix a different issue. I'm not totally sure what we want to do here, but I agree it is confusing, both that the palette has to be unnamed (though this can be worked around easily enough), and that when you have too many values for the palette it suddenly works. I ran into the first issue myself just the other day.

@clauswilke
Copy link
Member

The convention in ggplot2 is that for named palettes the values are assigned by matching the names to the scale limits. So example 2 and example 3 are correct, I'd say. The only question is whether example 1 is a problem. Since it issues a clear warning, maybe not?

@karawoo
Copy link
Member

karawoo commented Jun 24, 2020

I think example 1 is behaving inconsistently regardless of the warning message. Example 1 has two issues: the limits don't match the named values and it requests more values than are in the palette. The warning message clearly explains the latter issue, but that still leaves the fact that when 3 colors are requested and they don't match the limits, none of them appear, but when too many colors are requested and they don't match the limits, 3 of them appear.

@yutannihilation
Copy link
Member

I still don't get the correct semantics here, but this seems just my mistake that I didn't aware how is_named() handle the NA and empty names. Sorry...

is_named() returns FALSE because the fourth's name is NA,

@karawoo
Copy link
Member

karawoo commented Jun 24, 2020

No worries, it's definitely an edge case. It sounds like we should stick with the existing behavior when the limits don't match the names in a named palette, but update our check for the names so that in example 1 it would return TRUE. If you agree I can put together a PR.

@yutannihilation
Copy link
Member

Thanks, basically I agree.

By the way, one of the cause of this confusion is that the behavior of scales::manual_pal() is different than the palette function used in manual_scale(). The latter inline function returns the values as it is since all the values are necessary for named lookups. OTOH, scales::manual_pal() returns only the same length of values as n, which is not suitable for named usage. I'm not sure how to fix this, but maybe some improvement is needed also on scales' side...?

manual_pal <- function(values) {
  force(values)
  function(n) {
    n_values <- length(values)
    if (n > n_values) {
      warning("This manual palette can handle a maximum of ", n_values,
        " values. You have supplied ", n, ".",
        call. = FALSE
      )
    }
    values[seq_len(n)]
  }
}

ggplot2/R/scale-manual.r

Lines 143 to 148 in ccd94e1

pal <- function(n) {
if (n > length(values)) {
abort(glue("Insufficient values in manual scale. {n} needed but only {length(values)} provided."))
}
values
}

@yutannihilation
Copy link
Member

Quickly revisited the histories.

When manual_pal() was imported into scales (r-lib/scales@0554a8b#diff-5f0b16b59b2afc2686d4fa16356c4f36) 10 years ago, it seems only unnamed usage was supported. And the code hasn't changed except for showing a friendly warning.

Meanwhile, ggplot2 had supported named usage at some point (sorry, I couldn't find the exact commit), and then the implementation got diverged by this commit, which removed manual_pal() and implemented ggplot2's own code: 2e0b713. That's why manual_pal() is not consistent with the current usage of ggplot2 and yet doesn't break anything.

@ds-jim
Just curious, why do you use discrete_scale() directly instead of scale_fill_manual() or scale_discrete_manual()?

@yutannihilation
Copy link
Member

I mean, it seems manual_pal() is NOT a palette that discrete_scale() expects (and I still don't know where and how to solve this confusion...).

@ds-jim
Copy link
Author

ds-jim commented Jun 25, 2020

@yutannihilation I'm using discreet scale as I'm building a company colour theme for our internal package. I'm a heavy ggplot2 user but this is my first foray into building colour pallets. I used this post as my starting point https://vf.svbtle.com/creating-corporate-colour-palettes-for-ggplot2. However, for groups where n < our branding pallet using a colorRampPalette() gave poor results. To solve that I looked into the code used by hrbrthemes:: scale_colour_ipsum as I knew it behaved in a predictable way and found manual_pal() was the function I needed.

I completely agree that this is an edge case but not knowing how ggplot handled the vector took me quite a while to realise it was the named vector causing the issue, as you suggested it was very easily fixed in my code with unname().

I can see that named vectors have their place and my only suggestion is to either close the case and have this knowledge saved to help someone else who stumbles into this issue or return a more explicit warning like when you accidentally type a pipe instead of a + Error: mappingmust be created byaes() Did you use %>% instead of +?

@yutannihilation
Copy link
Member

Ah, thanks for the detail. I got it. You actually needed manual_pal() for a good reason. I feel manual_pal() needs some option like keep_names and remove names by default.

(I just remembered I had the same experience of creating a company colour theme and shouting "why do I need unname() here??" in my mind... haha)

@karawoo
Copy link
Member

karawoo commented Jun 25, 2020

(I just remembered I had the same experience of creating a company colour theme and shouting "why do I need unname() here??" in my mind... haha)

This literally just happened to me last week as well 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants