c_across() does not work with non-unique labels? #599

sjkiss · 2021-05-12T13:49:27Z

I am having difficulty calculating row means using c_across() with labelled variables. The following code recreates two data frames I am using. The problem occurs with df1 and not with df2, although the code is essentially identical. I think something about the way value labels have been assigned to the variables in df1 is causing the problem. Please note that rowMeans() works just fine; the problem appears to be a particular interaction between labelled variables and c_across(). I really like that c_across() does not require reattaching variables to the data.frame. In the olden times, I would have used mutate_at() to get at this.

Note: I often use car::Recode() rather than dplyr::recode() because I find the syntax a little simpler, and because of path dependency: I have a ton of historic code that relies on it.

The code below returns the following error:

Error: Problem with mutate() input market_liberalism.
x labels must be unique.
 Input market_liberalism is mean(c_across(market1:market2)).
 The error occurred in row 1.

#Install car package if necessary

#install.packages('car')
library(tidyverse)
library(car)
library(labelled)

#this recreates df1
structure(list(PESE15 = structure(c(3, 5, 5, 8, NA), label = "The Government Should Leave it Entirely to the Private Sector to Create Jobs", na_values = c(8, 9), format.spss = "F1.0", display_width = 0L, labels = c(`Strongly agree` = 1, `Somewhat agree` = 3, Somewhatdisagree = 5, Stronglydisagree = 7,D.K. = 8, Refused = 9), class = c("haven_labelled_spss", "haven_labelled",  "vctrs_vctr", "double")), MBSA2 = structure(c(3, 8, 1, 1, NA), label = "People Who Do Not Get Ahead Should Blame Themselves Not the System", na_values = 8, format.spss = "F1.0", display_width = 0L, labels = c(`Strongly agree` = 1,  Agree = 2, Disagree = 3, Stronglydisagree = 4, `No opinion` = 8), class = c("haven_labelled_spss", "haven_labelled", "vctrs_vctr",  "double"))), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), label = "NSDstat generated file")->df1

#use the car::Recode command to convert values to 0 to 1
df1$market1<-car::Recode(df1$PESE15, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA")
df1$market2<-car::Recode(df1$MBSA2, "1=1; 2=0.75; 3=0.25; 4=0; 8=0.5; else=NA")

#Use dplyr::c_across() to try to calculate the average 
df1 %>% 
  rowwise() %>% 
  mutate(market_liberalism=mean(
    c_across(market1:market2)))

#Using rowMeans() does work.
df1 %>% 
  select(market1:market2) %>% 
  mutate(market_liberalism=rowMeans(., na.rm=T)) 
#that works, but then it is somewhat difficult to get it back into the original data.frame

#setting value labels to NULL makes it work.
val_labels(df1$market1)<-NULL
val_labels(df1$market2)<-NULL
#Try again
df1 %>% 
  rowwise() %>% 
  mutate(market_liberalism=mean(
    c_across(market1:market2))) 

#This makes df2, similar dataset 
structure(list(cpsf6 = structure(c(3, 7, 7, 1, 7, 7), label = "The Government Should Leave it Entirely to the  Private Sector to Create Jobs", na_values = c(8, 
9), format.spss = "F1.0", display_width = 0L, labels = c(`Strongly Agree` = 1, 
`Somewhat Agree` = 3, SomewhatDisagree = 5, StronglyDisagree = 7, 
 D.K. = 8, Refused = 9), class = c("haven_labelled_spss", "haven_labelled", 
 "vctrs_vctr", "double")), pese19 = structure(c(3, 7, 3, 1, NA, 5), label = "People Who Do Not Get Ahead Should Blame Themselves, Not the System", na_values = c(8, 9), format.spss = "F1.0", display_width = 0L, labels = c(`Strongly Agree` = 1, `Somewhat Agree` = 3, SomewhatDisagree = 5, StronglyDisagree = 7, 
D.K. = 8, Refused = 9), class = c("haven_labelled_spss", "haven_labelled", 
"vctrs_vctr", "double"))), row.names = c(NA, -6L), class = c("tbl_df", "data.frame"))->df2

#use car::Recode() 
df2$market1<-car::Recode(df2$cpsf6, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA", as.numeric=T)
df2$market2<-car::Recode(df2$pese19, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA", as.numeric=T)
#
df2 %>% 
  rowwise() %>% 
  mutate(market_liberalism=mean(
    c_across(market1:market2)
    , na.rm=T ))
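
The rowMeans() approach shown earlier does not have to go through select(): it can be called inside mutate() on a column selection, so the result stays attached to the original data frame. The following is a minimal sketch rather than a tested fix for this issue. It uses a hypothetical toy tibble with plain numeric columns, and it assumes rowMeans() treats the labelled columns the same way it does in the select() example above.

```r
library(dplyr)

# Hypothetical stand-in for the recoded columns (plain doubles, no value labels)
toy <- tibble(
  id      = 1:3,
  market1 = c(0.75, 0.25, NA),
  market2 = c(0.25, 0.50, 1)
)

# across() returns the selected columns as a data frame, which rowMeans() accepts;
# every other column (here `id`) is kept in the result
out <- toy %>%
  mutate(market_liberalism = rowMeans(across(market1:market2), na.rm = TRUE))

out$market_liberalism
# 0.500 0.375 1.000
```

This avoids rowwise() entirely, so no per-row combination of the two columns takes place.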

Results from sessionInfo()

R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] labelled_2.8.0  cesdata_0.1.0   car_3.0-10      carData_3.0-4   forcats_0.5.1   stringr_1.4.0  
 [7] dplyr_1.0.5     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3     tibble_3.1.1    ggplot2_3.3.3  
[13] tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        lubridate_1.7.10  lattice_0.20-41   assertthat_0.2.1  psych_2.1.3       utf8_1.2.1       
 [7] R6_2.5.0          cellranger_1.1.0  backports_1.2.1   reprex_1.0.0      httr_1.4.2        pillar_1.6.0     
[13] rlang_0.4.11      curl_4.3          readxl_1.3.1      rstudioapi_0.13   data.table_1.14.0 foreign_0.8-81   
[19] munsell_0.5.0     broom_0.7.5       compiler_4.0.4    modelr_0.1.8      pkgconfig_2.0.3   mnormt_2.0.2     
[25] tmvnsim_1.0-2     tidyselect_1.1.1  rio_0.5.26        fansi_0.4.2       withr_2.4.1       crayon_1.4.1     
[31] dbplyr_2.1.0      grid_4.0.4        nlme_3.1-152      jsonlite_1.7.2    gtable_0.3.0      lifecycle_1.0.0  
[37] DBI_1.1.1         magrittr_2.0.1    scales_1.1.1      zip_2.1.1         cli_2.5.0         stringi_1.5.3    
[43] fs_1.5.0          xml2_1.3.2        ellipsis_0.3.2    generics_0.1.0    vctrs_0.3.8       openxlsx_4.2.3   
[49] tools_4.0.4       glue_1.4.2        hms_1.0.0         abind_1.4-5       parallel_4.0.4    colorspace_2.0-0 
[55] rvest_1.0.0       haven_2.4.1.9000


gorcha · 2021-07-25T08:09:23Z

Hi @sjkiss, there's a bug when combining two vectors with different labels for the same values.
This was fixed for labelled vectors but missed for labelled_spss.

@hadley this is the same bug fixed in ed2ddda, I'll chuck up a pull request.
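
A smaller illustration of the situation described above, offered as a sketch under assumptions rather than the exact code path in haven: two labelled_spss vectors that attach different labels to the same value (3) are combined with vctrs::vec_c(), which is roughly what c_across() does for each row. The vectors `a` and `b` are hypothetical. On a haven version without the fix, the call is expected to fail with the "labels must be unique" error; with the fix it should succeed. tryCatch() lets the snippet run either way.

```r
library(haven)
library(vctrs)

# Hypothetical vectors: the same value (3) carries a different label in each
a <- labelled_spss(c(1, 3), labels = c(`Somewhat agree` = 3), na_values = 8)
b <- labelled_spss(c(3, 1), labels = c(Disagree = 3), na_values = 8)

# Combine them; capture the error message instead of stopping if it fails
res <- tryCatch(
  vec_c(a, b),
  error = function(e) conditionMessage(e)
)

res
```

If `res` is a character string, the installed haven still raises the error; otherwise it is the combined labelled vector.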

gorcha added a commit to gorcha/haven that referenced this issue Jul 25, 2021

Fix logic error in label deduping for labelled_spss() (tidyverse#599)

fbb1bdb

gorcha mentioned this issue Jul 25, 2021

Fix logic error in label deduping for labelled_spss() #618

Merged

hadley pushed a commit that referenced this issue Jul 25, 2021

Fix logic error in label deduping for labelled_spss() (#599)

573d20e

hadley closed this as completed Jul 30, 2021
