c_across() does not work with non-unique labels? #599

Closed
sjkiss opened this issue May 12, 2021 · 1 comment


sjkiss commented May 12, 2021

I am having some difficulty calculating row means using c_across() with labelled variables. The following code recreates two data frames I am using. The problem occurs with df1 and not with df2, although the code is essentially identical. I think something about the way value labels have been assigned to the variables in df1 is causing the problem. Please note that rowMeans() works just fine; the problem appears to be an interaction between labelled variables and c_across(). I really like that c_across() does not require reattaching variables to the data.frame. In the olden times, I would have used mutate_at() for this.

Note: I often use car::Recode() rather than dplyr::recode() because I find the syntax a little simpler, and because of path dependency: I have a ton of historic code that relies on it.
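For readers comparing the two recoding functions, here is a minimal sketch of what the first car::Recode() call below might look like with dplyr::recode(). The vector `x` is a made-up plain double vector, not the labelled column from the issue, and behaviour on labelled input is not covered here.

```r
library(dplyr)

# Hypothetical plain numeric input using the same codes as PESE15.
x <- c(1, 3, 5, 7, 8, 9)

# For numeric input, recode() matches on values given as backtick-quoted names.
# .default = NA_real_ plays the role of "else=NA" in the car::Recode() spec.
recoded <- recode(
  x,
  `1` = 1, `3` = 0.75, `5` = 0.25, `7` = 0, `8` = 0.5,
  .default = NA_real_
)
```

Each code is spelled out as its own argument, which is more verbose than the single specification string used by car::Recode().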

The code below returns the following error:

Error: Problem with `mutate()` input `market_liberalism`.
x `labels` must be unique.
ℹ Input `market_liberalism` is `mean(c_across(market1:market2))`.
ℹ The error occurred in row 1.
#Install car package if necessary

#install.packages('car')
library(tidyverse)
library(car)
library(labelled)

#this recreates df1
structure(list(PESE15 = structure(c(3, 5, 5, 8, NA), label = "The Government Should Leave it Entirely to the Private Sector to Create Jobs", na_values = c(8, 9), format.spss = "F1.0", display_width = 0L, labels = c(`Strongly agree` = 1, `Somewhat agree` = 3, Somewhatdisagree = 5, Stronglydisagree = 7,D.K. = 8, Refused = 9), class = c("haven_labelled_spss", "haven_labelled",  "vctrs_vctr", "double")), MBSA2 = structure(c(3, 8, 1, 1, NA), label = "People Who Do Not Get Ahead Should Blame Themselves Not the System", na_values = 8, format.spss = "F1.0", display_width = 0L, labels = c(`Strongly agree` = 1,  Agree = 2, Disagree = 3, Stronglydisagree = 4, `No opinion` = 8), class = c("haven_labelled_spss", "haven_labelled", "vctrs_vctr",  "double"))), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), label = "NSDstat generated file")->df1

#use the car::Recode command to convert values to 0 to 1
df1$market1<-car::Recode(df1$PESE15, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA")
df1$market2<-car::Recode(df1$MBSA2, "1=1; 2=0.75; 3=0.25; 4=0; 8=0.5; else=NA")

#Use dplyr::c_across() to try to calculate the average 
df1 %>% 
  rowwise() %>% 
  mutate(market_liberalism=mean(
    c_across(market1:market2)))

#Using rowMeans() does work.
df1 %>% 
  select(market1:market2) %>% 
  mutate(market_liberalism=rowMeans(., na.rm=TRUE)) 
#that works, but select() drops the other columns, so it is somewhat difficult to get the result back into the original data.frame

#setting value labels to NULL makes it work.
val_labels(df1$market1)<-NULL
val_labels(df1$market2)<-NULL
#Try again
df1 %>% 
  rowwise() %>% 
  mutate(market_liberalism=mean(
    c_across(market1:market2))) 

#This creates df2, a similar dataset
structure(list(cpsf6 = structure(c(3, 7, 7, 1, 7, 7), label = "The Government Should Leave it Entirely to the  Private Sector to Create Jobs", na_values = c(8, 
9), format.spss = "F1.0", display_width = 0L, labels = c(`Strongly Agree` = 1, 
`Somewhat Agree` = 3, SomewhatDisagree = 5, StronglyDisagree = 7, 
 D.K. = 8, Refused = 9), class = c("haven_labelled_spss", "haven_labelled", 
 "vctrs_vctr", "double")), pese19 = structure(c(3, 7, 3, 1, NA, 5), label = "People Who Do Not Get Ahead Should Blame Themselves, Not the System", na_values = c(8, 9), format.spss = "F1.0", display_width = 0L, labels = c(`Strongly Agree` = 1, `Somewhat Agree` = 3, SomewhatDisagree = 5, StronglyDisagree = 7, 
D.K. = 8, Refused = 9), class = c("haven_labelled_spss", "haven_labelled", 
"vctrs_vctr", "double"))), row.names = c(NA, -6L), class = c("tbl_df", "data.frame"))->df2

#use car::Recode() 
df2$market1<-car::Recode(df2$cpsf6, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA", as.numeric=T)
df2$market2<-car::Recode(df2$pese19, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA", as.numeric=T)
#Here the same c_across() approach works
df2 %>% 
  rowwise() %>% 
  mutate(market_liberalism=mean(
    c_across(market1:market2)
    , na.rm=T )) 
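On the point that the rowMeans() result is hard to get back into the original data frame: one option is to call rowMeans() inside mutate() on the columns returned by across(). This is only a sketch. The tibble `toy` and its values are made up for illustration and use plain doubles rather than the labelled columns above.

```r
library(dplyr)

# Small stand-in for the recoded columns (plain doubles, no value labels).
toy <- tibble::tibble(
  id      = 1:3,
  market1 = c(0.25, 0.75, NA),
  market2 = c(0.25, 0.5, 1)
)

# across() without a function returns the selected columns as a tibble,
# so rowMeans() can run inside mutate() and the new column stays in `toy`.
toy <- toy %>%
  mutate(market_liberalism = rowMeans(across(market1:market2), na.rm = TRUE))
```

No rowwise() step is needed here, and the `id` column is kept alongside the new column. In dplyr 1.1.0 and later, pick(market1:market2) can be used in place of across() for this purpose.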

Results from sessionInfo()

R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] labelled_2.8.0  cesdata_0.1.0   car_3.0-10      carData_3.0-4   forcats_0.5.1   stringr_1.4.0  
 [7] dplyr_1.0.5     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3     tibble_3.1.1    ggplot2_3.3.3  
[13] tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        lubridate_1.7.10  lattice_0.20-41   assertthat_0.2.1  psych_2.1.3       utf8_1.2.1       
 [7] R6_2.5.0          cellranger_1.1.0  backports_1.2.1   reprex_1.0.0      httr_1.4.2        pillar_1.6.0     
[13] rlang_0.4.11      curl_4.3          readxl_1.3.1      rstudioapi_0.13   data.table_1.14.0 foreign_0.8-81   
[19] munsell_0.5.0     broom_0.7.5       compiler_4.0.4    modelr_0.1.8      pkgconfig_2.0.3   mnormt_2.0.2     
[25] tmvnsim_1.0-2     tidyselect_1.1.1  rio_0.5.26        fansi_0.4.2       withr_2.4.1       crayon_1.4.1     
[31] dbplyr_2.1.0      grid_4.0.4        nlme_3.1-152      jsonlite_1.7.2    gtable_0.3.0      lifecycle_1.0.0  
[37] DBI_1.1.1         magrittr_2.0.1    scales_1.1.1      zip_2.1.1         cli_2.5.0         stringi_1.5.3    
[43] fs_1.5.0          xml2_1.3.2        ellipsis_0.3.2    generics_0.1.0    vctrs_0.3.8       openxlsx_4.2.3   
[49] tools_4.0.4       glue_1.4.2        hms_1.0.0         abind_1.4-5       parallel_4.0.4    colorspace_2.0-0 
[55] rvest_1.0.0       haven_2.4.1.9000 
Member

gorcha commented Jul 25, 2021

Hi @sjkiss, there's a bug when combining two vectors with different labels for the same values.
This was fixed for labelled vectors but missed for labelled_spss.

@hadley this is the same bug fixed in ed2ddda, I'll chuck up a pull request.
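A possible workaround until a fixed haven release is installed, sketched under assumptions: strip the value labels with haven::zap_labels() before calling c_across(), so the combination only ever sees plain doubles. The tibble `toy` is invented for illustration; its two labelled_spss columns attach different labels to the value 3, mirroring the situation described above. Whether the unstripped combination actually errors depends on the installed haven version, so that step is wrapped in try().

```r
library(dplyr)
library(haven)

# Two toy labelled_spss columns with different labels for the same value (3).
toy <- tibble::tibble(
  market1 = labelled_spss(c(0.75, 0.25, 1),
                          labels = c(`Strongly agree` = 1, `Somewhat agree` = 3)),
  market2 = labelled_spss(c(0.25, 0.5, 1),
                          labels = c(`Strongly agree` = 1, Disagree = 3))
)

# c_across() combines columns with vctrs::vec_c(). In affected haven versions
# this combination may fail with "`labels` must be unique"; try() keeps the
# script running whichever way it goes.
combined <- try(vctrs::vec_c(toy$market1, toy$market2), silent = TRUE)

# Workaround: drop the value labels first, then compute the row-wise mean.
result <- toy %>%
  mutate(across(market1:market2, zap_labels)) %>%
  rowwise() %>%
  mutate(market_liberalism = mean(c_across(market1:market2), na.rm = TRUE)) %>%
  ungroup()
```

Note that zap_labels() discards the label metadata, so this suits derived numeric scores like these rather than the original survey items.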
