
dplyr::do error with surv_cutpoint #104

Closed
MarcinKosinski opened this issue Jan 14, 2017 · 2 comments


@MarcinKosinski (Contributor)


expected behaviour

surv_cutpoint (and hence surv_categorize) works for grouped data.frames in a dplyr::do workflow.

actual behaviour

I get an error, which apparently comes from the survival package:

# load necessary packages
library(survminer)
library(survival)
library(RTCGA)
library(RTCGA.clinical)
library(RTCGA.mRNA)
# extract clinical information
clin <- survivalTCGA(BRCA.clinical, OV.clinical)
head(clin)
  times bcr_patient_barcode patient.vital_status
1  3767        TCGA-3C-AAAU                    0
2  3801        TCGA-3C-AALI                    0
3  1228        TCGA-3C-AALJ                    0
4  1217        TCGA-3C-AALK                    0
5   158        TCGA-4H-AAAK                    0
6  1477        TCGA-5L-AAT0                    0
# extract micro-array gene expression
expr <- expressionsTCGA(BRCA.mRNA, OV.mRNA, 
                        extract.cols = c("PAX8"))
head(expr)
# A tibble: 6 × 3
           bcr_patient_barcode   dataset     PAX8
                         <chr>     <chr>    <dbl>
1 TCGA-A1-A0SD-01A-11R-A115-07 BRCA.mRNA -0.54225
2 TCGA-A1-A0SE-01A-11R-A084-07 BRCA.mRNA -0.59475
3 TCGA-A1-A0SH-01A-11R-A084-07 BRCA.mRNA  0.49975
4 TCGA-A1-A0SJ-01A-11R-A084-07 BRCA.mRNA -0.58850
5 TCGA-A1-A0SK-01A-12R-A084-07 BRCA.mRNA -0.96475
6 TCGA-A1-A0SM-01A-11R-A084-07 BRCA.mRNA  0.57275
# join both datasets
library(tidyverse)
clin %>%
   left_join(expr %>%
       filter(substr(bcr_patient_barcode, 14, 15) == "01") %>% # tumour samples
       mutate(bcr_patient_barcode = substr(bcr_patient_barcode,1,12))) %>%
   filter(!is.na(PAX8)) %>% # drop patients without mRNA data
   mutate(dataset = gsub("\\..+", "", dataset)) -> # remove mRNA part from the name
   clin_expr
head(clin_expr)
  times bcr_patient_barcode patient.vital_status dataset     PAX8
1   437        TCGA-A1-A0SD                    0    BRCA -0.54225
2  1321        TCGA-A1-A0SE                    0    BRCA -0.59475
3  1437        TCGA-A1-A0SH                    0    BRCA  0.49975
4   416        TCGA-A1-A0SJ                    0    BRCA -0.58850
5   967        TCGA-A1-A0SK                    1    BRCA -0.96475
6   242        TCGA-A1-A0SM                    0    BRCA  0.57275
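The join above relies on the layout of TCGA barcodes. Below is a small sketch of the three string operations applied to one barcode from the expr output, assuming the standard TCGA barcode format in which characters 14-15 encode the sample type (01 being a primary tumour) and the first 12 characters identify the patient:

```r
# One aliquot barcode taken from the expr output above
barcode <- "TCGA-A1-A0SD-01A-11R-A115-07"

# Characters 14-15 hold the sample type code; "01" marks a primary tumour sample
sample_type <- substr(barcode, 14, 15)   # "01"

# The first 12 characters identify the patient, as in clin$bcr_patient_barcode
patient_id <- substr(barcode, 1, 12)     # "TCGA-A1-A0SD"

# Removing everything from the first "." turns "BRCA.mRNA" into "BRCA"
cohort <- gsub("\\..+", "", "BRCA.mRNA") # "BRCA"
```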
# perform surv_cutpoint in groups determined by dataset
clin_expr %>%
   group_by(dataset) %>%
   do(data_cat = 
         surv_cutpoint(data = ., time = "times",
              event = "patient.vital_status",
              variables = c("PAX8", "dataset"))) ->
   clin_exp_categorized
Error in survival::Surv(time, event): Time variable is not numeric

while this works for the ungrouped data:

surv_cutpoint(data = clin_expr, time = "times",
              event = "patient.vital_status",
              variables = c("PAX8", "dataset")) ->
   clin_exp_categorized
head(surv_categorize(clin_exp_categorized))
  times patient.vital_status PAX8 dataset
1   437                    0  low    BRCA
2  1321                    0  low    BRCA
3  1437                    0  low    BRCA
4   416                    0  low    BRCA
5   967                    1  low    BRCA
6   242                    0  low    BRCA

session info

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C               LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8     LC_MONETARY=pl_PL.UTF-8   
 [6] LC_MESSAGES=pl_PL.UTF-8    LC_PAPER=pl_PL.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dplyr_0.5.0                 purrr_0.2.2                 readr_1.0.0                 tidyr_0.6.0                 tibble_1.2                 
 [6] tidyverse_1.0.0             RTCGA.mRNA_1.2.0            RTCGA.clinical_20151101.4.0 RTCGA_1.5.1                 survival_2.40-1            
[11] survminer_0.2.4             ggplot2_2.2.1               knitr_1.15.1               

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8.4         plyr_1.8.4            viridis_0.3.4         tools_3.3.2           digest_0.6.11         evaluate_0.10        
 [7] gtable_0.2.0          lattice_0.20-34       Matrix_1.2-7.1        DBI_0.5-1             yaml_2.1.14           mvtnorm_1.0-5        
[13] gridExtra_2.2.1       stringr_1.1.0         httr_1.2.1            xml2_1.0.0.9002       exactRankTests_0.8-28 maxstat_0.7-24       
[19] rprojroot_1.1         grid_3.3.2            data.table_1.10.0     R6_2.2.0              XML_3.98-1.5          rmarkdown_1.3        
[25] magrittr_1.5          backports_1.0.4       scales_0.4.1          htmltools_0.3.5       ggthemes_3.3.0        splines_3.3.2        
[31] assertthat_0.1        rvest_0.3.2           colorspace_1.3-2      stringi_1.1.2         lazyeval_0.2.0        munsell_0.4.3
kassambara added a commit that referenced this issue Jan 16, 2017
@kassambara (Owner) commented Jan 16, 2017

This issue is now fixed. The R package maxstat does not handle objects of class tbl_df well, so surv_cutpoint() now always converts its input data into a standard data.frame.

# perform surv_cutpoint in groups determined by dataset
clin_expr %>%
   group_by(dataset) %>%
   do(data_cat = 
         surv_cutpoint(data = ., time = "times",
              event = "patient.vital_status",
              variables = c("PAX8", "dataset"))) ->
   clin_exp_categorized

[Screenshot, 2017-01-16]
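The explanation above can be illustrated with a minimal base-R sketch. It rests on an assumption about the mechanism rather than on maxstat's actual code: tibbles never drop a single selected column to a vector with `[`, so code written for data.frames, such as `data[, "times"]`, receives a one-column table instead of a numeric vector, which a numeric check like the one in survival::Surv() rejects. Here `drop = FALSE` emulates the tibble behaviour so no extra packages are needed, and `check_time()` is a hypothetical stand-in for that check. This would also fit the fact that the ungrouped call worked: `head(clin_expr)` prints as a plain data.frame, while inside `do()` the `.` pronoun is presumably a tibble.

```r
# Minimal base-R sketch. Assumption: tibbles behave like `[` with drop = FALSE,
# so drop = FALSE is used here to emulate a tbl_df without loading tibble.
df <- data.frame(times = c(437, 1321, 967),
                 patient.vital_status = c(0, 0, 1))

# Plain data.frame: selecting one column with `[` returns a numeric vector
x_df <- df[, "times"]

# tbl_df-like selection: the result stays a one-column data.frame
x_tbl <- df[, "times", drop = FALSE]

is.numeric(x_df)   # TRUE
is.numeric(x_tbl)  # FALSE, a data.frame is a list, not a numeric vector

# Hypothetical stand-in for the numeric check inside survival::Surv()
check_time <- function(time) {
  if (!is.numeric(time)) stop("Time variable is not numeric")
  invisible(time)
}

# The tbl_df-like column reproduces the error message from the issue
res <- tryCatch(check_time(x_tbl), error = conditionMessage)

# The plain data.frame column, i.e. what the fix produces, passes the check
ok <- check_time(x_df)
```

On survminer versions that predate this fix, passing `data = as.data.frame(.)` to surv_cutpoint() inside `do()` should presumably have the same effect, although that workaround has not been tested here.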

kassambara added a commit that referenced this issue Jan 16, 2017
@MarcinKosinski (Contributor, Author)

Thank you @kassambara for your kind help.
You can close the issue :)
