Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mixed boolean and numeric values in a column causes read_excel to crash Rstudio but works fine from command line #758

Open
txemaheredia opened this issue Sep 25, 2024 · 2 comments
Labels
reprex needs a minimal reproducible example

Comments

@txemaheredia
Copy link

I was having a ton of problems with read_excel crashing my Rstudio session. I googled around, tried the suggested methods of re-saving the excel in different tools, formats, or OS. Nothing worked.

Then, I tried to make a silly test in an Rscript to run it from the command line, to see if I got some additional output, and it simply worked flawlessly without crashing. From there, I found out that the culprit is having a mix of boolean and numeric values in the same column. Opening such excel crashes Rstudio, but works from command line giving a few warnings.

Might this help explain previous issues with these crashes?


Brief description of the problem

library(readxl)

# basedir<-"wherever/your/files/are"

file_crash <- file.path(basedir,"repexp_column_mixed.xlsx")
data <- readxl::read_excel(path = file_crash, 
                           sheet = "Sheet1",
                           col_names = F, 
                           trim_ws = T)
warnings()
dim(data)


file_noCrash <- file.path(basedir,"repexp_column_toNumber.xlsx")
data <- readxl::read_excel(path = file_noCrash, 
                           sheet = "Sheet1",
                           col_names = F, 
                           trim_ws = T)
warnings()
dim(data)

Running the "crash" chunk crashes Rstudio and gives this output in command line:

New names:
• `` -> `...1`
• `` -> `...2`
• `` -> `...3`
Warning messages:
1: Coercing boolean to numeric in C1 / R1C3
2: Coercing boolean to numeric in C3 / R3C3
3: Coercing boolean to numeric in C5 / R5C3
4: Coercing boolean to numeric in C7 / R7C3
[1] 7 3

Running the "non-crash" chunk works fine in Rstudio and gives this output in command line:

New names:
• `` -> `...1`
• `` -> `...2`
• `` -> `...3`
[1] 7 3

sessionInfo()

R version 4.4.0 (2024-04-24)
Platform: x86_64-apple-darwin20
Running under: macOS Monterey 12.7.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readxl_1.4.3

loaded via a namespace (and not attached):
 [1] compiler_4.4.0   magrittr_2.0.3   cli_3.6.3        tools_4.4.0     
 [5] pillar_1.9.0     glue_1.7.0       tibble_3.2.1     utf8_1.2.4      
 [9] fansi_1.0.6      cellranger_1.1.0 vctrs_0.6.5      lifecycle_1.0.4 
[13] pkgconfig_2.0.3  rlang_1.1.4  
@jennybc
Copy link
Member

jennybc commented Feb 12, 2025

Are repexp_column_mixed.xlsx and repexp_column_toNumber.xlsx available somewhere I can access them?


How to provide a readxl reprex

We're in a much better position to address your issue if you can provide a reprex (reproducible example). Provide as much of this as you can:

  • An actual xls or xlsx file. Pick one:
    • Your personal xls or xlsx: try to strip it down to the minimal size and complexity to demonstrate your point. And, obviously, remove any sensitive data.
    • A publicly available xls or xlsx: provide URL and the code you used to download.
  • A small bit of R code that uses readxl on the provided xls or xlsx file and demonstrates your point.
    • Consider using the reprex package to prepare this. In addition to nice formatting, this ensures your reprex is self-contained.
  • Any details about your environment that seem clearly relevant, such as operating system.
    reprex(..., si = TRUE)
    will append a standard summary, folded neatly away, at the bottom of your reprex.

How to provide your own xls/xlsx file? In order of preference:

  1. Attach the file directly to your issue. Instructions are always at the bottom of the issue or comment box. .xlsx is a supported file type. You'll need to zip or gzip .xls so it appears as .zip or .gz.
  2. Share via DropBox or Google Drive and provide the link in your issue.
  3. Explain you absolutely cannot provide a relevant file via github.com and offer to provide privately.
  4. Don't share a file and realize you're hoping for, e.g., a bug fix with no concrete example to go on.

@jennybc jennybc added the reprex needs a minimal reproducible example label Feb 12, 2025
@txemaheredia
Copy link
Author

Hi,

I looked into this issue again, and now, the exact same code no longer crashes Rstudio:

New names:                                                                  
• `` -> `...1`
• `` -> `...2`
• `` -> `...3`

> warnings()

> dim(data_mixed)
[1] 7 3

> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-apple-darwin20
Running under: macOS Monterey 12.7.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readxl_1.4.3

loaded via a namespace (and not attached):
 [1] compiler_4.4.0    magrittr_2.0.3    cli_3.6.3         tools_4.4.0      
 [5] pillar_1.10.1     glue_1.8.0        rstudioapi_0.17.1 tibble_3.2.1     
 [9] cellranger_1.1.0  vctrs_0.6.5       lifecycle_1.0.4   pkgconfig_2.0.3  
[13] rlang_1.1.5      
Warning messages:
1: Coercing boolean to numeric in C1 / R1C3 
2: Coercing boolean to numeric in C3 / R3C3 
3: Coercing boolean to numeric in C5 / R5C3 
4: Coercing boolean to numeric in C7 / R7C3 

Maybe the newly added rstudioapi_0.17.1 or upgraded rlang_1.1.5 are responsible for this change in behavior?

Anyway, here are the 2 excel files I used on the original post:

repexp_column_toNumber.xlsx
repexp_column_mixed.xlsx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reprex needs a minimal reproducible example
Projects
None yet
Development

No branches or pull requests

2 participants