Memory leaks when reading large gzip files #1161

Closed
bigey opened this issue Dec 11, 2020 · 2 comments

bigey commented Dec 11, 2020

Repeatedly reading large gzip-compressed files (*.tsv.gz) with read_tsv() increases the memory footprint until the system freezes. If I switch to the read.table() function, there is no problem. An example using a test file is below:

test.tsv.gz

library(readr)
file = "test.tsv.gz"

while(TRUE) {
  # Memory leaks using
  read_tsv(file = file, col_names = c("Chr","Pos","Cov"))
  
  # No problem using 
  # read.table(file, col.names = c("Chr","Pos","Cov"))
}
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8    LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readr_1.4.0

loaded via a namespace (and not attached):
 [1] fansi_0.4.1      assertthat_0.2.1 crayon_1.3.4     dplyr_1.0.2      R6_2.4.1         lifecycle_0.2.0  magrittr_1.5    
 [8] pillar_1.4.6     cli_2.1.0        rlang_0.4.8      rstudioapi_0.11  vctrs_0.3.4      generics_0.0.2   ellipsis_0.3.1  
[15] tools_4.0.3      glue_1.4.2       purrr_0.3.4      hms_0.5.3        compiler_4.0.3   pkgconfig_2.0.3  tidyselect_1.1.0
[22] tibble_3.0.4
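
The loop above never terminates, so a bounded variant that records memory after each read may be easier to share and compare. The following is a minimal sketch, not part of the original report. The helper names `rss_mib()` and `track_rss()` are hypothetical, and it assumes a Linux system (as in the session info above) where `/proc/self/status` is available. It reads the process's resident set size rather than relying on `gc()` alone, on the assumption that memory leaked outside R's heap would not show up in `gc()` output.

```r
# Resident set size of the current R process in MiB.
# Assumption: Linux, where /proc/self/status exposes a "VmRSS:" line in kB.
rss_mib <- function() {
  status <- readLines("/proc/self/status")
  line <- grep("^VmRSS:", status, value = TRUE)
  as.numeric(gsub("[^0-9]", "", line)) / 1024
}

# Call `reader()` n times and record the RSS after each call.
# gc() runs first so that ordinary, collectable R garbage is not counted.
track_rss <- function(reader, n = 20) {
  vapply(seq_len(n), function(i) {
    invisible(reader())
    gc()
    rss_mib()
  }, numeric(1))
}

# Hypothetical usage with the file from this report:
# cols <- c("Chr", "Pos", "Cov")
# leak <- track_rss(function() readr::read_tsv("test.tsv.gz", col_names = cols))
# base <- track_rss(function() read.table("test.tsv.gz", col.names = cols))
# plot(leak, type = "b"); lines(base, col = "blue")
```

A steadily rising series for one reader and a flat series for the other would support the behaviour described above. A single run is only indicative, since RSS also depends on allocator behaviour.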
@jimhester changed the title from "Memory leacks when reading large gzip files" to "Memory leaks when reading large gzip files" on Apr 13, 2021

jimhester (Collaborator) commented

I can confirm the memory leak. It does not happen with a normal file path, but it does happen when reading from any connection, including gzfile() connections.
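
Given that observation, one possible stopgap is to decompress the file to a temporary plain file and pass that path to `read_tsv()`, so that readr does not go through a connection. This is a sketch under that assumption, not a confirmed fix. The helper `gunzip_to_temp()` is hypothetical and uses only base R.

```r
# Decompress a .gz file to a temporary uncompressed file using base R only.
# Returns the path of the temporary file; the caller should unlink() it.
gunzip_to_temp <- function(path, chunk_size = 1024 * 1024) {
  tmp <- tempfile(fileext = ".tsv")
  src <- gzfile(path, open = "rb")
  on.exit(close(src), add = TRUE)
  dst <- file(tmp, open = "wb")
  on.exit(close(dst), add = TRUE)
  repeat {
    # Copy in fixed-size chunks so large files are never held fully in memory.
    chunk <- readBin(src, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break
    writeBin(chunk, dst)
  }
  tmp
}

# Hypothetical usage with the file from this report:
# tmp <- gunzip_to_temp("test.tsv.gz")
# dat <- readr::read_tsv(tmp, col_names = c("Chr", "Pos", "Cov"))
# unlink(tmp)
```

The trade-off is extra disk space and I/O for the uncompressed copy. Whether this avoids the leak in a given readr version would need to be checked by measurement.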

boshek (Contributor) commented Apr 14, 2021

@jimhester thank you for this. This is a huge help. Is there a tentative release date for the fix?
