read_tsv leaks memory when reading from character vectors. #1092

Closed
Shians opened this issue Apr 24, 2020 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

Shians commented Apr 24, 2020

I'm finding that when I run read_tsv on character vectors in a loop, memory usage keeps growing. I don't have much experience diagnosing memory issues, so I'm relying on gc()'s report of memory usage.

dummy <- rep("a\tb\tc\t1\t2\t3\n", 1000000)

for (i in 1:5) {
    readr::read_tsv(dummy, col_names = FALSE)
    print(gc())
}

On my machine (Ubuntu) this grows the memory used by about 5 MB on each run, whereas read.table does not:

for (i in 1:5) {
    read.table(textConnection(dummy), header = FALSE, sep = "\t")
    print(gc())
}

jimhester added the "bug" label (an unexpected problem or unintended behavior) Apr 24, 2020

jimhester (Collaborator) commented

I can reproduce it, but I don't know the cause offhand.

Note to self: this is specific to the code in readr; vroom does not have this memory leak.

dummy <- rep("a\tb\tc\t1\t2\t3\n", 1000000)

for (i in 1:5) {
    readr::read_tsv(dummy, col_types = list(), col_names = FALSE)
    print(gc())
}
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used (Mb)
#> Ncells  630929 33.7    1335594  71.4         NA  1315632 70.3
#> Vcells 2784978 21.3   13266416 101.3      32768 10417759 79.5
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used (Mb)
#> Ncells  630935 33.7    1335594  71.4         NA  1315632 70.3
#> Vcells 3384037 25.9   13266416 101.3      32768 12024250 91.8
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used (Mb)
#> Ncells  630937 33.7    1335594  71.4         NA  1315632 70.3
#> Vcells 3983037 30.4   13266416 101.3      32768 12622024 96.3
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  630939 33.7    1335594  71.4         NA  1315632  70.3
#> Vcells 4582041 35.0   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  630942 33.7    1335594  71.4         NA  1315632  70.3
#> Vcells 5181045 39.6   13266416 101.3      32768 13221036 100.9

for (i in 1:5) {
    vroom::vroom(dummy, col_types = list(), col_names = FALSE)
    print(gc())
}
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  676072 36.2    1335594  71.4         NA  1315632  70.3
#> Vcells 5257711 40.2   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  675910 36.1    1335594  71.4         NA  1315632  70.3
#> Vcells 5257464 40.2   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  675912 36.1    1335594  71.4         NA  1315632  70.3
#> Vcells 5257484 40.2   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  675914 36.1    1335594  71.4         NA  1315632  70.3
#> Vcells 5257508 40.2   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  675916 36.1    1335594  71.4         NA  1315632  70.3
#> Vcells 5257524 40.2   13266416 101.3      32768 13221036 100.9

Created on 2020-04-24 by the reprex package (v0.3.0)
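
Reading the Vcells "used (Mb)" column in the output above, read_tsv grows by roughly 4.5 to 4.6 Mb per iteration (21.3, 25.9, 30.4, 35.0, 39.6), while vroom stays flat at about 40.2 Mb. A minimal sketch of how that per-iteration growth could be computed rather than read off by eye is below. `measure_growth` is a hypothetical helper, not part of readr or vroom, and the demo uses base R's read.table on a smaller input so that it runs without extra packages.

```r
# Hypothetical helper (not part of readr or vroom): call `f` n times and
# record the Vcells memory in use, in Mb, after a full gc() following each call.
measure_growth <- function(f, n = 5) {
  used_mb <- numeric(n)
  for (i in seq_len(n)) {
    f()                                # return value is discarded
    used_mb[i] <- gc()["Vcells", 2]    # column 2 holds Mb for the "used" column
  }
  used_mb
}

# Smaller input than in the report, to keep the example quick.
dummy <- rep("a\tb\tc\t1\t2\t3\n", 1000)

# Base R reader as the baseline; read.table(text = ) closes its own connection.
base_mb <- measure_growth(function() {
  read.table(text = dummy, header = FALSE, sep = "\t")
})

# Per-iteration change in Mb; values near zero suggest no leak.
print(diff(base_mb))
```

Assuming readr is installed, passing `function() readr::read_tsv(dummy, col_names = FALSE)` to the same helper should give the readr side of the comparison.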

jimhester (Collaborator) commented

Fixed by 9fe20f6

jimhester (Collaborator) commented

Note that it is not just character vectors; it affects any read.
