read_tsv leaks memory when reading from character vectors. #1092
Labels: bug (an unexpected problem or unintended behavior)
Comments
I can reproduce it, but I don't know the cause offhand. Note to self: this is specific to the code in readr; we don't have a memory leak in vroom.
dummy <- rep("a\tb\tc\t1\t2\t3\n", 1000000)
for (i in 1:5) {
  readr::read_tsv(dummy, col_types = list(), col_names = FALSE)
  print(gc())
}
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  630929 33.7    1335594  71.4         NA  1315632  70.3
#> Vcells 2784978 21.3   13266416 101.3      32768 10417759  79.5
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  630935 33.7    1335594  71.4         NA  1315632  70.3
#> Vcells 3384037 25.9   13266416 101.3      32768 12024250  91.8
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  630937 33.7    1335594  71.4         NA  1315632  70.3
#> Vcells 3983037 30.4   13266416 101.3      32768 12622024  96.3
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  630939 33.7    1335594  71.4         NA  1315632  70.3
#> Vcells 4582041 35.0   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  630942 33.7    1335594  71.4         NA  1315632  70.3
#> Vcells 5181045 39.6   13266416 101.3      32768 13221036 100.9
for (i in 1:5) {
  vroom::vroom(dummy, col_types = list(), col_names = FALSE)
  print(gc())
}
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  676072 36.2    1335594  71.4         NA  1315632  70.3
#> Vcells 5257711 40.2   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  675910 36.1    1335594  71.4         NA  1315632  70.3
#> Vcells 5257464 40.2   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  675912 36.1    1335594  71.4         NA  1315632  70.3
#> Vcells 5257484 40.2   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  675914 36.1    1335594  71.4         NA  1315632  70.3
#> Vcells 5257508 40.2   13266416 101.3      32768 13221036 100.9
#>           used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells  675916 36.1    1335594  71.4         NA  1315632  70.3
#> Vcells 5257524 40.2   13266416 101.3      32768 13221036 100.9

Created on 2020-04-24 by the reprex package (v0.3.0)
Fixed by 9fe20f6
Note that this is not limited to character vectors; it affects any read.
I'm finding that when running read_tsv on character vectors in a loop, memory usage keeps growing. I don't have much experience diagnosing memory issues, so I'm relying on gc()'s report of memory usage. On my machine (Ubuntu) the memory used grows by 5 MB each run, whereas read.table does not show this growth.
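For anyone who wants to repeat the comparison, here is a minimal sketch of the gc()-based measurement described above, using base R only. The helper name `track_vcells_mb` is hypothetical, the smaller row count is just to keep it quick, and calling read.table with `text =` is an assumption about how the baseline was run; the report does not say.

```r
# Hypothetical helper (not part of readr or base R): call `reader` on `input`
# `n` times and record the Vcells "used (Mb)" figure gc() reports after each run.
track_vcells_mb <- function(reader, input, n = 5) {
  vapply(seq_len(n), function(i) {
    reader(input)        # result is discarded, so it should be collectable
    gc()["Vcells", 2]    # column 2 of gc()'s matrix is the used memory in Mb
  }, numeric(1))
}

# Smaller input than in the thread, one tab-separated record per element.
dummy <- rep("a\tb\tc\t1\t2\t3", 10000)

# Baseline with read.table (assumed invocation, see the note above).
base_mb <- track_vcells_mb(function(x) read.table(text = x, sep = "\t"), dummy)
print(round(diff(base_mb), 1))  # per-run change in used Vcells, in Mb

# With readr installed, the same helper can wrap the call from this thread:
# readr_mb <- track_vcells_mb(
#   function(x) readr::read_tsv(x, col_types = list(), col_names = FALSE),
#   dummy
# )
```

If the per-run differences stay near zero for one reader and keep increasing for the other, that matches the pattern reported here. gc() figures are only a rough indicator, so a few extra iterations help separate steady growth from noise.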