[Request] allow tmpDir to be supplied as argument: fread can run out of tmpfs space on unix during preprocessing #1139
Comments
I've also run into this issue. What is particularly frustrating is that if /dev/shm exists, then the value of TMPDIR is ignored and /dev/shm is used. It would be great if TMPDIR were respected. |
Agreed. This would really help my workflow. |
+1 It would save my life! |
It's not good that the |
Something like
|
Hi,
Recently I ran into an issue with large compressed files that can stop `fread` from working because tmpfs runs out of space. Since `fread` on unix systems currently (in the master branch) uses tmpfs (/dev/shm) whenever it exists, the size of tmpfs limits the ability of `fread` to read potentially large files before any preprocessing can be done. This is more severe when several files are loaded in parallel for a speed gain, say `mclapply(input_list, fread, mc.cores=4)`, where the input list refers to several gz files. Each gz file could be several GB uncompressed. I don't need all of the data in my analysis, and preprocessing could significantly reduce the size of each file. However, the preprocessing requires each file to be uncompressed to disk first, which occupies all the space available in tmpfs. (There are, of course, several workarounds for this kind of situation, but it would be great to address it directly in one R function call, which is `fread` in this discussion.)

It would hence be nice if a user-supplied argument could force the `tempfile` location to somewhere other than tmpfs on unix systems. For example, `dat <- fread("zcat file.gz", tmpDir="/data")`. Performance may be a bit worse due to disk I/O, but the raw data will not be limited by the size of tmpfs, which is usually far smaller than any disk device at hand. (On my machine I have 8 GB in tmpfs and that's it.)

A possible minor change to fix this issue on unix is to rewrite `fread.R` as in everdark@4aaa745.

I have only tested it on my local machine, where it works fine. There could be some ramifications that I haven't taken into account in this simple modification, so I created this request to open the discussion. :) Has anybody else also encountered this tmpfs out-of-space issue?
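To make the request concrete, here is a minimal sketch of one possible workaround in user code. It is not the change in the linked commit and not part of data.table: the helper name `fread_via_dir` and its `tmp_dir` argument are made up for illustration. It assumes a unix shell is available, runs the command with its output redirected to a temporary file in a directory of the caller's choosing, and then reads that file with `fread`.

```r
library(data.table)

# Hypothetical helper (not part of data.table): run a shell command, write its
# output to a temporary file under `tmp_dir`, then read that file with fread().
fread_via_dir <- function(cmd, tmp_dir = tempdir(), ...) {
  stopifnot(dir.exists(tmp_dir))

  # Create the temporary file name inside the requested directory,
  # so tmpfs (/dev/shm) is never touched unless the caller asks for it.
  tmp_file <- tempfile(pattern = "fread_", tmpdir = tmp_dir)
  on.exit(unlink(tmp_file), add = TRUE)  # clean up even if reading fails

  # Redirect the command's standard output to the temporary file.
  status <- system(paste(cmd, ">", shQuote(tmp_file)))
  if (status != 0L) stop("command failed with exit status ", status)

  # Extra arguments (e.g. select=, nrows=) are passed through to fread().
  fread(tmp_file, ...)
}
```

With this in place, something like `dat <- fread_via_dir("zcat file.gz", tmp_dir = "/data")` would decompress to /data instead of tmpfs, at the cost of slower disk I/O.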