[Bug] fwrite in large environments (tables up to 100M rows) #1968
Comments
I have the same problem, reproducible when multiple threads are used and for numeric values only (integers seem to work fine), as @mgahan wrote. Interestingly, it appears to be isolated to cases where the data table has more than one column.
@mattdowle I just tested it out and my tests check out. Thanks for all the hard work!
@mgahan Excellent - thanks!
@mattdowle @mgahan which version of the package has this bug fix?
@mattdowle I am not sure that this issue is resolved. Seeing it in DT v 10.4.0:
Browse[1]> nrow(dt[population<1])
Browse[1]> fwrite(dt, file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> newdt <- fread(file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> nrow(newdt[population<1])
Browse[1]> fwrite(dt, file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> newdt <- fread(file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> nrow(newdt[population<1])
Browse[1]> fwrite(dt, file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> newdt <- fread(file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> nrow(newdt[population<1])
Browse[1]> fwrite(dt, file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> newdt <- fread(file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> nrow(newdt[population<1])
Browse[1]> sessionInfo()
locale:
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
The fix is not on CRAN yet. Install the current development version (1.10.5).
@MichaelChirico I don't think so. Just installed dev version, same behavior:
Browse[1]> newdt <- fread(file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> fwrite(dt, file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> newdt <- fread(file.path(worker.dir, paste0(country, '.csv')))
Browse[1]> nrow(dt[population<1])
Browse[1]> nrow(newdt[population<1])
Browse[1]> sessionInfo()
locale:
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
This could be specific to our cluster environment, but others at my institute are seeing the same issue. It seems to be related to columns being accidentally reordered within a row. I would recommend that people be very cautious about using fwrite in production code at this stage: this bug seems to be pervasive and is really difficult to track down in large outputs.
I have been using fwrite in a large AWS environment detailed below:
Instance: m4.4xlarge
RAM: 64 GB
Threads: 16
Cost: $0.862 hourly
I notice that when writing out numeric values, the written output is incorrect. When the same data is coerced to the integer class, the output seems to be correct. This does not seem to be a problem with setDTthreads(1). However, errors start to creep in with setDTthreads(2), albeit fewer errors than with setDTthreads(16). I have detailed the two scenarios below.
Writing numeric output with fwrite
Writing integer output with fwrite
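The original reproduction code for these two scenarios is not preserved here. As a rough sketch of the kind of round-trip check described above (not the reporter's script), one could write a table at a given thread count, read it back, and compare column by column. The helper name roundtrip_ok and the synthetic data are assumptions made for illustration; only fwrite, fread, and setDTthreads come from data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): write `dt` using the given
# number of threads, read it back, and compare every column numerically.
roundtrip_ok <- function(dt, threads) {
  old <- setDTthreads(threads)            # returns the previous setting
  on.exit(setDTthreads(old), add = TRUE)  # restore it when the function exits
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path), add = TRUE)

  fwrite(dt, path)
  back <- fread(path)

  # Column names and row count must survive the round trip
  if (!identical(names(dt), names(back)) || nrow(dt) != nrow(back)) {
    return(FALSE)
  }
  # Compare values with all.equal's default numeric tolerance
  all(vapply(names(dt), function(nm) {
    isTRUE(all.equal(as.numeric(dt[[nm]]), as.numeric(back[[nm]])))
  }, logical(1)))
}

# Synthetic data (an assumption; the data from this thread is not available)
set.seed(1)
n <- 1e5
num_dt <- data.table(x = runif(n), y = runif(n))
int_dt <- data.table(x = sample.int(1e6, n), y = sample.int(1e6, n))

# Run the numeric and integer scenarios at two thread counts
for (th in c(1L, 2L)) {
  cat("threads =", th,
      " numeric ok:", roundtrip_ok(num_dt, th),
      " integer ok:", roundtrip_ok(int_dt, th), "\n")
}
```

On an affected build, the numeric check would be expected to fail once more than one thread is used, while the single-threaded and integer checks pass. The table here is far smaller than the 100M-row case in the title, so it may need to be scaled up to trigger the problem.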