Data.table crashes R session on rbindlist #2340
The FAQ,
Does |
Hmm I did not notice this in the FAQ. That being said, it does still crash the session. |
Oh, hadn't noticed your example was in a gist. No repro here on R 3.3.3:
Btw, I guess you are testing on the devel version. If not, see https://github.com/Rdatatable/data.table/wiki/Support |
Interesting that it doesn't seem to reproduce. I don't think I'm using a dev version?
|
Just got caught with this as well. Here's a simpler example. Run it a couple of times to get a segfault (on Windows with 1.10.4 at the moment):
require(data.table)
ll <- list(data.table(x=1, y=2), data.table(), data.table(x=3, y=4))
dt <- data.table(bla=1:3, ll)
# run multiple times and you should get a segfault
dt[, rbindlist(ll, idcol=".id")]
dt[, rbindlist(ll, idcol=".id")]
dt[, rbindlist(ll, idcol=".id")]
dt[, rbindlist(ll, idcol=".id")]
|
I'm not sure this is related only to the use of idcol. I'm getting it with some real-world data, but I don't have a reproducible example. My setup reads 1000 data.tables from disk (in 1000 rds files), selects some rows and aggregates, and then rbinds the results. (Actually, 40 files are read from disk and the selection and aggregation are applied, with rbindlist called on that result, which is stored in a list via lapply. Then rbindlist is called again on the resulting list. This is the result of that process.) I've verified that all items in the list are of class data.table and have a nonzero number of rows. The list is 30.1 GB as reported by pryr::object_size, on a machine with 1.5 TB of RAM.
*** caught segfault *** Traceback: (sorry for the delete/repost, wanted to use this account, edit on reading from disk turned out to be inaccurate) |
Apologies. After attempting some workarounds, I've discovered that it seems a data.table can't have more than MAXINT rows. I didn't realize that was a limitation, and it is probably what was causing my issue. |
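For anyone wanting to rule out the row limit before binding, here is a minimal sketch that sums the row counts of a list of tables and compares the total against `.Machine$integer.max`. The helper name `fits_in_int_rows` is hypothetical and not part of data.table. The row counts are summed as doubles so the sum itself cannot overflow.

```r
library(data.table)

# Hypothetical helper (not a data.table function): returns TRUE when the
# combined number of rows across all tables fits in a 32-bit R integer.
fits_in_int_rows <- function(l) {
  # Convert each nrow() to double before summing to avoid integer overflow.
  total <- sum(vapply(l, function(d) as.numeric(nrow(d)), numeric(1)))
  total <= .Machine$integer.max
}

pieces <- list(data.table(x = 1:3), data.table(x = 1:2))
ok <- fits_in_int_rows(pieces)
```

If the check returns FALSE, the pieces would need further aggregation, or binding in smaller groups, before a single rbindlist call.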
I can reproduce using @arunsrinivasan's example with current CRAN (1.10.4-3) on Ubuntu.
Thanks to @mllg's PR #2077 merged in May 2017. @jsams Given the above, I doubt it's related to MAXINT in your case. Can you confirm dev works fine for you? But MAXINT is a good side issue. The |
At the following github is a test data set that should replicate the issue.
On a clean R session, the following seems to completely crash R with a memory violation:
I'm running R version 3.4.1.
The issue seems to be when 'idcol' is used and there are empty data.tables in the list to be rbind'ed. I found the following workaround seems to produce the expected behaviour:
But it would be great if data.table handled this.
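For anyone hitting this on a version without the fix, one possible workaround (a sketch only, and not necessarily the one the original poster used) is to drop the empty data.tables before binding and then map the id column back to the original list positions. The helper name `rbind_nonempty` is hypothetical; the input list reuses the simpler example from the comments.

```r
library(data.table)

ll <- list(data.table(x = 1, y = 2), data.table(), data.table(x = 3, y = 4))

# Hypothetical helper (not part of data.table): bind only the non-empty
# tables, keeping ids that refer to positions in the original list.
rbind_nonempty <- function(l, idcol = ".id") {
  # Positions of the tables that have at least one row.
  keep <- which(vapply(l, function(d) nrow(d) > 0L, logical(1)))
  if (length(keep) == 0L) return(data.table())
  # For an unnamed list, idcol holds positions within the filtered list.
  out <- rbindlist(l[keep], idcol = idcol)
  # Translate those positions back to positions in the original list.
  set(out, j = idcol, value = keep[out[[idcol]]])
  out
}

res <- rbind_nonempty(ll)
```

Because empty tables never reach rbindlist, the code path reported as crashing is avoided, at the cost of one extra pass over the list.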