All non-atomic types are now converted to lists in rbindlist #4196
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -514,18 +514,33 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg) | |
if (w==-1 || !length(thisCol=VECTOR_ELT(li, w))) { // !length for zeroCol warning above; #1871 | ||
writeNA(target, ansloc, thisnrow); // writeNA is integer64 aware and writes INT64_MIN | ||
} else { | ||
if ((TYPEOF(target)==VECSXP) && TYPEOF(thisCol)>TYPEOF(target)) { | ||
// Exotic non-atomic types need each element to be wrapped in a list, e.g. expression vectors #546 | ||
if ((TYPEOF(target)==VECSXP) && (isVectorAtomic(thisCol) || TYPEOF(thisCol)==LISTSXP)) { | ||
// do an as.list() on the atomic column; #3528 | ||
// pairlists (LISTSXP) can also be coerced to lists using coerceVector | ||
Does it coerce recursively, unwrapping nested pairlists into a single list of many elements?
Only the top-level pairlist is coerced to a list:
|
||
thisCol = PROTECT(coerceVector(thisCol, TYPEOF(target))); nprotect++; | ||
} else if ((TYPEOF(target)==VECSXP) && TYPEOF(thisCol)==EXPRSXP) { | ||
// For EXPRSXP each element to be wrapped in a list, e.g. expression vectors #546 | ||
SEXP thisColList = PROTECT(allocVector(VECSXP, length(thisCol))); nprotect++; | ||
for(int r=0; r<length(thisCol); ++r) { | ||
SEXP thisElement = VECTOR_ELT(thisCol, r); | ||
if (TYPEOF(thisCol) == EXPRSXP) thisElement = PROTECT(coerceVector(thisElement, EXPRSXP)); nprotect++; // otherwise LANGSXP | ||
thisElement = PROTECT(coerceVector(thisElement, EXPRSXP)); nprotect++; // otherwise LANGSXP | ||
SET_VECTOR_ELT(thisColList, r, thisElement); | ||
} | ||
thisCol = thisColList; | ||
} else if ((TYPEOF(target)==VECSXP) && TYPEOF(thisCol)<TYPEOF(target)) { | ||
// do an as.list() on the atomic column; #3528 | ||
thisCol = PROTECT(coerceVector(thisCol, TYPEOF(target))); nprotect++; | ||
} else if ((TYPEOF(target)==VECSXP) && !isVector(thisCol) && TYPEOF(thisCol)!=TYPEOF(target)) { | ||
// Anything not a vector we can assign directly through SET_VECTOR_ELT | ||
// Although tecnically there should only be one list element for any type met here, | ||
// the length of the type may be > 1, in which case the other columns in data.table | ||
// will have been recycled. We therefore in turn have to recycle the list elements | ||
// to match the number of rows. | ||
SEXP thisColList = PROTECT(allocVector(VECSXP, length(thisCol))); nprotect++; | ||
for(int r=0; r<length(thisCol); ++r) { | ||
SET_VECTOR_ELT(thisColList, r, thisCol); | ||
} | ||
Demonstration of the recycling issue and why we need this for loop:
The for loop means we get the right result with rbind:
Otherwise we would have
It would be good to fix this in data.table()/as.data.table() so that these non-vector types are wrapped in a list at the time of data.table construction.
Agree, very nice demo.
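The recycling described in this thread can be modelled outside of R. The following is a minimal standalone sketch in plain C, not the code from this PR: `Object` and `wrap_recycled` are hypothetical names, and `Object` is only a stand-in for a non-vector R object whose length is greater than 1. It shows why every cell of the list column has to point at the same whole object once the other columns have been recycled to that many rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a non-vector R object (for example a call).
 * It is treated as one opaque value even though its length is > 1. */
typedef struct {
    const char *label;   /* illustrative description only */
    size_t length;       /* what length() would report for the object */
} Object;

/* Build a list column with nrow cells, every cell pointing at the same
 * whole object. This mirrors the loop that stores thisCol in each slot.
 * Returns NULL when nrow is 0 or when allocation fails. */
const Object **wrap_recycled(const Object *obj, size_t nrow) {
    if (nrow == 0) return NULL;
    const Object **col = malloc(nrow * sizeof *col);
    if (col == NULL) return NULL;
    for (size_t r = 0; r < nrow; ++r) {
        col[r] = obj;    /* same object in every row, no splitting */
    }
    return col;
}
```

Calling `wrap_recycled(&obj, obj.length)` gives one cell per recycled row, so the list column lines up with the other columns instead of holding a single cell.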
||
thisCol = thisColList; | ||
} else if ((TYPEOF(target)==VECSXP) && TYPEOF(thisCol)!=TYPEOF(target)) { | ||
// should be unreachable | ||
error("Internal error: rbindlist cannot handle type %s\n", type2char(TYPEOF(thisCol))); // # nocov | ||
} | ||
Logic abstracted into a separate function in utils.c because it is/will be otherwise duplicated in Casmatrix in #4144. I also envision in a future PR adding an R exposable wrapper so that non-atomic vector columns may be caught and coerced to list in
Not sure if a single PROTECT is sufficient here - the PROTECT stack in coerceAsList may be
The method I've settled on after hitting the protect stack limit in testing #4144 is to have the C function called from R manage the PROTECT stack, and for functions it calls to simply increment a protect counter. I've modified
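The protect-counter convention mentioned in the last comment can be sketched without R's headers. The block below is an assumption-laden mock, not data.table code: `mock_protect`, `mock_unprotect`, `helper_coerce` and `entry_point` are hypothetical names, and the fixed-size array only imitates R's protection stack. The point it illustrates is that helpers push and bump a shared counter, while the function called from R performs the single matching unprotect.

```c
#include <assert.h>
#include <stddef.h>

/* Mock protection stack; a stand-in for R's real one (assumption). */
#define MOCK_STACK_MAX 16
static const void *mock_stack[MOCK_STACK_MAX];
static int mock_top = 0;

static const void *mock_protect(const void *p) {
    assert(mock_top < MOCK_STACK_MAX);   /* would be a stack overflow in R */
    mock_stack[mock_top++] = p;
    return p;
}

static void mock_unprotect(int n) {
    assert(n >= 0 && n <= mock_top);
    mock_top -= n;
}

/* Helper: protects what it produces and reports that through the
 * caller's counter. It never unprotects anything itself. */
static const void *helper_coerce(const void *x, int *nprotect) {
    const void *result = mock_protect(x);
    (*nprotect)++;
    return result;
}

/* Entry point: owns the counter and does the one unprotect at the end.
 * Returns how many protections the helpers accumulated. */
int entry_point(const void *const *cols, int ncol) {
    int nprotect = 0;
    for (int j = 0; j < ncol; ++j) {
        helper_coerce(cols[j], &nprotect);
    }
    int used = nprotect;
    mock_unprotect(nprotect);            /* stack is balanced again here */
    return used;
}
```

With this split the helper stays simple and the balance of the stack is checked in one place, at the cost of the stack growing with the number of helper calls made before the entry point returns.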
||
// else coerces if needed within memrecycle; with a no-alloc direct coerce from 1.12.4 (PR #3909) | ||
const char *ret = memrecycle(target, R_NilValue, ansloc, thisnrow, thisCol, 0, -1, idcol+j+1, foundName); | ||
|
Assuming the desired behaviour here is for a pairlist to be converted to a list, rather than being converted to a list where each element is a 1-element pairlist.
example of both would help
I think this makes sense - my understanding is that a pairlist is a linked list, while a regular list is essentially a vector of pointers to the list elements. If we treat a LISTSXP the same way as an EXPRSXP, then we end up with a list where each element is a pointer to an individual node of the linked list. You lose any benefit of having the linked list, because each list element is just a pointer to a 1-node linked list.
E.g. the result is something like:
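The commenter's own example is not reproduced here. As a rough standalone illustration of the distinction, the plain C sketch below models a pairlist as a singly linked chain and a regular list as an array of pointers. `Node`, `to_flat_list` and `to_single_node_chains` are hypothetical names introduced for this sketch and are not part of R's API or of data.table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Model of a pairlist: a singly linked chain of nodes. */
typedef struct Node {
    const char *value;
    struct Node *next;
} Node;

static size_t chain_length(const Node *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next) ++n;
    return n;
}

/* Conversion to a regular list: one slot per element, each slot
 * pointing directly at the element's value. Caller frees the array. */
const char **to_flat_list(const Node *head, size_t *n_out) {
    size_t n = chain_length(head);
    const char **slots = malloc((n ? n : 1) * sizeof *slots);
    if (slots == NULL) return NULL;
    size_t i = 0;
    for (const Node *p = head; p != NULL; p = p->next) slots[i++] = p->value;
    *n_out = n;
    return slots;
}

/* The alternative discussed above: one slot per element, each slot
 * pointing at its own 1-node chain. Caller frees every node, then the array. */
Node **to_single_node_chains(const Node *head, size_t *n_out) {
    size_t n = chain_length(head);
    Node **slots = malloc((n ? n : 1) * sizeof *slots);
    if (slots == NULL) return NULL;
    size_t i = 0;
    for (const Node *p = head; p != NULL; p = p->next) {
        Node *single = malloc(sizeof *single);
        if (single == NULL) {            /* undo partial work on failure */
            while (i > 0) free(slots[--i]);
            free(slots);
            return NULL;
        }
        single->value = p->value;
        single->next = NULL;             /* the link to the rest is gone */
        slots[i++] = single;
    }
    *n_out = n;
    return slots;
}
```

In the second form every slot carries the overhead of a node whose `next` is always `NULL`, which is the loss of benefit the comment describes.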