Remove deep copy of indices from shallow() #4440
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -152,13 +152,25 @@ static SEXP shallow(SEXP dt, SEXP cols, R_len_t n) | |
R_len_t i,l; | ||
int protecti=0; | ||
SEXP newdt = PROTECT(allocVector(VECSXP, n)); protecti++; // to do, use growVector here? | ||
//copyMostAttrib(dt, newdt); // including class | ||
DUPLICATE_ATTRIB(newdt, dt); | ||
SET_ATTRIB(newdt, shallow_duplicate(ATTRIB(dt))); | ||
SET_OBJECT(newdt, OBJECT(dt)); | ||
IS_S4_OBJECT(dt) ? SET_S4_OBJECT(newdt) : UNSET_S4_OBJECT(newdt); // To support S4 objects that include data.table | |
//SHALLOW_DUPLICATE_ATTRIB(newdt, dt); // SHALLOW_DUPLICATE_ATTRIB would be a bit neater but is only available from R 3.3.0 | ||
|
||
// TO DO: keepattr() would be faster, but can't because shallow isn't merely a shallow copy. It | ||
// also increases truelength. Perhaps make that distinction, then, and split out, but marked | ||
// so that the next change knows to duplicate. | ||
// Does copyMostAttrib duplicate each attrib or does it point? It seems to point, hence DUPLICATE_ATTRIB | ||
// for now otherwise example(merge.data.table) fails (since attr(d4,"sorted") gets written by setnames). | ||
// keepattr() also merely points to the entire attributes list and thus doesn't allow replacing | |
// some of its elements. | ||
|
||
// We copy all attributes that refer to column names so that calling setnames on either | ||
// the original or the shallow copy doesn't break anything. | ||
SEXP index = PROTECT(getAttrib(dt, sym_index)); protecti++; | ||
setAttrib(newdt, sym_index, shallow_duplicate(index)); | ||
Looking at https://github.com/wch/r-source/blob/trunk/src/main/duplicate.c that doesn't seem to be an issue. It starts out exactly like
I made an issue that could help to address this overhead: #4467
This
Thanks Matt. Good spot. Sorry for confusion. Indeed copy of
||
|
||
SEXP sorted = PROTECT(getAttrib(dt, sym_sorted)); protecti++; | ||
setAttrib(newdt, sym_sorted, duplicate(sorted)); | ||
MichaelChirico marked this conversation as resolved.
|
||
|
||
SEXP names = PROTECT(getAttrib(dt, R_NamesSymbol)); protecti++; | ||
SEXP newnames = PROTECT(allocVector(STRSXP, n)); protecti++; | ||
if (isNull(cols)) { | ||
|
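To make the aliasing concern in the diff comment concrete (calling setnames on either the original or the shallow copy must not break the other), here is a minimal sketch in plain C. It is a toy model and not the R API: `toy_attr`, `toy_table`, `copy_pointing`, `copy_duplicating` and `toy_setnames` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 16

/* Hypothetical stand-in for an attribute holding a column name (e.g. "sorted"). */
typedef struct {
    char key[NAME_LEN];
} toy_attr;

/* Hypothetical stand-in for a table: column data plus one attribute. */
typedef struct {
    const double *col;   /* column data, shared by every copy */
    toy_attr *sorted;    /* attribute that refers to a column name */
} toy_table;

/* Copy that merely points at the original's attribute. */
toy_table copy_pointing(const toy_table *src) {
    return *src;
}

/* Copy that still shares the column but duplicates the attribute. */
toy_table copy_duplicating(const toy_table *src) {
    toy_table t = *src;
    t.sorted = malloc(sizeof *t.sorted);
    if (t.sorted == NULL) abort();
    *t.sorted = *src->sorted;
    return t;
}

/* Rename in place, the way a by-reference setter writes into the attribute. */
void toy_setnames(toy_table *t, const char *new_key) {
    assert(new_key != NULL);
    strncpy(t->sorted->key, new_key, NAME_LEN - 1);
    t->sorted->key[NAME_LEN - 1] = '\0';
}
```

With `copy_pointing`, a rename made through the copy also shows up in the original. With `copy_duplicating`, the two attributes stay independent while the column pointer is still shared, which is the property the attribute copies in `shallow()` are there to preserve.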
`PROTECT` AFAIK is not needed. `dt` would need to be garbage collected before we assign index as an attribute of `newdt`.

You are probably right, but I wasn't quite sure, so I decided to follow the pattern from just a few lines later and take the cautious approach. Looking through the rest of the file, there are a few instances where the call to `getAttrib` is not `PROTECT`ed and many where it is. I also don't think that `dt` could be garbage collected.

Anyway, I wouldn't mind taking it out, but I don't think it hurts, and then it raises the question of what to do further down in the function or in the rest of the code.

Agree, I spotted places in the code where there were protects that were not needed. It is not really an issue, but it spreads the pattern of overprotecting, as you can see for yourself. I used to do it as well. Still, I might be wrong, but this will be verified when preparing the release for CRAN and running strict memory tests.

I've removed the `PROTECT` statements and also went ahead and removed the one on what is now line 174 for consistency. I considered just going through the entire file while I was at it, but that seemed out of scope for the PR.

I had the same thoughts and logic as you both. But `rchk` revealed that `getAttrib` can sometimes allocate inside it, perhaps more so now with ALTREP. Search for "A common source of true errors is a failure to protect the result of getAttrib when retrieving an attribute that may be automatically generated/converted (e.g. names, dimnames)" in https://github.com/kalibera/rchk/blob/master/doc/USAGE.md. Also search https://github.com/kalibera/rchk/blob/master/doc/INTERNALS.md for `getAttrib`.

I suspect in data.table's usage of the R API, we will never see `getAttrib` allocate, but R API is more general. So to pass `rchk` (as per steps in CRAN_Release.cmd, and as required by CRAN under additional checks) we have to protect `getAttrib` calls.

The variance in why some `getAttrib` calls are not protected in data.table may be that `rchk` knows some cases of `getAttrib` do not need to be protected depending on what the 2nd argument is.

In the past I've gone through and protected any `getAttrib` calls that `rchk` spots until it passes.

Removing over-zealous protection, as long as `rchk` still passes, is worthwhile for speed and simpler code I agree. Perhaps `rchk` could be added to GLCI.

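As a rough illustration of why a result that may be freshly allocated needs protecting before the next allocation, here is a toy model in plain C. It is not R's collector or the R API: every name (`toy_obj`, `toy_protect`, `toy_unprotect`, `toy_alloc`, `toy_get_attrib`) is hypothetical, and it assumes the worst case where every allocation runs a collection that frees everything not on the protect stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 8
#define STACK_SIZE 8

/* Hypothetical heap object: a liveness flag and a payload. */
typedef struct {
    bool live;
    int value;
} toy_obj;

static toy_obj heap[HEAP_SIZE];
static toy_obj *protect_stack[STACK_SIZE];
static int protect_top = 0;

/* Push an object on the protect stack and hand it back. */
toy_obj *toy_protect(toy_obj *o) {
    assert(protect_top < STACK_SIZE);
    protect_stack[protect_top++] = o;
    return o;
}

/* Pop the n most recently protected objects. */
void toy_unprotect(int n) {
    assert(n >= 0 && n <= protect_top);
    protect_top -= n;
}

static bool is_protected(const toy_obj *o) {
    for (int i = 0; i < protect_top; i++)
        if (protect_stack[i] == o) return true;
    return false;
}

/* Worst-case collector: free every live object that is not protected. */
static void toy_gc(void) {
    for (int i = 0; i < HEAP_SIZE; i++)
        if (heap[i].live && !is_protected(&heap[i])) heap[i].live = false;
}

/* Every allocation collects first, then reuses the first free slot. */
toy_obj *toy_alloc(int value) {
    toy_gc();
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].live) {
            heap[i].live = true;
            heap[i].value = value;
            return &heap[i];
        }
    }
    return NULL; /* heap is full of protected objects */
}

/* Stand-in for a getter that may allocate its result. */
toy_obj *toy_get_attrib(int value) {
    return toy_alloc(value);
}
```

In this model, an unprotected result of `toy_get_attrib` is silently recycled by the next `toy_alloc`, whereas wrapping it in `toy_protect` keeps it intact until `toy_unprotect`. If no allocation happens between fetching the result and its last use, nothing is collected, which matches the case where protection can be skipped.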
Very interesting, thanks. I don't mind reverting those changes at all. I just have two thoughts:
- `PROTECT` may be related to garbage collection only being triggered by allocating calls as not all instances of `getAttrib(*, R_NamesSymbol)` are currently protected.
- `R_NamesSymbol` when the first argument is a pairlist or a language object (but `rchk` can't check the first argument)

- `getAttrib(*, R_NamesSymbol)` are protected, for example in `rbindlist.c`. It could be that those will be picked up when I rerun `rchk` before release (perhaps those lines were added or changed in dev since 1.12.8), or more likely, from prior discussions with Thomas I suspect that `rchk` looks afterwards to the usage of the unprotected `getAttrib`. If there is no possible GC between the `getAttrib` and its last usage, or if the result is protected by dint of being passed as the value to `setAttrib`, then `rchk` is clever enough (I guess) to not raise the unprotected `getAttrib`. It does an extreme amount of tracking to spot unbalanced protection, for example, all statically (by looking at the source code without using runtime tests) and I've learnt not to underestimate how advanced it is.
- `STRING_ELT` can allocate if the vector it is passed is an ALTREP.