-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
by = .EACHI key fix 4603 #4917
by = .EACHI key fix 4603 #4917
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4917 +/- ##
=======================================
Coverage 99.42% 99.42%
=======================================
Files 73 73
Lines 14429 14430 +1
=======================================
+ Hits 14346 14347 +1
Misses 83 83
Continue to review full report at Codecov.
|
…key_cols. Resolves both new tests and should pass now.
@@ -1385,7 +1385,8 @@ replace_dot_alias = function(e) { | |||
byval = i | |||
bynames = if (missing(on)) head(key(x),length(leftcols)) else names(on) | |||
allbyvars = NULL | |||
bysameorder = haskey(i) || (is.sorted(f__) && ((roll == FALSE) || length(f__) == 1L)) # Fix for #1010 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the length(f__)==1L
too. That was there, iiuc, from before #3443 when is.sorted()
returned false for a single NA. Since is.sorted()
is true for all length 1 input, the || length(f__)==1L
is superfluous now.
Also moved the roll=FALSE
first before && is.sorted(f__)
to save the call to is.sorted()
when roll==TRUE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chmatch
not match
too. i can fix if you're AFK now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good spot. done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, I thought we had a isLeadingSubsetOfKey()
helper function, or similar name, as we do that op quite a bit. But when I did a few greps I couldn't find anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was exactly thinking that we should just have a helper for this by now...
R/data.table.R
Outdated
@@ -1385,7 +1385,8 @@ replace_dot_alias = function(e) { | |||
byval = i | |||
bynames = if (missing(on)) head(key(x),length(leftcols)) else names(on) | |||
allbyvars = NULL | |||
bysameorder = haskey(i) || (is.sorted(f__) && ((roll == FALSE) || length(f__) == 1L)) # Fix for #1010 | |||
bysameorder = (haskey(i) && identical(leftcols, head(match(key(i),names(i)), length(leftcols)))) || # leftcols leading subset of key(i); see #4917 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for identical() over all( . == . ) here? identical is a heavier test including things like attributes right?
also at a glance I would do head() before match()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. ==
is an alloc returning logical vector which all()
then passes over. It's the alloc that's nice to avoid; usually fast but could trigger a gc. identical
is a binary compare so, when there aren't any attributes to check, it's faster than all(.==.)
, iirc some tests from years back posted somewhere where identical
was fastest. But could be wrong and would be surprised if there's much in it in this small case. Then there's call overhead of .Internal
vs .Primitive
vs S3-method of the various options (identical
vs all
vs all.equal
).
Yep head
before match
good idea. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
caveat ... ==
for a length 1 vector is a special case that doesn't allocate, just for the reason that R has global internal constants for TRUE, FALSE, and NA_LOGICAL. So the result of x==1L
when x is length 1L will be one of those global pointers without an alloc. As always, IIUC.
Closes #4603.
Closes #4911.
During
by = .EACHI
, the sorted attribute could be incorrectly assigned. Because theon = .(lhs = rhs)
, the order of the sorted attributes may not match the order of the by columns.A second issue is that during normal
by =
operations, each column is keyed. However, since not all columns inY
ofX[Y]
are not guaranteed to be keyed, we cannot assume that all the columns are keyed, per existing behavior:data.table/R/data.table.R
Line 1835 in 263b53e