Key breaks `by` functuanality #1704

DavidArenburg · 2016-05-16T09:56:10Z

I'm not entirely certain what causes this, so here's the most minimal working example I could find:

library(data.table) # Tested on v 1.9.7
dt <-  data.table( origin = c("A", "A", "A", "A", "A", "A", "B", "B", "A", "A", "C", "C", "B", "B", "B", "B", "B", "C", "C", "B", "A", "C", "C", "C", "C", "C", "A", "A", "C", "C", "B", "B"),
                   destination = c("A", "A", "A", "A", "B", "B", "A", "A", "C", "C", "A", "A", "B", "B", "B", "C", "C", "B", "B", "A", "B", "C", "C", "C", "A", "A", "C", "C", "B", "B", "C", "C"),
                   points_in_dest = c(5, 5, 5, 5, 4, 4, 5, 5, 3, 3, 5, 5, 4, 4, 4, 3, 3, 4, 4, 5, 4, 3, 3, 3, 5, 5, 3, 3, 4, 4, 3, 3),
                   depart_time = c(7, 8, 16, 18, 7, 8, 16, 18, 7, 8, 16, 18, 7, 8, 16, 7, 8, 16, 18, 8, 16, 7, 8, 18, 7, 8, 16, 18, 7, 8, 16, 18),   
                   travel_time = c(0, 0, 0, 0, 70, 10, 70, 10, 10, 10, 70, 70, 0, 0, 0, 70, 10, 10, 70, 70, 10, 0, 0, 0, 10, 70, 10, 70, 10, 70, 70, 10) )

dt[ depart_time<=8  & travel_time < 60, condition1 := TRUE]
dt[ depart_time>=16 & travel_time < 60, condition2 := TRUE] 

setkey(dt, origin, destination)
res <- unique(dt[(condition1)])[unique(dt[(condition2)]), 
                                on = c(destination = "origin", origin = "destination"), 
                                nomatch = 0L]
res[, .(points = sum(points_in_dest)),  keyby = origin]
#    origin points
#1:      A      5
#2:      A      4
#3:      B      4
#4:      B      3
#5:      C      5
#6:      C      4
#7:      C      3

As you can see, `by` didn't work as intended: every row was returned separately instead of one row per `origin`. It is evidently a keying problem, since removing the key attribute fixes it:

setattr(res, "sorted", NULL)
res[, .(points = sum(points_in_dest)), keyby = origin]
#    origin points
#1:      A      9
#2:      B      7
#3:      C     12
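
One way to test the "keying problem" hypothesis directly is to check whether the key attribute agrees with the actual row order. The sketch below is only an illustration: `key_matches_order` is a hypothetical helper, not part of data.table, and the `toy` table is made-up data given a deliberately wrong key attribute so the example does not depend on reproducing the join above.

```r
library(data.table)

# Hypothetical helper (not a data.table function): TRUE when the rows of `x`
# really are ordered by the columns named in its key attribute.
key_matches_order <- function(x) {
  k <- key(x)
  if (is.null(k)) return(TRUE)  # no key, so nothing to contradict
  key_cols <- lapply(k, function(col) x[[col]])
  # Radix ordering is stable and sorts strings in the C locale,
  # so rows that are already sorted come back as 1..n.
  ord <- do.call(order, c(key_cols, list(method = "radix")))
  identical(ord, seq_len(nrow(x)))
}

# Made-up data with a stale key: the attribute claims the table is sorted
# by origin, but the rows are not.
toy <- data.table(origin = c("A", "B", "A", "C"), points = c(5, 4, 4, 3))
setattr(toy, "sorted", "origin")

key(toy)                # "origin"
key_matches_order(toy)  # FALSE

# Dropping the key leaves the rows untouched and removes the false claim.
setkey(toy, NULL)
key(toy)                # NULL
key_matches_order(toy)  # TRUE
```

Running `key_matches_order(res)` on the join result above would be the corresponding check there. I have not confirmed what it returns on v 1.9.7.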

Or, alternatively, force-casting `origin` to a factor:

res[, .(points = sum(points_in_dest)), keyby = factor(origin)]
#    factor points
#1:      A      9
#2:      B      7
#3:      C     12
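
Both workarounds point to a cheap cross-check: run the same aggregation on the keyed table and on a copy with the key stripped, then compare the results. This is a sketch on made-up data (`toy` is not the table from this report) and shows only the shape of the check. On a correctly keyed table the two results should agree.

```r
library(data.table)

# Made-up data, keyed the ordinary way so the key is trustworthy.
toy <- data.table(origin = c("B", "A", "C", "A"),
                  points_in_dest = c(4, 5, 3, 4))
setkey(toy, origin)

# Aggregation on the keyed table.
keyed <- toy[, .(points = sum(points_in_dest)), keyby = origin]

# Same aggregation on a copy whose key attribute has been removed,
# so no key-based shortcut can be taken.
plain <- copy(toy)
setattr(plain, "sorted", NULL)
unkeyed <- plain[, .(points = sum(points_in_dest)), keyby = origin]

# Any disagreement between the two means the key is not to be trusted.
isTRUE(all.equal(keyed, unkeyed))  # TRUE
```

Substituting `res` for `toy` gives the comparison for the table in this report.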

This was taken from this SO question: http://stackoverflow.com/questions/37239649/aggregate-data-table-based-on-condition-in-another-row


arunsrinivasan · 2016-05-16T14:09:30Z

Very nice example. Will fix. Thanks.

MichaelChirico · 2016-05-16T15:30:22Z

gotta say, that's a creative way to spell functionality!

DavidArenburg · 2016-05-16T15:48:56Z

Fixed....


arunsrinivasan added bug High labels May 16, 2016

arunsrinivasan added this to the v1.9.8 milestone May 16, 2016

arunsrinivasan self-assigned this Jul 21, 2016

arunsrinivasan added a commit that referenced this issue Jul 21, 2016

Closes #1766, #1704. Keys are retained/removed better on joins using 'on'. (fd07e7f)

arunsrinivasan closed this as completed Jul 21, 2016
