[Request] lagging of lists with shift() #1595

Closed
enfascination opened this issue Mar 17, 2016 · 11 comments

@enfascination

The way shift() works on lists, it can't be used to create lagged columns of type list the same way it creates lagged columns of other types. Example:

library(data.table)
dt <- data.table(mtcars)[, .(gear, carb, cyl)]
###  Make a column of lists
dt[, carbList := list(list(unique(carb))), by = .(cyl, gear)]
###  Now I want to lag/lead the column of lists
dt[, .(carb, carbLag = shift(carb),
       carbList, carbListLag = shift(carbList, type = "lead")), by = cyl]

    cyl carb carbLag carbList carbListLag
...
19:   8    2      NA     2,4,3      4, 3,NA
20:   8    4       2     2,4,3      4, 3,NA
21:   8    3       4     2,4,3      4, 3,NA

from SO

It would be useful if shift() treated list columns the same way. The current behavior is technically documented (?shift says how it works on lists), but the docs describe that behavior outside the context of data.table, and the implication for behavior within a data.table isn't clear until you stumble into it. In the context of data.table, where shift() is a really nice way to lag/lead columns by group, its behavior with lists seems inconsistent with how it works on other types.

General use case: being able to compare a set to itself over time.
Specific example: say I have a table of all the people who have come to my birthday party each year, one row per name per year. If I wanted to see how many people came this year who didn't come last year, how many came last year who didn't come this year, the people who come every year, and all the people who have ever come, I'd be able to do it all at once with something like this pseudocode:

bday[, .(friendL = list(friend)), by = year][
  order(year), {
    friendLlag  = shift(friendL)   # desired: last year's set of friends
    newfriends  = length(setdiff(friendL, friendLlag))
    badfriends  = length(setdiff(friendLlag, friendL))
    goodfriends = length(intersect(friendL, friendLlag))
    allfriends  = length(union(friendL, friendLlag))
    list(newfriends, badfriends, goodfriends, allfriends)
  }]
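The pseudocode won't run as intended: setdiff() and friends would compare the whole list columns rather than each year against the previous one. A runnable sketch of the intended comparison, using a small made-up `bday` table (the names and years are hypothetical), lags the list column by indexing instead of shift() and does the set operations row by row with mapply(). It uses NULL rather than NA as the placeholder for the first year so that union() doesn't count NA as a guest.

```r
library(data.table)

# Hypothetical example data: one row per guest per year
bday <- data.table(
  year   = c(2014, 2014, 2015, 2015, 2015, 2016),
  friend = c("Ann", "Bob", "Ann", "Cat", "Dan", "Cat")
)

# One row per year, holding that year's guests in a list column
byyear <- bday[, .(friendL = list(friend)), by = year][order(year)]

# Lag the list column as a whole: NULL for the first year, then drop the last element
byyear[, friendLlag := .(c(list(NULL), friendL[-.N]))]

# Compare each year's set with the previous year's set, row by row
byyear[, `:=`(
  newfriends  = mapply(function(a, b) length(setdiff(a, b)),   friendL, friendLlag),
  badfriends  = mapply(function(a, b) length(setdiff(b, a)),   friendL, friendLlag),
  goodfriends = mapply(function(a, b) length(intersect(a, b)), friendL, friendLlag),
  allfriends  = mapply(function(a, b) length(union(a, b)),     friendL, friendLlag)
)]
```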
@arunsrinivasan
Member

MRE with expected output please. Hard to follow. But I'm not sure what's unclear in the documentation. The Details section explains how .SD is handled. .SD is a data.table which is also a list. And examples cover both .SD and lists.

Perhaps your use case is more relevant with list-of-lists? It's not a common usage for shift, hence it's not supported at the moment.

Since I don't understand what / why this is needed, I'm reluctant to add support for list-of-list types. A MRE would help. Feel free to reopen after.

@franknarf1
Contributor

@arunsrinivasan OP shows their desired output on SO (not sure why they didn't link it): http://stackoverflow.com/a/36041367/1191259

If I understand correctly, the short version is:

# example data
DT = data.table(mtcars)[,carbList:= .(list(unique(carb))), by=.(cyl, gear)]

# desired output something like
DT[, .(c(list(NA), carbList[-.N])), by=cyl]

# desired syntax to create this output
DT[, .(shift(carbList, 1)), by=cyl]

Personally, I don't really need it. I try not to do anything fancy with list columns.

@arunsrinivasan
Member

Thanks @franknarf1. MRE is:

require(data.table)
dt = data.table(x=1:2, y=list(3:4, 5:6))
dt[, z := shift(y)] # op expects dt[, z := list(list(NA, 3:4))]

Is that right?

@franknarf1
Copy link
Contributor

Yep, that's my understanding.
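For reference, the expected `z` in this MRE can already be built without shift() by prepending a placeholder and dropping the last element, the same idiom as `c(list(NA), carbList[-.N])` earlier in the thread. This is a sketch of the expected output only, not of how shift() itself would implement it:

```r
library(data.table)

dt <- data.table(x = 1:2, y = list(3:4, 5:6))

# Lag the list column as a whole: prepend NA, drop the last element.
# Wrapping in .() makes := treat the result as a single list column.
dt[, z := .(c(list(NA), y[-.N]))]
dt
```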

@arunsrinivasan
Copy link
Member

shift() is limited to atomic vectors. If a list is provided, it'll operate on its elements as long as they are atomic vectors. Not quite sure if we should handle this.
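A quick way to see this element-wise behavior on a plain list (the values are arbitrary). This is also why `carbListLag` in the original example shows each set shifted internally rather than the previous row's set:

```r
library(data.table)

l <- list(1:3, 4:6)

# Each element is lagged on its own; the list itself is not shifted
res <- shift(l)
res
```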

@arunsrinivasan
Copy link
Member

How about:

dt[, z := y[shift(.I)]]
#    x   y    z
# 1: 1 3,4 NULL
# 2: 2 5,6  3,4

Seems to work fine with groups as well.

dt=data.table(x=c(1,1,2), y=list(1:2,3:4,5:6))
dt[, z := .(y[shift(.I)]), by=x]
#    x   y    z
# 1: 1 1,2 NULL
# 2: 1 3,4  1,2
# 3: 2 5,6 NULL

@franknarf1
Copy link
Contributor

Yeah, that's a good idiom, I think.

I think the documentation is fine; probably no need to tag this FR with it.

@arunsrinivasan
Copy link
Member

I think the doc is quite clear. And this is not an intended use case for shift. The workaround seems sufficient. Will revisit if there are more requests.

@arunsrinivasan
Copy link
Member

Added list-of-list support.

@statquant
Copy link

statquant commented Oct 1, 2016

Actually... does this work?

dt=data.table(x=c(1,1,2,2,3), y=list(1:2,3:4,5:6,7:8,9:10))
dt[, z := .(y[shift(.I)]), by=x]
   x     y    z
1: 1   1,2 NULL
2: 1   3,4  1,2
3: 2   5,6 NULL
4: 2   7,8 NULL <=== why NULL ??
5: 3  9,10 NULL

edit: this seems to work then

dt=data.table(x=c(1,1,2,2,3), y=list(1:2,3:4,5:6,7:8,9:10))
dt[,z:=shift(.I),x][,v:=.(y[z])]
dt
   x     y  z    v
1: 1   1,2 NA NULL
2: 1   3,4  1  1,2
3: 2   5,6 NA NULL
4: 2   7,8  3  5,6
5: 3  9,10 NA NULL

@MichaelChirico
Copy link
Member

Within by groups, .I holds row numbers in the whole table, not positions within the group. So in the x = 2 group it's looking for y[3], which it can't find because that group's y has only two elements.
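One way to keep the lag inside each group in a single step (a sketch, assuming the goal is the same result as statquant's two-step version) is to shift group-local positions, `seq_len(.N)`, instead of the table-wide row numbers in `.I`:

```r
library(data.table)

dt <- data.table(x = c(1, 1, 2, 2, 3), y = list(1:2, 3:4, 5:6, 7:8, 9:10))

# seq_len(.N) gives positions 1..N within the current group, so the lagged
# index always points into the group's own subset of y; NA indexes give NULL
dt[, z := .(y[shift(seq_len(.N))]), by = x]
dt
```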

