setNames causes error when used with groupby (the by parameter) #4963
Comments
This should work:
> dt[, .(agg = list(setNames(value, id))), grp]
grp agg
<char> <list>
1: a 4,5
2: b 6
You are getting the error because when you wrap
It's been a long time since I used
> dt %>% group_by(grp) %>% summarise(agg = list(setNames(as.list(value), id))) %>% data.table %>% str
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ grp: chr "a" "b"
$ agg:List of 2
..$ :List of 2
.. ..$ 1: int 4
.. ..$ 2: int 5
..$ :List of 1
.. ..$ 3: int 6
- attr(*, ".internal.selfref")=<externalptr>
> dt[, .(agg = list(setNames(value, id))), grp] %>% str
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ grp: chr "a" "b"
$ agg:List of 2
..$ : Named int 4 5
.. ..- attr(*, "names")= chr [1:2] "1" "2"
..$ : Named int 6
.. ..- attr(*, "names")= chr "3"
- attr(*, ".internal.selfref")=<externalptr>
My best guess is that
Hi @avimallu, thanks for your reply. You are right that I don't think 'names' attribute [58] must be the same length as the vector [40] where 58 is the size of some groups and 40 is the size of some other group. As an example, try the following.
> dt = data.table(id=c(1:6), grp=c('a', 'a', 'b', 'a', 'b', 'c'), value=c(4:9))
> dt[, by = grp, .(agg = list(setNames(as.list(value), id)))]
Error in lapply(x, runlock, current_depth = current_depth + 1L) :
'names' attribute [3] must be the same length as the vector [2]
Could you let me know why you want it as a
> dt[, by = grp, .(agg = list(list(setNames(value, id))))]
grp agg
1: a <list[1]>
2: b <list[1]>
> str(dt[, by = grp, .(agg = list(list(setNames(value, id))))])
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ grp: chr "a" "b"
$ agg:List of 2
..$ :List of 1
.. ..$ : Named int 4 5
.. .. ..- attr(*, "names")= chr [1:2] "1" "2"
..$ :List of 1
.. ..$ : Named int 6
.. .. ..- attr(*, "names")= chr "3"
- attr(*, ".internal.selfref")=<externalptr>
Reference. I'm looking deeper into your other comment, but don't have an answer right now.
No, that's still different from my original data structure.
# This gives a list of one list of many values for each row in `agg` column.
> dt[, by = grp, .(agg = list(setNames(as.list(value), id)))]
# This gives a list of one list of one vector of many values for each row in `agg` column.
> dt[, by = grp, .(agg = list(list(setNames(value, id))))]
# This should give the same result as intended but also gives error
> dt[, by = grp, .(agg = list(as.list(setNames(value, id))))]
To be frank, I'm doing this only for legacy reasons. Someone else wrote the code that builds the data structure this way, and I have to stay consistent. However, I think there may be a bigger issue in how the groups of data are handled.
> dt = data.table(id=c(1:6), grp=c('a', 'a', 'b', 'a', 'b', 'c'), value=c(4:9))
> dt[, by = grp, .(agg = list(list(setNames(value, id))))] # this works
grp agg
1: a <list[1]>
2: b <list[1]>
3: c <list[1]>
> dt[, by = grp, .(agg = list(as.list(setNames(value, id))))] # this doesn't work
Error in lapply(x, runlock, current_depth = current_depth + 1L) :
'names' attribute [3] must be the same length as the vector [2]
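For reference, the structure the failing call is meant to produce (one named list per group, with one element per value) can be built in base R without data.table's grouping. This is only a minimal sketch for comparison, not a fix proposed in this thread; it assumes the same columns as the example held as plain vectors, and `expected` is a hypothetical name introduced here.

```r
# Same data as the example above, held as plain vectors (no data.table needed).
id    <- 1:6
grp   <- c("a", "a", "b", "a", "b", "c")
value <- 4:9

# Split values and ids by group, name each group's values by its ids,
# then turn each named vector into a named list (one element per value).
# `expected` is a hypothetical name used only for this illustration.
expected <- Map(
  function(v, i) as.list(setNames(v, i)),
  split(value, grp),
  split(id, grp)
)

str(expected)
```

Comparing the names of each element of `expected` with those in the `agg` column is one way to spot names that were taken from the wrong group.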
Is there any difference in the R version between the two versions of
In addition,
> library(data.table)
> dt = data.table(id=c(1:6), grp=c('a', 'a', 'b', 'a', 'b', 'c'), value=c(4:9))
> as.list(setNames(dt$value, dt$id))
$`1`
[1] 4
$`2`
[1] 5
$`3`
[1] 6
$`4`
[1] 7
$`5`
[1] 8
$`6`
[1] 9
> list(setNames(dt$value, dt$id))
[[1]]
1 2 3 4 5 6
4 5 6 7 8 9
I realize I still haven't answered all of your questions - this is just to point out that your expectation that
Yes, I'm using the same version of R in both cases: R 4.0.3. On the discussion of
TL;DR: This is a version of an old bug where list columns returned by an aggregation would retain a pointer to the last group. #4655 rewrote how data.table handles that bug, but it missed this case, where the elements of the list get their attributes set to values from another column. Thanks for reporting, this looks like an actual bug in
.Internal(inspect(ans))
# @0x00000029e6ff6c38 19 VECSXP g1c2 [MARK,REF(11)] (len=2, tl=0)
# @0x00000029e6ff6bf8 16 STRSXP g1c2 [MARK,REF(3)] (len=2, tl=0)
# @0x00000029d8cb9f70 09 CHARSXP g1c1 [MARK,REF(60),gp=0x61] [ASCII] [cached] "a"
# @0x00000029d9088938 09 CHARSXP g1c1 [MARK,REF(65),gp=0x61] [ASCII] [cached] "b"
# @0x00000029e6ff6bb8 19 VECSXP g1c2 [MARK,REF(5)] (len=2, tl=0)
# @0x00000029e6ff6c78 19 VECSXP g1c2 [MARK,REF(6),ATT] (len=2, tl=0)
# @0x00000029e2811908 13 INTSXP g1c1 [MARK,REF(3)] (len=1, tl=0) 4
# @0x00000029e28118d0 13 INTSXP g1c1 [MARK,REF(3)] (len=1, tl=0) 5
# ATTRIB:
# @0x00000029e1273720 02 LISTSXP g1c0 [MARK,REF(1)]
# TAG: @0x00000029d20016a0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "names" (has value)
# @0x00000029e1273758 16 STRSXP g1c0 [MARK,REF(65535)] <expanded string conversion>
# @0x00000029e7000e18 16 STRSXP g1c2 [MARK,REF(1)] (len=2, tl=0)
# @0x00000029e10a4a70 09 CHARSXP g1c1 [MARK,REF(299),gp=0x60] [ASCII] [cached] "3"
# @0x00000029d8effb70 09 CHARSXP g1c1 [MARK,REF(368),gp=0x60] [ASCII] [cached] "2"
# @0x00000029e2811748 19 VECSXP g1c1 [MARK,REF(8),ATT] (len=1, tl=0)
# @0x00000029e2811710 13 INTSXP g1c1 [MARK,REF(1)] (len=1, tl=0) 6
# ATTRIB:
# @0x00000029e1272df0 02 LISTSXP g1c0 [MARK,REF(1)]
# TAG: @0x00000029d20016a0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "names" (has value)
# @0x00000029e1272e28 16 STRSXP g1c0 [MARK,REF(65535)] <deferred string conversion>
# @0x00000029e280f4e0 13 INTSXP g1c1 [MARK,REF(65535)] (len=2, tl=2) 3,2
Clearly, the names attribute of the first list item should be
git bisect shows that the bug was introduced in #4655. The bug does not occur if we drop the
I've also attempted to replace
Hi data.table team, this is the first time I'm reporting an issue to this repo. I have searched the release docs, other issues, and Stack Overflow, and have not found anything similar reported yet. If that's not true, I apologise for the trouble. I did not try this on the latest dev version.
# Minimal reproducible example
In V1.14.0, the following gives an error. However, the exact same code in V1.12.8 works fine with valid output.
Also, not sure if this is relevant, but dplyr gives the same output in both versions of data.table, although with slightly different formatting, which I assume is merely a change in how data.table prints the list type in those two versions.
# Output of sessionInfo()