uniqueN could supports list input #1224

jangorecki · 2015-07-15T10:22:59Z

And not just lists but any type which is not supported by uniqueN could simply redirects to length(unique(.)).

library(data.table)
d1 <- data.table(a = 1:4, l = list(list(letters[1:2]),list(Sys.time()),list(1:10),list(letters[1:2])))
d1[,length(unique(l))]
# [1] 3
d1[,uniqueN(l)]
#Error in uniqueN(l) : 
#  x must be an atomic vector or data.frames/data.tables

Looks pretty easy to do, I will try to make PR for that.

The text was updated successfully, but these errors were encountered:

arunsrinivasan · 2015-07-16T17:31:35Z

@jangorecki this seems to do something different.

dt = data.table(x=c(1,1,2,3), y=c(2,2,3,4))
nrow(unique(dt)) # [1] 3

length(unique(as.list(dt))) # [1] 2

It seems to not check row-wise, rather if any element of list is identical to another.

jangorecki · 2015-07-16T18:47:00Z

I used uniqueN / length(unique( on single column while you did it on both columns. To make it more clear I simplified the example data:

library(data.table)
d1 <- data.table(l = list(list(letters[1:2]),list(Sys.time()),list(1:10),list(letters[1:2])))
d1[,length(unique(l))]
d1[,uniqueN(l)]

arunsrinivasan · 2015-07-16T19:36:03Z

Okay. I guess the idea here is that a list is a vector as well.. that's fine.
Could you make sure it'd work on multiple columns as well? For example:

d1 <- data.table(a = c(1:3,1), l = list(list(letters[1:2]),list(Sys.time()),list(1:10),list(letters[1:2])))
nrow(unique(as.data.frame(d1))) # [1] 3

We'd want that to work, if someone did:

d1[, uniqueN(.SD)]

This wouldn't work at the moment. Adding a check as to whether any column is of list type might be quite expensive when one performs a grouping operation (checking type for each group). And this is quite a rare case.. Hm.

arunsrinivasan · 2015-07-16T19:40:01Z

Maybe for single column case we can already use the fix you've provided, and take care of .SD / data.table cases later. Will merge asap. Thanks @jangorecki.

jangorecki · 2015-07-16T20:36:50Z

This should not affect .SD as the check is if (!is.atomic(x) && !is.data.frame(x)) return(length(unique(x))).

data.table(a = list(1:2,4))[,is.data.frame(.SD)]
# [1] TRUE
data.table(a = list(1:2,4))[,uniqueN(a)]
# [1] 2
data.table(a = list(1:2,4))[,uniqueN(.SD)]
#Error in forderv(x, by = by, retGrp = TRUE) : 
#  First column being ordered is type 'list', not yet supported
data.table(aa = 1:2, a = list(1:2,4))[,uniqueN(.SD)]
# [1] 2

It still can throw error in a specific case of list column as first column, but this is related to #1229.
I will update doc, readme, and squash soon.

uniqueN supports any types, solves #1224

arunsrinivasan · 2016-03-01T23:02:22Z

This fix returns wrong result for list(1:2, 3:4). length(unique()) should only be dispatched for list of lists.. not list of atomics.

jangorecki · 2016-03-01T23:30:30Z

@arunsrinivasan I'm not sure if I follow. You are saying that list should be handled same as data.table? I don't think it is a proper behavior for list, while it is for data.table.
I can create uniqueN generic so it can be flexible.

arunsrinivasan · 2016-03-01T23:33:40Z

Nope. I'm comfortable with uniqueN not supporting list-of-lists. It was not the original purpose. But it needs to do what forder() does for list of atomics (using mget() in j argument would not work as intended otherwise).

jangorecki · 2016-03-01T23:39:29Z

I will quote from source code, it is quite on point.

# simple straightforward helper function to get the number 
# of groups in a vector or data.table. Here by data.table, 
# we really mean `.SD` - used in a grouping operation

arunsrinivasan · 2016-03-01T23:44:26Z

That was done before your update to lists. It just wasn't updated. What's your point?

arunsrinivasan · 2016-03-01T23:45:30Z

Will revisit later if the issue of lists come up. I'll revert this functionality to atomics / data.frames for now.

jangorecki · 2016-03-01T23:45:49Z

@arunsrinivasan yes, just doing it now.

MichaelChirico · 2016-12-11T04:42:36Z

A use case for this is in the syntax:

DT[ , uniqueN(.(var1, var2)), by = var3]

Could be handled with uniqueN(.SD)&.SDcols = c("var1", "var2") or with uniqueN(setDT(var1, var2)).

The latter in particular pokes at: uniqueN should work at least for data.table-like lists.

I was also confused by the documentation, which has:

uniqueN(x, by=if (is.list(x)) seq_along(x) else NULL, na.rm=FALSE)

I was surprised to see a test for is.list(x) in the official usage of uniqueN, but x can't be a list list (though it's made clear in the Arguments section that this isn't handled).

jangorecki added a commit to jangorecki/data.table that referenced this issue Jul 16, 2015

uniqueN supports any types, solves Rdatatable#1224

69713af

arunsrinivasan added a commit that referenced this issue Jul 17, 2015

Merge pull request #1225 from jangorecki/uniqueN_any

bce5ddd

uniqueN supports any types, solves #1224

arunsrinivasan closed this as completed Jul 18, 2015

arunsrinivasan mentioned this issue Mar 1, 2016

uniqueN is not handling NULL value #1429

Closed

arunsrinivasan reopened this Mar 1, 2016

jangorecki self-assigned this Mar 1, 2016

arunsrinivasan modified the milestone: v1.9.8 Mar 1, 2016

jangorecki closed this as completed in 6a18d53 Mar 2, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

uniqueN could supports list input #1224

uniqueN could supports list input #1224

jangorecki commented Jul 15, 2015

arunsrinivasan commented Jul 16, 2015

jangorecki commented Jul 16, 2015

arunsrinivasan commented Jul 16, 2015

arunsrinivasan commented Jul 16, 2015

jangorecki commented Jul 16, 2015

arunsrinivasan commented Mar 1, 2016

jangorecki commented Mar 1, 2016

arunsrinivasan commented Mar 1, 2016

jangorecki commented Mar 1, 2016

arunsrinivasan commented Mar 1, 2016

arunsrinivasan commented Mar 1, 2016

jangorecki commented Mar 1, 2016

MichaelChirico commented Dec 11, 2016 •

edited

Loading

uniqueN could supports list input #1224

uniqueN could supports list input #1224

Comments

jangorecki commented Jul 15, 2015

arunsrinivasan commented Jul 16, 2015

jangorecki commented Jul 16, 2015

arunsrinivasan commented Jul 16, 2015

arunsrinivasan commented Jul 16, 2015

jangorecki commented Jul 16, 2015

arunsrinivasan commented Mar 1, 2016

jangorecki commented Mar 1, 2016

arunsrinivasan commented Mar 1, 2016

jangorecki commented Mar 1, 2016

arunsrinivasan commented Mar 1, 2016

arunsrinivasan commented Mar 1, 2016

jangorecki commented Mar 1, 2016

MichaelChirico commented Dec 11, 2016 • edited Loading

MichaelChirico commented Dec 11, 2016 •

edited

Loading