Merge pull request #1225 from jangorecki/uniqueN_any

uniqueN supports any types, solves #1224
Rdatatable · Jul 17, 2015 · bce5ddd
2 parents 0142c2f + 69713af
commit bce5ddd
Showing 4 changed files with 9 additions and 4 deletions.
diff --git a/R/duplicated.R b/R/duplicated.R
@@ -92,7 +92,7 @@ anyDuplicated.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=
 # we really mean `.SD` - used in a grouping operation
 uniqueN <- function(x, by = if (is.data.table(x)) key(x) else NULL) {
     if (!is.atomic(x) && !is.data.frame(x))
-        stop("x must be an atomic vector or data.frames/data.tables")
+        return(length(unique(x)))
     if (is.atomic(x)) x = as_list(x)
     if (is.null(by)) by = seq_along(x)
     length(attr(forderv(x, by=by, retGrp=TRUE), 'starts'))

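The dispatch introduced by this hunk can be illustrated with a minimal base-R sketch. The function name `unique_n_sketch` is hypothetical and not part of data.table, and the atomic and data.frame branches use base `unique()` as a stand-in for the `forderv()`-based counting shown above, so this mirrors only the control flow, not the performance.

```r
# Hypothetical base-R sketch of the dispatch; `unique_n_sketch` is not a data.table function.
unique_n_sketch <- function(x) {
  # Anything that is neither atomic nor a data.frame falls back to base R,
  # which is the behaviour this change introduces in place of stop().
  if (!is.atomic(x) && !is.data.frame(x))
    return(length(unique(x)))
  # data.frame: count distinct rows (stand-in for the forderv() path).
  if (is.data.frame(x))
    return(nrow(unique(x)))
  # Atomic vector: count distinct values (stand-in for the forderv() path).
  length(unique(x))
}

n_atomic <- unique_n_sketch(c(1L, 2L, 2L, 3L))
n_df     <- unique_n_sketch(data.frame(a = c(1, 1, 2), b = c("x", "x", "y")))
n_list   <- unique_n_sketch(list(1:2, "a", 1:2))  # takes the base fallback branch
```

Returning early for unsupported types keeps the fast path unchanged for atomic vectors, data.frames and data.tables, while lists and other objects no longer raise an error.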
diff --git a/README.md b/README.md
@@ -33,8 +33,9 @@
 
   9. `rbindlist` gains `idcol` argument which can be used to generate an index column. If `idcol=TRUE`, the column is automatically named `.id`. Instead you can also provide a column name directly. If the input list has no names, indices are automatically generated. Closes [#591](https://github.com/Rdatatable/data.table/issues/591). Also thanks to @KevinUshey for filing [#356](https://github.com/Rdatatable/data.table/issues/356).
 
-  10. A new helper function `uniqueN` is now implemented. It is equivalent to `length(unique(x))` but much faster. It accepts `atomic vectors`, `data.frames` and `data.tables` as input and returns the number of unique rows. For example, `DT[, .(uN = uniqueN(.SD)), by=x]` returns the number of unique rows within each group of `x`. Thanks to @DavidArenburg as well for the FR.
+  10. A new helper function `uniqueN` is now implemented. It is equivalent to `length(unique(x))` but much faster. It handles `atomic vectors`, `data.frames` and `data.tables` as input and returns the number of unique rows; any other type is accepted but is passed to base `length(unique(x))`. For example, `DT[, .(uN = uniqueN(.SD)), by=x]` returns the number of unique rows within each group of `x`. Thanks to @DavidArenburg as well for the FR.
     * `uniqueN` gains a `by` argument which is equal to `key(x)` when `x` is a `data.table` so that the behaviour is identical to `duplicated()` and `unique` methods for `data.table`. Thanks to @kevinmistry for the report. Closes [#1080](https://github.com/Rdatatable/data.table/issues/1080).
+    * `uniqueN` now accepts any type, but types other than `atomic vectors`, `data.frames` and `data.tables` are dispatched to base `length(unique(x))`. Thanks to @jangorecki. Closes [#1224](https://github.com/Rdatatable/data.table/issues/1224).
 
   11. Implemented `transpose()` to transpose a list and `tstrsplit` which is a wrapper for `transpose(strsplit(...))`. This is particularly useful in scenarios where a column has to be split and the resulting list has to be assigned to multiple columns. See `?transpose` and `?tstrsplit`, [#1025](https://github.com/Rdatatable/data.table/issues/1025) and [#1026](https://github.com/Rdatatable/data.table/issues/1026) for usage scenarios. Closes both #1025 and #1026 issues.
     * Implemented `type.convert` as suggested by Richard Scriven. Closes [#1094](https://github.com/Rdatatable/data.table/issues/1094).

diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
@@ -6524,6 +6524,10 @@ test(1537 , names(melt(dt, id=1L, variable.name = "x", value.name="x")), c("x",
 # test for tables()
 test(1538, tables(), output = "Total:")
 
+# uniqueN supports list input, #1224
+d1 <- data.table(a = 1:4, l = list(list(letters[1:2]),list(Sys.time()),list(1:10),list(letters[1:2])))
+test(1539, d1[,uniqueN(l)], 3L)
+
 ##########################
 
 

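The expected value `3L` in test 1539 follows from how base `unique()` treats list elements: two elements count as duplicates when their contents are identical. The sketch below reproduces that reasoning with base R only; it swaps `Sys.time()` for a fixed timestamp (an assumption made purely for reproducibility) and does not call data.table.

```r
# Fixed timestamp instead of Sys.time(); an assumption to keep the example reproducible.
ts <- as.POSIXct("2015-07-17 00:00:00", tz = "UTC")

# Same shape as the list column `l` in the test: elements 1 and 4 are identical.
l <- list(list(letters[1:2]), list(ts), list(1:10), list(letters[1:2]))

n_total  <- length(l)          # number of elements in the list
n_unique <- length(unique(l))  # what the base fallback computes
dup_flag <- duplicated(l)      # marks the repeated element
```

Only the fourth element repeats an earlier one, so the list of four elements has three distinct values.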
diff --git a/man/duplicated.Rd b/man/duplicated.Rd
@@ -16,7 +16,7 @@
 
      \code{anyDuplicated} returns the \emph{index} \code{i} of the first duplicated entry if there is one, and 0 otherwise. 
 
-     \code{uniqueN} is equivalent to \code{length(unique(x))} but much faster. It accepts \code{atomic vectors}, \code{data.frames} and \code{data.tables}. The number of unique rows are computed directly without materialising the intermediate unique data.table and is therefore memory efficient as well.
+     \code{uniqueN} is equivalent to \code{length(unique(x))} but much faster for \code{atomic vectors}, \code{data.frames} and \code{data.tables}; for other types it dispatches to \code{length(unique(x))}. The number of unique rows is computed directly without materialising the intermediate unique data.table and is therefore memory efficient as well.
 
 }
 \usage{
@@ -29,7 +29,7 @@
 uniqueN(x, by=if (is.data.table(x)) key(x) else NULL)
 }
 \arguments{
-  \item{x}{ A data.table. \code{uniqueN} accepts atomic vectors and data.frames as well.}
+  \item{x}{ A data.table, atomic vector or data.frame. Other types are supported but are dispatched to \code{length(unique(x))}.}
   \item{\dots}{ Not used at this time. }
   \item{incomparables}{ Not used. Here for S3 method consistency. }
   \item{fromLast}{ logical indicating if duplication should be considered from the reverse side, i.e., the last (or rightmost) of identical elements would correspond to \code{duplicated = FALSE}.}