Commit

Merge pull request #1225 from jangorecki/uniqueN_any
uniqueN supports any types, solves #1224
arunsrinivasan committed Jul 17, 2015
2 parents 0142c2f + 69713af commit bce5ddd
Showing 4 changed files with 9 additions and 4 deletions.
2 changes: 1 addition & 1 deletion R/duplicated.R
@@ -92,7 +92,7 @@ anyDuplicated.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=
 # we really mean `.SD` - used in a grouping operation
 uniqueN <- function(x, by = if (is.data.table(x)) key(x) else NULL) {
     if (!is.atomic(x) && !is.data.frame(x))
-        stop("x must be an atomic vector or data.frames/data.tables")
+        return(length(unique(x)))
     if (is.atomic(x)) x = as_list(x)
     if (is.null(by)) by = seq_along(x)
     length(attr(forderv(x, by=by, retGrp=TRUE), 'starts'))
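The dispatch this hunk introduces can be sketched in base R alone. The function name `count_unique` is hypothetical and is not part of data.table; the sketch also ignores the `by` argument and uses `unique()` where the package uses its internal `forderv()`, so it illustrates the control flow only, not the performance characteristics.

```r
# Hypothetical base-R stand-in for the patched dispatch; `count_unique`
# is an illustrative name, not a data.table function.
count_unique <- function(x) {
  # Anything that is neither atomic nor a data.frame (e.g. a plain list)
  # falls through to base R, mirroring the new return() in the hunk.
  if (!is.atomic(x) && !is.data.frame(x))
    return(length(unique(x)))
  # Data frames: count distinct rows.
  if (is.data.frame(x))
    return(nrow(unique(x)))
  # Atomic vectors: count distinct values.
  length(unique(x))
}

n_vec  <- count_unique(c(1, 2, 2, 3))                       # atomic vector
n_df   <- count_unique(data.frame(a = c(1, 1), b = c(2, 2))) # data.frame rows
n_list <- count_unique(list(1:2, "a", 1:2))                 # list: base fallback
print(c(n_vec, n_df, n_list))
```

Before this change, the third call would have hit the `stop()` branch in `uniqueN`; after it, list input is counted by base `unique()`.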
3 changes: 2 additions & 1 deletion README.md
@@ -33,8 +33,9 @@

 9. `rbindlist` gains `idcol` argument which can be used to generate an index column. If `idcol=TRUE`, the column is automatically named `.id`. Instead you can also provide a column name directly. If the input list has no names, indices are automatically generated. Closes [#591](https://github.com/Rdatatable/data.table/issues/591). Also thanks to @KevinUshey for filing [#356](https://github.com/Rdatatable/data.table/issues/356).

-10. A new helper function `uniqueN` is now implemented. It is equivalent to `length(unique(x))` but much faster. It accepts `atomic vectors`, `data.frames` and `data.tables` as input and returns the number of unique rows. For example, `DT[, .(uN = uniqueN(.SD)), by=x]` returns the number of unique rows within each group of `x`. Thanks to @DavidArenburg as well for the FR.
+10. A new helper function `uniqueN` is now implemented. It is equivalent to `length(unique(x))` but much faster. It handless `atomic vectors`, `data.frames` and `data.tables` as input and returns the number of unique rows, any other types are accepted but will be passed to base functions. For example, `DT[, .(uN = uniqueN(.SD)), by=x]` returns the number of unique rows within each group of `x`. Thanks to @DavidArenburg as well for the FR.
 * `uniqueN` gains a `by` argument which is equal to `key(x)` when `x` is a `data.table` so that the behaviour is identical to `duplicated()` and and `unique` methods for `data.table`. Thanks to @kevinmistry for the report. Closes [#1080](https://github.com/Rdatatable/data.table/issues/1080).
+ * `uniqueN` now accepts any types, but types other than `atomic vectors`, `data.frames` and `data.tables` are dispatched to base `length(unique(x))`. Thanks to @jangorecki. Closes [#1224](https://github.com/Rdatatable/data.table/issues/1224).

 11. Implemented `transpose()` to transpose a list and `tstrsplit` which is a wrapper for `transpose(strsplit(...))`. This is particularly useful in scenarios where a column has to be split and the resulting list has to be assigned to multiple columns. See `?transpose` and `?tstrsplit`, [#1025](https://github.com/Rdatatable/data.table/issues/1025) and [#1026](https://github.com/Rdatatable/data.table/issues/1026) for usage scenarios. Closes both #1025 and #1026 issues.
 * Implemented `type.convert` as suggested by Richard Scriven. Closes [#1094](https://github.com/Rdatatable/data.table/issues/1094).
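The grouped usage mentioned in the NEWS entry can be tried with a small table. This is a sketch that assumes data.table is installed at a version that includes `uniqueN`; the table `DT` and its columns `x`, `y`, `z` are made up for illustration, and the table is left unkeyed so that all columns of `.SD` are compared.

```r
library(data.table)

# Illustrative data (not from the commit): group "a" has two distinct
# (y, z) rows, group "b" has one.
DT <- data.table(
  x = c("a", "a", "a", "b", "b"),
  y = c(1, 1, 2, 3, 3),
  z = c(1, 1, 2, 4, 4)
)

# Number of unique rows of the non-grouping columns within each group of x.
res <- DT[, .(uN = uniqueN(.SD)), by = x]
print(res)
```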
4 changes: 4 additions & 0 deletions inst/tests/tests.Rraw
@@ -6524,6 +6524,10 @@ test(1537 , names(melt(dt, id=1L, variable.name = "x", value.name="x")), c("x",
 # test for tables()
 test(1538, tables(), output = "Total:")

+# uniqueN could supports list input #1224
+d1 <- data.table(a = 1:4, l = list(list(letters[1:2]),list(Sys.time()),list(1:10),list(letters[1:2])))
+test(1539, d1[,uniqueN(l)], 3L)
+
 ##########################


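The expected value `3L` in test 1539 follows from how base `unique()` treats the list column, since the patched `uniqueN` hands list input to `length(unique(x))`. A minimal base-R check of that count, using the same list as the test but without data.table (the variable names `l` and `n_unique` are illustrative):

```r
# Same four elements as column `l` in the test; the first and last are
# identical, so base unique() keeps three of them.
l <- list(
  list(letters[1:2]),
  list(Sys.time()),
  list(1:10),
  list(letters[1:2])
)

n_unique <- length(unique(l))
print(n_unique)
```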
4 changes: 2 additions & 2 deletions man/duplicated.Rd
@@ -16,7 +16,7 @@

 \code{anyDuplicated} returns the \emph{index} \code{i} of the first duplicated entry if there is one, and 0 otherwise.

-\code{uniqueN} is equivalent to \code{length(unique(x))} but much faster. It accepts \code{atomic vectors}, \code{data.frames} and \code{data.tables}. The number of unique rows are computed directly without materialising the intermediate unique data.table and is therefore memory efficient as well.
+\code{uniqueN} is equivalent to \code{length(unique(x))} but much faster for \code{atomic vectors}, \code{data.frames} and \code{data.tables}, for other types it dispatch to \code{length(unique(x))}. The number of unique rows are computed directly without materialising the intermediate unique data.table and is therefore memory efficient as well.

 }
 \usage{
@@ -29,7 +29,7 @@
 uniqueN(x, by=if (is.data.table(x)) key(x) else NULL)
 }
 \arguments{
-\item{x}{ A data.table. \code{uniqueN} accepts atomic vectors and data.frames as well.}
+\item{x}{ A data.table, atomic vectors or data.frames. Other types are supported but they will be dispatched to \code{length(unique(x))}.}
 \item{\dots}{ Not used at this time. }
 \item{incomparables}{ Not used. Here for S3 method consistency. }
 \item{fromLast}{ logical indicating if duplication should be considered from the reverse side, i.e., the last (or rightmost) of identical elements would correspond to \code{duplicated = FALSE}.}