export is.sorted #4373
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -155,16 +155,24 @@ setreordervec = function(x, order) .Call(Creorder, x, order) | |
| # The others (order, sort.int etc) are turned off to protect ourselves from using them internally, for speed and for | ||
| # consistency; e.g., consistent twiddling of numeric/integer64, NA at the beginning of integer, locale ordering of character vectors. | ||
|
|
||
| is.sorted = function(x, by=seq_along(x)) { | ||
| is.sorted = function(x, by=seq_along(x), retOrd=FALSE) { | ||
| if (is.list(x)) { | ||
| warning("Use 'if (length(o <- forderv(DT,by))) ...' for efficiency in one step, so you have o as well if not sorted.") | ||
| # for efficient use via retOrd argument see note in ?is.sorted | ||
| # could pass through a flag for forderv to return early on first FALSE. But we don't need that internally | ||
| # since internally we always then need ordering, and it's better in one step. Don't want inefficiency to creep in. | ||
| # This is only here for user/debugging use to check/test valid keys; e.g. data.table:::is.sorted(DT,by) | ||
| 0L == length(forderv(x,by,retGrp=FALSE,sort=TRUE)) | ||
| } else { | ||
| o = forderv(x,by,retGrp=FALSE,sort=TRUE) | ||
| ans = 0L == length(o) | ||
| if (isTRUE(retOrd)) | ||
| ans = setattr(copy(ans), "order", o) | ||
| ans | ||
| } else if (is.null(x)) { # NULL does not satisfy C isVectorAtomic | ||
| NA | ||
| } else if (is.atomic(x)) { | ||
| if (isTRUE(retOrd)) stop("retOrd works only for data.table/list input") | ||
|
Contributor
If I saw this error message, I would make my vector a list. Should we automatically do
Member
Author
On one hand it would be nice, but fsorted, which we use for atomic vectors, is likely to be much more efficient. So by default it is definitely not good to wrap it in a list.
Contributor
Only wrap it if
||
| if (!missing(by)) stop("x is vector but 'by' is supplied") | ||
| .Call(Cfsorted, x) | ||
| } else { | ||
| stop("'x' argument is of unsupported type") | ||
| } | ||
| # Cfsorted could be named CfIsSorted, but since "sorted" is an adjective not verb, it's clear; e.g., Cfsort would sort it ("sort" is verb). | ||
| # Return value of TRUE/FALSE is relied on in [.data.table quite a bit on vectors. Simple. Stick with that (rather than -1/0/+1) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| \name{is.sorted} | ||
| \alias{is.sorted} | ||
| \title{ Checks if input is sorted } | ||
| \description{ | ||
| Checks if input is sorted. | ||
| } | ||
| \usage{ | ||
| is.sorted(x, by=seq_along(x), retOrd=FALSE) | ||
| } | ||
| \arguments{ | ||
| \item{x}{ data.table type object or atomic vector. } | ||
| \item{by}{ data.table columns used to check if \code{x} is sorted by those columns. } | ||
|
|
||
| \item{retOrd}{ logical, when \code{TRUE} it will set an attribute \code{"order"} on the returned value, providing an order of \code{x}. Works only for data.table type \code{x}, not for atomic vector. } | ||
| } | ||
| \details{ | ||
| Checks if the input object is sorted. Can also check by a subset of columns provided in the \code{by} argument, and can return the order used in the computation via the \code{retOrd} argument. | ||
|
Member
Author
Should possibly mention the integer() trick in the case of 1:n
Member
Author
It is actually explained later on, so maybe just add a pointer to that section of the manual
||
| } | ||
| \note{ | ||
| Checking sortedness is an expensive computation, and its intermediate result, the order, can usually be re-used. | ||
| For example the following check | ||
|
|
||
| \preformatted{ | ||
| if (!is.sorted(DT, by="Sepal.Length")) | ||
| DT = DT[order(Sepal.Length)] | ||
| } | ||
|
|
||
| could be written as | ||
|
|
||
| \preformatted{ | ||
| if (!(s <- is.sorted(DT, by="Sepal.Length", retOrd=TRUE))) | ||
| DT = DT[attr(s, "order")] | ||
jangorecki marked this conversation as resolved.
|
||
| } | ||
|
|
||
| so the order is computed only once. Of course, for performance it is even better to sort in-place using \code{\link{setkey}}. | ||
| } | ||
| \value{ | ||
| Logical scalar, \code{TRUE} or \code{FALSE}, or logical \code{NA} if \code{x} is \code{NULL}. When \code{retOrd} is \code{TRUE}, the returned logical scalar carries an attribute \code{"order"}: an integer vector of the same length as the number of rows of \code{x}, or a zero-length integer vector if \code{x} is already sorted. Missing values are ordered first, unlike in \code{\link[base]{order}}. Note that a logical scalar with an attribute attached fails an \code{identical} test against a plain \code{TRUE} or \code{FALSE}, although it works fine with \code{isTRUE} and \code{isFALSE}. | ||
| } | ||
| \seealso{ \code{\link{data.table}} } | ||
| \examples{ | ||
| x = as.data.table(iris) | ||
| is.sorted(x, by="Species") | ||
|
|
||
| ans = is.sorted(x, by="Sepal.Length", retOrd=TRUE) | ||
| identical(ans, FALSE) | ||
| isFALSE(ans) | ||
| o = attr(ans, "order") | ||
| x[o] | ||
| } | ||
| \keyword{ data } | ||
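
The `\note` section of the Rd file and the warning message in the old `is.sorted` both describe the same pattern: compute the order once, then reuse it if the input turns out not to be sorted. A minimal sketch of that pattern follows. It assumes an installed data.table in which the internal, unexported `forderv()` is reachable via `:::` and returns a zero-length integer vector for already sorted input, as the code in this diff relies on. It does not use the `retOrd` argument proposed in this PR.

```r
library(data.table)

DT = as.data.table(iris)

# forderv() is internal to data.table (hence ':::'); assumption: it returns
# integer(0) when DT is already sorted by the given columns.
o = data.table:::forderv(DT, by="Sepal.Length")

if (length(o)) {
  # not sorted: reuse the order that was just computed instead of sorting again
  DT = DT[o]
}

# a second check finds nothing left to reorder
sorted_now = length(data.table:::forderv(DT, by="Sepal.Length")) == 0L
```

Relying on an internal function is fragile across data.table versions, so in user code `setkey()` or `setorder()` remains the safer way to sort in place.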