Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

between verbose, POSIXct support, RHS dot-alias, int64 support #3518

Merged
merged 31 commits into from
May 13, 2019
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
8047463
between incbounds, rm nocov, and test
jangorecki Apr 22, 2019
a153f9d
add verbose to between function, closes #3516
jangorecki Apr 22, 2019
48f3163
between support for POSIXct, closes #3519
jangorecki Apr 22, 2019
2fa1390
between support dot-list alias in RHS, closes #2315
jangorecki Apr 22, 2019
ccf0b42
fix missing check for is.call
jangorecki Apr 22, 2019
5f48c7e
Merge branch 'between-verbose' of https://github.com/Rdatatable/data.…
Apr 23, 2019
47b52e3
new functionality for parallel processing to apply to POSIX-as-string…
Apr 23, 2019
7841fef
Merge branch 'master' into between-verbose
mattdowle Apr 23, 2019
8fd7571
remove LHS=character restriction
Apr 24, 2019
1920c77
Merge branch 'between-verbose' of https://github.com/Rdatatable/data.…
Apr 24, 2019
64dec55
Merge branch 'master' into between-verbose
mattdowle Apr 25, 2019
088348f
merge master
mattdowle Apr 26, 2019
ed0cc29
GetVerbose() added at C level and used instead of needing to pass thr…
mattdowle Apr 26, 2019
f4a05c7
between posix from char more robust, remove unclass
jangorecki May 1, 2019
e25b0f4
more strict condition to redirect to Cbetween
jangorecki May 1, 2019
e18b095
Merge branch 'master' into between-verbose
jangorecki May 1, 2019
1259b66
between unit tests, minor reformat
jangorecki May 1, 2019
aa175aa
between int64 support and num-to-int coerce, #3517
jangorecki May 1, 2019
a60972b
update news file for more changes in between function
jangorecki May 1, 2019
102c6db
revert mistakenly altered branch of int32
jangorecki May 1, 2019
ab6754c
fixes and tests towards #3517, speed up isReallyReal
jangorecki May 1, 2019
6490517
remove debug print
jangorecki May 1, 2019
6ed0d69
typo fix
jangorecki May 1, 2019
12f1d2b
between int64 unit tests
jangorecki May 1, 2019
1548cce
minor fix to between int64 and more unit tests
jangorecki May 2, 2019
219f650
Merge branch 'master' into between-verbose
jangorecki May 2, 2019
aeee086
Merge branch 'master' into between-verbose
mattdowle May 2, 2019
851745e
between changes codecov
jangorecki May 3, 2019
3d0e4fe
Merge branch 'master' into between-verbose
mattdowle May 13, 2019
a0bd9b5
LLONG_MIN => INT64_MIN to isolate from possible LLONG variance, and m…
mattdowle May 13, 2019
89fa2cf
isReallyRealC => isRealReallyInt to simplify ||
mattdowle May 13, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@

7. New variable `.Last.updated` (similar to R's `.Last.value`) contains the number of rows affected by the most recent `:=` or `set()`, [#1885](https://github.com/Rdatatable/data.table/issues/1885).

8. `between()` and `%between%` are faster for `POSIXct`, [#3519](https://github.com/Rdatatable/data.table/issues/3519), and now support the `.()` alias, [#2315](https://github.com/Rdatatable/data.table/issues/2315). Thanks to @Henrik-P for the reports.

#### BUG FIXES

1. `first`, `last`, `head` and `tail` by group no longer error in some cases, [#2030](https://github.com/Rdatatable/data.table/issues/2030) [#3462](https://github.com/Rdatatable/data.table/issues/3462). Thanks to @franknarf1 for reporting.
Expand Down
45 changes: 40 additions & 5 deletions R/between.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,23 +3,58 @@ between <- function(x,lower,upper,incbounds=TRUE) {
if (is.logical(x)) stop("between has been x of type logical")
if (is.logical(lower)) lower = as.integer(lower) # typically NA (which is logical type)
if (is.logical(upper)) upper = as.integer(upper) # typically NA (which is logical type)
is_strictly_numeric <- function(x) is.numeric(x) && !inherits(x, "integer64")
if (is_strictly_numeric(x) && is_strictly_numeric(lower) && is_strictly_numeric(upper)) {
is.px = function(x) inherits(x, "POSIXct")
# POSIX special handling to auto coerce character
if (is.px(x) && !is.null(tz<-attr(x, "tzone", TRUE)) && nzchar(tz) &&
(is.character(lower) || is.character(upper))) {
try_posix_cast <- function(x, tz) {tryCatch(
list(status=0L, value=as.POSIXct(x, tz = tz)),
error = function(e) list(status=1L, value=NULL, message=e[["message"]])
)}
if (is.character(lower)) {
ans = try_posix_cast(lower, tz)
if (ans$status==0L) lower = ans$value
else stop("'between' function the 'x' argument is a POSIX class while 'lower' was not, coercion to POSIX failed with: ", ans$message)
}
if (is.character(upper)) {
ans = try_posix_cast(upper, tz)
if (ans$status==0L) upper = ans$value
else stop("'between' function the 'x' argument is a POSIX class while 'upper' was not, coercion to POSIX failed with: ", ans$message)
}
stopifnot(is.px(x), is.px(lower), is.px(upper)) # nocov # internal
}
# POSIX check time zone match
if (is.px(x) && is.px(lower) && is.px(upper)) {
tz_match = function(x, y, z) { # NULL match "", else all identical
((is.null(x) || !nzchar(x)) && (is.null(y) || !nzchar(y)) && (is.null(z) || !nzchar(z))) ||
(identical(x, y) && identical(x, z))
}
if (!tz_match(attr(x, "tzone", TRUE), attr(lower, "tzone", TRUE), attr(upper, "tzone", TRUE))) {
stop("'between' function arguments have mismatched timezone attribute, align all arguments to same timezone")
}
}
is.supported = function(x) (is.numeric(x) && !inherits(x, "integer64")) || is.px(x)
if (is.supported(x) && is.supported(lower) && is.supported(upper)) {
# faster parallelised version for int/double.
# Cbetween supports length(lower)==1 (recycled) and (from v1.12.0) length(lower)==length(x).
# length(upper) can be 1 or length(x) independently of lower
.Call(Cbetween, x, lower, upper, incbounds)
} else {
if (isTRUE(getOption("datatable.verbose"))) cat("optimised between not available for this data type, fallback to slow R routine\n")
# now just for character input. TODO: support character between in Cbetween and remove this branch
if(incbounds) x>=lower & x<=upper
if (incbounds) x>=lower & x<=upper
else x>lower & x<upper
}
}

# %between% is vectorised, #534.
"%between%" <- function(x, y) {
ysub = substitute(y)
if (is.call(ysub) && ysub[[1L]]==".") {
ysub[[1L]]=quote(list)
y = eval.parent(ysub)
}
if ((l <- length(y)) != 2L) {
ysub = substitute(y)
stop("RHS has length() ", l, "; expecting length 2. ",
if (is.call(ysub) && ysub[[1L]] == 'c')
sprintf("Perhaps you meant %s? ",
Expand All @@ -37,7 +72,7 @@ inrange <- function(x,lower,upper,incbounds=TRUE) {
query = setDT(list(x=x))
subject = setDT(list(l=lower, u=upper))
ops = if (incbounds) c(4L, 2L) else c(5L, 3L) # >=,<= and >,<
verbose = getOption("datatable.verbose")
verbose = isTRUE(getOption("datatable.verbose"))
if (verbose) {last.started.at=proc.time();cat("forderv(query) took ... ");flush.console()}
xo = forderv(query)
if (verbose) {cat(timetaken(last.started.at),"\n"); flush.console()}
Expand Down
49 changes: 49 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -14133,6 +14133,55 @@ test(2030.18, .Last.updated, 0L) # zero match
test(2031.01, rbind(data.table(A=1:3, B=7:9), data.table(A=4:6, B=as.list(10:12))), ans<-data.table(A=1:6, B=as.list(7:12)))
test(2031.02, rbind(data.table(A=1:3, B=as.list(7:9)), data.table(A=4:6, B=10:12)), ans)

# `between` invalid args, and verbose #3516
test(2032.01, between(1:5, 2, 4, incbounds=423), error="incbounds must be logical")
test(2032.02, between(1:5, 2, 4, incbounds=NA), error="incbounds must be logical")
old = options(datatable.verbose=TRUE)
test(2032.05, between(1:5, 2L, 4L), output="between parallel processing of integer with recycling took")
test(2032.06, between(1:5, rep(2L,5L), rep(4L, 5L)), output="between parallel processing of integer took")
test(2032.07, between(as.double(1:5), 2, 4, incbounds=FALSE), output="between parallel processing of double using open bounds with recycling took")
test(2032.08, between(as.double(1:5), 2, 4), output="between parallel processing of double using closed bounds with recycling took")
test(2032.09, between(as.double(1:5), rep(2, 5L), rep(4, 5L)), output="between parallel processing of double took")
test(2032.10, between(c("foo","bar","paz"), "bag", "fog"), output="optimised between not available for this data type, fallback to slow R routine")

# `between` handle POSIXct type
x = as.POSIXct("2016-09-18 07:00:00") + 0:10*60*15
dn = as.POSIXct('2016-09-18 08:00:00')
up = as.POSIXct('2016-09-18 09:00:00')
test(2032.11, between(x, dn, up), output="between parallel processing of double using closed bounds with recycling took")
test(2032.12, between(x, dn, up, incbounds=FALSE), output="between parallel processing of double using open bounds with recycling took")

# also handling of string lower/upper bounds _only when x has a time zone_
x = as.POSIXct("2016-09-18 07:00:00") + 0:10*60*15
dn = '2016-09-18 08:00:00'
up = '2016-09-18 09:00:00'
test(2032.13, between(x, dn, up), output = 'optimised between not available')
attr(x, 'tzone') = 'UTC'
test(2032.14, between(x, dn, up), output = 'between parallel processing of double')

# additional flexibility -- cast when one bound is already POSIXct
up = as.POSIXct(up, tz="UTC")
test(2032.15, between(x, dn, up), output = 'between parallel processing of double')
options(old)

# exceptions in char to POSIX coercion
dn = 'aa2016-09-18 08:00:00'
test(2032.16, between(x, dn, up), error="coercion to POSIX failed")
dn = '2016-09-18 08:00:00'
up = 'bb2016-09-18 09:00:00'
test(2032.17, between(x, dn, up), error="coercion to POSIX failed")

# exceptions due to timezone mismatch
x = as.POSIXct("2016-09-18 07:00:00", tz="UTC") + 0:10*60*15
dn = as.POSIXct('2016-09-18 08:00:00')
up = '2016-09-18 09:00:00'
test(2032.18, between(x, dn, up), error="mismatched timezone attribute")

# `between` support `.` in RHS #2315
X = data.table(a = 1:5, b = 6:10, c = c(5:1))
test(2032.19, X[c %between% list(a, b)], X[c %between% .(a, b)])



###################################
# Add new tests above this line #
Expand Down
14 changes: 13 additions & 1 deletion src/between.c
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP bounds) {

R_len_t nx = length(x), nl = length(lower), nu = length(upper);
double tic=0.0;
if (!nx || !nl || !nu)
return (allocVector(LGLSXP, 0));
const int longest = MAX(MAX(nx, nl), nu);
Expand All @@ -12,7 +13,8 @@ SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP bounds) {
error("Incompatible vector lengths: length(x)==%d length(lower)==%d length(upper)==%d. Each should be either length 1 or the length of the longest.", nx, nl, nu);
}
if (!isLogical(bounds) || LOGICAL(bounds)[0] == NA_LOGICAL)
error("incbounds must be logical TRUE/FALSE."); // # nocov
error("incbounds must be logical TRUE/FALSE.");
const bool verbose = GetVerbose();

int nprotect = 0;
bool integer=true;
Expand Down Expand Up @@ -41,16 +43,19 @@ SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP bounds) {
if (!recycleX && recycleLow && recycleUpp) {
const int l = lp[0] + open; // +open so we can always use >= and <=. NA_INTEGER+1 == -INT_MAX == INT_MIN+1 (so NA limit handled by this too)
const int u = up[0]==NA_INTEGER ? INT_MAX : up[0] - open;
if (verbose) tic = omp_get_wtime();
#pragma omp parallel for num_threads(getDTthreads())
for (int i=0; i<longest; i++) {
int elem = xp[i];
ansp[i] = elem==NA_INTEGER ? NA_LOGICAL : (l<=elem && elem<=u);
}
if (verbose) Rprintf("between parallel processing of integer with recycling took %8.3fs\n", omp_get_wtime()-tic);
}
else {
const int xMask = recycleX ? 0 : INT_MAX;
const int lowMask = recycleLow ? 0 : INT_MAX;
const int uppMask = recycleUpp ? 0 : INT_MAX;
if (verbose) tic = omp_get_wtime();
#pragma omp parallel for num_threads(getDTthreads())
for (int i=0; i<longest; i++) {
int elem = xp[i & xMask];
Expand All @@ -59,6 +64,7 @@ SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP bounds) {
u = (u==NA_INTEGER) ? INT_MAX : u-open;
ansp[i] = elem==NA_INTEGER ? NA_LOGICAL : (l<=elem && elem<=u);
}
if (verbose) Rprintf("between parallel processing of integer took %8.3fs\n", omp_get_wtime()-tic);
}
} else {
// type real
Expand All @@ -69,23 +75,28 @@ SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP bounds) {
const double l = isnan(lp[0]) ? -INFINITY : lp[0];
const double u = isnan(up[0]) ? INFINITY : up[0];
if (open) {
if (verbose) tic = omp_get_wtime();
#pragma omp parallel for num_threads(getDTthreads())
for (int i=0; i<longest; i++) {
double elem = xp[i];
ansp[i] = isnan(elem) ? NA_LOGICAL : (l<elem && elem<u);
}
if (verbose) Rprintf("between parallel processing of double using open bounds with recycling took %8.3fs\n", omp_get_wtime()-tic);
} else {
if (verbose) tic = omp_get_wtime();
#pragma omp parallel for num_threads(getDTthreads())
for (int i=0; i<longest; i++) {
double elem = xp[i];
ansp[i] = isnan(elem) ? NA_LOGICAL : (l<=elem && elem<=u);
}
if (verbose) Rprintf("between parallel processing of double using closed bounds with recycling took %8.3fs\n", omp_get_wtime()-tic);
}
}
else {
const int xMask = recycleX ? 0 : INT_MAX;
const int lowMask = recycleLow ? 0 : INT_MAX;
const int uppMask = recycleUpp ? 0 : INT_MAX;
if (verbose) tic = omp_get_wtime();
#pragma omp parallel for num_threads(getDTthreads())
for (int i=0; i<longest; i++) {
double elem = xp[i & xMask];
Expand All @@ -95,6 +106,7 @@ SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP bounds) {
if (isnan(u)) u= INFINITY;
ansp[i] = isnan(elem) ? NA_LOGICAL : (open ? l<elem && elem<u : l<=elem && elem<=u);
}
if (verbose) Rprintf("between parallel processing of double took %8.3fs\n", omp_get_wtime()-tic);
}
}
UNPROTECT(nprotect);
Expand Down
5 changes: 5 additions & 0 deletions src/data.table.h
Original file line number Diff line number Diff line change
Expand Up @@ -75,9 +75,11 @@ SEXP sym_index;
SEXP sym_BY;
SEXP sym_starts, char_starts;
SEXP sym_maxgrpn;
SEXP sym_verbose;
bool INHERITS(SEXP x, SEXP char_);
long long DtoLL(double x);
double LLtoD(long long x);
bool GetVerbose();
double NA_INT64_D;
long long NA_INT64_LL;

Expand Down Expand Up @@ -175,3 +177,6 @@ SEXP colnamesInt(SEXP x, SEXP cols);
void nafillDouble(double *x, uint_fast64_t nx, unsigned int type, double fill, ans_t *ans, bool verbose);
void nafillInteger(int32_t *x, uint_fast64_t nx, unsigned int type, int32_t fill, ans_t *ans, bool verbose);
SEXP nafillR(SEXP obj, SEXP type, SEXP fill, SEXP inplace, SEXP cols, SEXP verbose);

// between.c
SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP bounds);
7 changes: 7 additions & 0 deletions src/init.c
Original file line number Diff line number Diff line change
Expand Up @@ -289,6 +289,7 @@ void attribute_visible R_init_datatable(DllInfo *info)
sym_index = install("index");
sym_BY = install(".BY");
sym_maxgrpn = install("maxgrpn");
sym_verbose = install("datatable.verbose");
SelfRefSymbol = install(".internal.selfref");

initDTthreads();
Expand Down Expand Up @@ -335,6 +336,12 @@ inline double LLtoD(long long x) {
return u.d;
}

bool GetVerbose() {
// don't call repetitively; save first in that case
SEXP opt = GetOption(sym_verbose, R_NilValue);
return isLogical(opt) && LENGTH(opt)==1 && LOGICAL(opt)[0]==1;
}

// # nocov start
SEXP hasOpenMP() {
// Just for use by onAttach (hence nocov) to avoid an RPRINTF from C level which isn't suppressable by CRAN
Expand Down