Skip to content

Commit

Permalink
Check for invalid index values before looping
Browse files Browse the repository at this point in the history
Checking for NA, NaN, and +/-Inf values in the first position of the
if-else statement was a bad idea for performance reasons, but was also
incorrect.

Any time two merged objects had different start or end index values,
xp or yp ar outside the bounds of their respective index vectors. That
didn't cause a segfault though, because R_FINITE() would happily
interpret the random memory as either a number or NaN. This would
manifest as an error when there should be none, or an infinite loop.

We can fix the logic and improve performance by taking advantage of the
fact that the indexes are sorted. Therefore, if present, -Inf will
always be in the first element, while NA, Inf, and NaN will always be
in the last element.

In the case of an integer index, the NA will always be last (because
that's how it sorts in R, even though it's equal to INT_MIN in C).

See #174.
  • Loading branch information
joshuaulrich committed Jun 7, 2017
1 parent 6401ccf commit 1d0e4c1
Showing 1 changed file with 21 additions and 9 deletions.
30 changes: 21 additions & 9 deletions src/merge.c
Original file line number Diff line number Diff line change
Expand Up @@ -155,10 +155,15 @@ SEXP do_merge_xts (SEXP x, SEXP y,
if( TYPEOF(xindex) == REALSXP ) {
real_xindex = REAL(xindex);
real_yindex = REAL(yindex);

/* Check for illegal values before looping. Due to ordered index,
* -Inf must be first, while NA, Inf, and NaN must be last. */
if (!R_FINITE(real_xindex[0]) || !R_FINITE(real_xindex[nrx-1])
|| !R_FINITE(real_yindex[0]) || !R_FINITE(real_yindex[nry-1])) {
error("'index' cannot contain 'NA', 'NaN', or '+/-Inf'");
}

while( (xp + yp) <= (len + 1) ) {
if(!R_FINITE(real_xindex[ xp-1 ]) || !R_FINITE(real_yindex[ yp-1 ])) {
error("'index' cannot contain 'NA', 'NaN', or 'Inf'");
} else
if( xp > nrx ) {
yp++;
if(right_join) i++;
Expand All @@ -182,17 +187,23 @@ SEXP do_merge_xts (SEXP x, SEXP y,
/* RIGHT JOIN */
yp++;
if(right_join) i++;
}
} else
error("Invalid index element comparison (should never happen)");
}
} else
if( TYPEOF(xindex) == INTSXP ) {
int_xindex = INTEGER(xindex);
int_yindex = INTEGER(yindex);

/* Check for NA before looping; logical ops on NA may yield surprising
* results. Note that the NA_integer_ will appear in the last value of
* the index because of sorting at the R level, even though NA_INTEGER
* equals INT_MIN at the C level. */
if (int_xindex[nrx-1] == NA_INTEGER || int_yindex[nry-1] == NA_INTEGER) {
error("'index' cannot contain 'NA'");
}

while( (xp + yp) <= (len + 1) ) {
/* Check for NA first; logical ops on them may yield surprising results */
if(int_xindex[ xp-1 ]==NA_INTEGER || int_yindex[ yp-1 ]==NA_INTEGER) {
error("'index' cannot contain 'NA'");
} else
if( xp > nrx ) {
yp++;
if(right_join) i++;
Expand All @@ -213,7 +224,8 @@ SEXP do_merge_xts (SEXP x, SEXP y,
if( int_xindex[ xp-1 ] > int_yindex[ yp-1 ] ) {
yp++;
if(right_join) i++;
}
} else
error("Invalid index element comparison (should never happen)");
}
}

Expand Down

0 comments on commit 1d0e4c1

Please sign in to comment.