set() adds new cols when rows aren't updated #6204

Merged: 12 commits, Jul 29, 2024
2 changes: 2 additions & 0 deletions NEWS.md
@@ -178,6 +178,8 @@ d1[d2, on="id", verbose=TRUE]

This feature resolves [#4387](https://github.com/Rdatatable/data.table/issues/4387), [#2947](https://github.com/Rdatatable/data.table/issues/2947), [#4380](https://github.com/Rdatatable/data.table/issues/4380), and [#1321](https://github.com/Rdatatable/data.table/issues/1321). Thanks to @jangorecki, @jan-glx, and @MichaelChirico for the reports and @jangorecki for implementing.

23. `set()` now adds new columns even if no rows are updated, [#5409](https://github.com/Rdatatable/data.table/issues/5409). This behavior is now consistent with `:=`, thanks to @mb706 for the report and @joshhwuu for the fix.

## TRANSLATIONS

1. Fix a typo in a Mandarin translation of an error message that was hiding the actual error message, [#6172](https://github.com/Rdatatable/data.table/issues/6172). Thanks @trafficfan for the report and @MichaelChirico for the fix.
9 changes: 9 additions & 0 deletions inst/tests/tests.Rraw
@@ -19002,3 +19002,12 @@ test(2277.2, DT[, closure(b), env=list(closure=var)], 0.5)
test(2277.3, DT[, closure(b), env=list(closure=stats::var)], 0.5)
test(2277.4, DT[, closure(b), env=list(closure=stats:::var)], 0.5)
test(2277.5, DT[, lambda(b), env=list(lambda=function(x) sum(x))], 7L)

# test that set() correctly adds new columns even if no rows are updated
dt = data.table(a=1L)
test(2278.1, set(copy(dt), 0L, "b", logical(0)), data.table(a=1L, b=NA))
test(2278.2, set(copy(dt), NA_integer_, "b", NA), data.table(a=1L, b=NA))
test(2278.3, set(copy(dt), 0L, "b", NA), copy(dt)[0L, b := NA])
test(2278.4, set(copy(dt), NA_integer_, "b", logical(0)), copy(dt)[NA_integer_, b := logical(0)])
test(2278.5, set(copy(dt), integer(0), "b", numeric(0)), copy(dt)[integer(0), b := numeric(0)])
test(2278.6, { set(dt, 0L, "b", logical(0)); set(dt, 1L, "a", 2L); dt }, data.table(a=2L, b=NA))
2 changes: 1 addition & 1 deletion man/assign.Rd
@@ -35,7 +35,7 @@ set(x, i = NULL, j, value)
\item{LHS}{ A character vector of column names (or numeric positions) or a variable that evaluates as such. If the column doesn't exist, it is added, \emph{by reference}. }
\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. }
\item{x}{ A \code{data.table}. Or, \code{set()} accepts \code{data.frame}, too. }
- \item{i}{ Optional. Indicates the rows on which the values must be updated with. If not provided, implies \emph{all rows}. The \code{:=} form is more powerful as it allows \emph{subsets} and \code{joins} based add/update columns by reference. See \code{Details}.
+ \item{i}{ Optional. Indicates the rows on which the values must be updated. If not \code{NULL}, implies \emph{all rows}. Missing or zero values are ignored. The \code{:=} form is more powerful as it allows adding/updating columns by reference based on \emph{subsets} and \code{joins}. See \code{Details}.

In \code{set}, only integer type is allowed in \code{i} indicating which rows \code{value} should be assigned to. \code{NULL} represents all rows more efficiently than creating a vector such as \code{1:nrow(x)}. }
\item{j}{ Column name(s) (character) or number(s) (integer) to be assigned \code{value} when column(s) already exist, and only column name(s) if they are to be created. }
5 changes: 3 additions & 2 deletions src/assign.c
@@ -378,12 +378,13 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values)
for (int i=0; i<targetlen; ++i) {
if ((rowsd[i]<0 && rowsd[i]!=NA_INTEGER) || rowsd[i]>nrow)
error(_("i[%d] is %d which is out of range [1,nrow=%d]"), i+1, rowsd[i], nrow); // set() reaches here (test 2005.2); := reaches the same error in subset.c first
- if (rowsd[i]>=1) numToDo++;
+ if (rowsd[i]>=0) numToDo++;
}
if (verbose) Rprintf(_("Assigning to %d row subset of %d rows\n"), numToDo, nrow);
// TODO: include in message if any rows are assigned several times (e.g. by=.EACHI with dups in i)
if (numToDo==0) {
- if (!length(newcolnames)) {
+ // isString(cols) is exclusive to calls from set()
+ if (!length(newcolnames) && !isString(cols)) {
*_Last_updated = 0;
UNPROTECT(protecti);
return(dt); // all items of rows either 0 or NA. !length(newcolnames) for #759
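
The control flow touched by this hunk can be modelled outside of R as a rough sketch. The standalone C below is not data.table code: `count_rows_to_do`, `can_return_early`, and `NA_INT_SKETCH` are hypothetical names introduced for illustration, and the `>= 0` threshold simply follows the changed line as it appears in this hunk. It assumes that `NA_INTEGER` is `INT_MIN`, as in R's C API, and that `cols_is_string` stands in for `isString(cols)`, which the added comment says is true only for calls from `set()`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Stand-in for R's NA_INTEGER, which R defines as INT_MIN. */
#define NA_INT_SKETCH INT_MIN

/* Hypothetical model of the validation loop: returns -1 if any row index is
 * out of range (negative but not NA, or greater than nrow); otherwise returns
 * how many indices are counted as rows to update. The >= 0 threshold follows
 * the changed line in this hunk. */
int count_rows_to_do(const int *rows, int n, int nrow)
{
    assert(n >= 0 && nrow >= 0);
    int num_to_do = 0;
    for (int i = 0; i < n; ++i) {
        if ((rows[i] < 0 && rows[i] != NA_INT_SKETCH) || rows[i] > nrow)
            return -1; /* the real code raises an R error here */
        if (rows[i] >= 0)
            num_to_do++;
    }
    return num_to_do;
}

/* Hypothetical model of the early-return test: returning dt unchanged is only
 * allowed when nothing is counted as to-do, no new column names were passed,
 * and cols is not a character vector (i.e. the call did not come from set()). */
bool can_return_early(int num_to_do, int n_new_colnames, bool cols_is_string)
{
    return num_to_do == 0 && n_new_colnames == 0 && !cols_is_string;
}
```

Under this model, a call with `i = NA_integer_` and a character `j` counts no rows but still skips the early return, so execution continues to the code that can add the column. That matches the intent stated in the NEWS entry.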