Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revdep chicane affected #5125

Closed
mattdowle opened this issue Sep 1, 2021 · 1 comment · Fixed by #5849 · May be fixed by #5133
Closed

revdep chicane affected #5125

mattdowle opened this issue Sep 1, 2021 · 1 comment · Fixed by #5849 · May be fixed by #5133
Milestone

Comments

@mattdowle
Copy link
Member

Emailed chicane's maintainer today, below.
3 other revdeps out of 1,111 break that I haven't looked at yet.
If there is a way to make names()<- continue to retain the key, then that would be better. On first glance I can't think of a way, but I'll have another go. Probably more <- methods are affected too.
In the meantime I figured it was best to get the change request to the revdep since setnames() is better regardless.


Greetings,

revdep testing of the next data.table release (v1.14.2) shows that chicane will be affected.

I normally submit a PR to your project but in your case I can't find chicane on either R-Forge or GitHub. The DESCRIPTION file has a link to R-Forge, but not the project page on R-Forge, and searching for projects by name on R-Forge doesn't seem to work for any package, even when I'm logged in to R-Forge.

Fortunately in this case, the changes needed can be expressed over email. In add.covariates.R please change lines 61 and 73 as follows :

# from :
names(interaction.data)[ 'trans.count' == names(interaction.data) ] <- 'bait.trans.count';
# to:
setnames(interaction.data, "trans.count", "bait.trans.count")

# from :
names(interaction.data)[ 'trans.count' == names(interaction.data) ] <- 'target.trans.count';
# to :
setnames(interaction.data, "trans.count", "target.trans.count")

The reason is that the <- way goes via R's <- mechanism which copies the data and there's nothing I can do to avoid that copy when using R's <- mechanism. That's why setnames() exists. Further, setnames is more robust because you don't need to repeat the object name (interaction.data) a 2nd time which can cause bugs when the 2nd use happens to be another object with a similar name that gets used accidentally. setnames() will also error if the column name being changed does not exist. The error suggests passing skip_absent=TRUE if it is intentional that sometimes the column might not exist. These features make setnames() a much better way to change column names of a data.table, regardless of <- being affected by the next release.

The reason the names(DT)<- breaks chicane in the next release of data.table is that the key is now dropped due to detecting R's copy. The lack of key in the result causes 4 expect_equal tests to fail in test_add.covariates.R, lines 65, 93, 122 and 151. I confirmed that R CMD check passes (with 2 unrelated notes) by changing to use setnames() in add.covarates.R.

You don't need to wait for v1.14.2 to be released, changing to setnames() will work before and after.

It is a consequence of bug fix 28 :

dplyr::arrange(DT) uses vctrs::vec_slice which retains data.table's class but uses C to bypass [ method dispatch and does not adjust data.table's attributes containing the index row numbers, #5042. data.table's long-standing .internal.selfref mechanism to detect such operations by other packages was not being checked by data.table when using indexes, causing data.table filters and joins to use invalid indexes and return incorrect results after a dplyr::arrange(DT). Thanks to Waldi73 for reporting; avimallu, tlapak, MichaelChirico, jangorecki and hadley for investigating and suggestions; and mattdowle for the PR. The intended way to use data.table is data.table::setkey(DT, col1, col2, ...) which reorders DT by reference in parallel, sets the primary key for automatic use by subsequent data.table queries, and permits rowname-like usage such as DT["foo",] which returns the now-contiguous-in-memory block of rows where the first column of DT's key contains "foo". Multi-column-rownames (i.e. a primary key of more than one column) can be looked up using DT[.("foo",20210728L), ]. Using == in i is also optimized to use the key or indices, if you prefer using column names explicitly and ==. An alternative to setkey(DT) is returning a new ordered result using DT[order(col1, col2, ...), ].

Best, Matt

@mattdowle
Copy link
Member Author

chicane v0.1.7                                                  2021-09-20
--------------------------------------------------
* enabling future compatability with data.table v1.14.2 by changing syntax
 names(data.table)['x' == names(data.table)] <- 'y' to setnames(data.table, 'x', 'y')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants