Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setDT stop on matrix-column downgraded to warning #3851

Merged
merged 6 commits into from
Sep 10, 2019
Merged

Conversation

mattdowle
Copy link
Member

@mattdowle mattdowle commented Sep 9, 2019

Follow up to PR #3770
Resolves miceFast, MultiFit, sensobol, and genomic.autocorr in #3581

@mattdowle mattdowle added the WIP label Sep 9, 2019
@mattdowle mattdowle added this to the 1.12.4 milestone Sep 9, 2019
@codecov
Copy link

codecov bot commented Sep 9, 2019

Codecov Report

Merging #3851 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3851      +/-   ##
==========================================
+ Coverage   99.42%   99.42%   +<.01%     
==========================================
  Files          71       71              
  Lines       13409    13412       +3     
==========================================
+ Hits        13332    13335       +3     
  Misses         77       77
Impacted Files Coverage Δ
src/assign.c 100% <ø> (ø) ⬆️
R/print.data.table.R 100% <100%> (ø) ⬆️
R/as.data.table.R 100% <100%> (ø) ⬆️
R/data.table.R 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bb130d...b429c12. Read the comment docs.

Copy link
Member

@MichaelChirico MichaelChirico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mattdowle mattdowle mentioned this pull request Sep 9, 2019
31 tasks
@MichaelChirico
Copy link
Member

MichaelChirico commented Sep 10, 2019

genomic.autocorr also related:

library(genomic.autocorr)
library(data.table)
data <- genomic.autocorr:::.sim.data() 
summ <- acf.summary(data,c("x","y0","y1"),lag.max=20)

^ acf.summary internally runs:

x = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20), .Dim = c(20L, 1L, 1L))
y = c(0.901726200391654, 0.803452400783308, 0.705178601174963, 0.606904801566618, 
0.508631001958274, 0.410357202349931, 0.312083402741587, 0.213809603133239, 
0.115535803524897, 0.0172620039165525, 0.0108828058125369, 0.00450360770852142, 
-0.00187559039549395, -0.0082547884995096, -0.0146339866035251, 
-0.0210131847075406, -0.0273923828115558, -0.0337715809155713, 
-0.0401507790195863, -0.0465299771236016)
z = structure(c(0.901726200391654, -0.0516761033150342, -0.0544920395823338, 
-0.0576325550535451, -0.0611572007952979, -0.0651410447490315, 
-0.0696800778161759, -0.0748990494072447, -0.0809631093333218, 
-0.0880956032946936, 0.416105183431417, -0.0363239727727071, 
-0.0376931372644017, -0.0391695609000097, -0.0407663613745311, 
-0.0424988863327886, -0.0443852082532045, -0.0464467572462868, 
-0.048709138791348, -0.0512032026981874), .Dim = c(20L, 1L, 1L
))
data.table(x, y, z)

which triggers the setDT error -- the is.matrix line in as.data.table.list misses the array input before the output is setDT. But if we switch is.matrix to is.array, as.data.table.array is dispatched which creates new columns for the (essentially redundant) array indices.

I think we'll have to condition on length(col_with_dim) in order not to break this -- ideally downstream would be doing drop(x) and drop(z). There's an

@MichaelChirico
Copy link
Member

MichaelChirico commented Sep 10, 2019

@mattdowle have a look at latest commit.

Basically want to allow

setDT(list(matrix(1:5, nrow = 5L), 5:1)) to work.

Unfortunately address(x) != address(drop(x))...

An alternative would be setattr(x, '.Dim', NULL) instead?

w the latest commit, tools::testInstalledPackage('genomic.autocorr') works for me, as does example('acf.summary')

Michael Chirico and others added 3 commits September 10, 2019 14:56
@mattdowle mattdowle removed the WIP label Sep 10, 2019
@mattdowle
Copy link
Member Author

mattdowle commented Sep 10, 2019

Not sure what was happening relating to the array aspect of genomic.autocorr. I had approached getting miceFast to pass by making setDT(list(matrix)) work again, since miceFast does DT[, .(VIF(...))] where its VIF() returns a one-column matrix which setDT(jval) then turns into a one-column data.table. With that change, genomic.autocorr started to pass too. Maybe there was a root cause earlier in that package that made it look like it was array related later on, but the list(matrix) solves now earlier. Anyway, we can always come back to it, but for now, the 4 revdeps are passing, which is the main thing.

@mattdowle mattdowle merged commit 8bda20b into master Sep 10, 2019
@mattdowle mattdowle deleted the setDT_matrix_col branch September 10, 2019 17:43
if (length(x)>1L) {
idx = vapply_1i(x, function(xi) length(dim(xi)))>1L
if (any(idx))
warning("Some columns are a multi-column type (such as a matrix column): ", brackify(which(idx)),". setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me it sounds better to raise error, and ask revdep maintainer for update to as.data.table

Copy link
Member Author

@mattdowle mattdowle Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but revdeps are just a proxy for the (larger) outside-CRAN usage. The goal is to minimize required-communication with revdep maintainers so that we communicate changes via the messages and warnings wherever possible. In this case, there is no change to behavior but we went from nothing straight to error. The warning is a softer first step. That's my current thinking anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also was a bit hesitant about warning initially, but actually you can get this error from data.table() and as.data.table() since both call as.data.table.list which works by running setDT at the end.

in the case of data.table(1D-array) it would require overhauling how as.data.table.array works or building a new branch in as.data.table.list for the vector-embedded-in-higher-dimensions case

(which maybe we could/should do but perhaps file for another day?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants