as.data.table.array - convert multidimensional array into data.table #1418
Comments
I already have this well managed in a separate project. |
Reopening, as this is worth improving. Current state:

library(data.table)
x = array(c(1, 0, 0, 2, 0, 0, 0, 3), dim=c(2, 2, 2))
as.data.frame(x)
# V1 V2 V3 V4
#1 1 0 0 0
#2 0 2 0 3
as.data.table(x)
# x
#1: 1
#2: 0
#3: 0
#4: 2
#5: 0
#6: 0
#7: 0
#8: 3

I would NOT aim for consistency with data.frame here, as it doesn't really provide useful output for arrays.

new.as.data.table.array = function(x) {
  d = dim(x)
  dn = dimnames(x)
  if (is.null(dn)) dn = lapply(d, seq.int)
  r = do.call(CJ, c(dn, list(sorted=TRUE, unique=TRUE)))
  dim.cols = copy(names(r))
  jj = as.call(list(
    as.name(":="),
    "value",
    as.call(lapply(c("[","x", dim.cols), as.symbol)) # lookup to 'x' array for each row
  )) # `:=`("value", x[V1, V2, V3])
  r[, eval(jj), by=c(dim.cols)][]
}
new.as.data.table.array(x)
# V1 V2 V3 value
#1: 1 1 1 1
#2: 1 1 2 0
#3: 1 2 1 0
#4: 1 2 2 0
#5: 2 1 1 0
#6: 2 1 2 0
#7: 2 2 1 2
#8: 2 2 2 3

It would handle the use case described in previous comments:

set.seed(1)
# array
x = array(rnorm(8,10,5), rep(2,3), dimnames = list(color = c("green","red"), year = c("2014","2015"), country = c("UK","IN")))
x["green","2015",]
# UK IN
#17.55891 15.62465
x["green",c("2014","2015"),]
# country
#year UK IN
# 2014 12.87891 6.893797
# 2015 17.55891 15.624655
dt = new.as.data.table.array(x)
dt[J("green","2015")]
# color year country value
#1: green 2015 IN 15.62465
#2: green 2015 UK 17.55891
dt[J("green", c("2014","2015"))]
# color year country value
#1: green 2014 IN 6.893797
#2: green 2014 UK 12.878907
#3: green 2015 IN 15.624655
#4: green 2015 UK 17.558906

Any feedback on the draft is welcome. |
Maybe better naming than Like the idea though. |
@MichaelChirico I just pushed an RC version, so feedback on it (or some new tests) is welcome. After a while I will rebase it onto master. In summary:
@arunsrinivasan FYI: |
I guess we're saving the Also, it might be nice to have an option to generate more than one variable from this, e.g. for an This would make it more parallel to I think implementation is just a prudent use of |
@MichaelChirico the use case you are describing would simply be the case where the Q dimension is a measure-type dimension. I'm not sure if we really need it, |
@jangorecki Perhaps you also want to consider a "wide" representation, as you would get if you did Also, with |
I will hold off on that. I don't see a big problem with dcast'ing measures as a post-processing step, but..
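For reference, a hedged sketch of what dcast'ing a measure as a post-processing step could look like. It assumes a long table with one row per combination of dimensions and a single value column, and it treats country as the measure-type dimension purely for illustration. The long table is built by hand with base expand.grid here, which is an assumption and not necessarily how the PR does it.

```r
library(data.table)

# small named array, same dimnames as the example earlier in the thread
x = array(1:8, dim = c(2, 2, 2),
          dimnames = list(color = c("green", "red"),
                          year = c("2014", "2015"),
                          country = c("UK", "IN")))

# long format: one row per (color, year, country) combination;
# expand.grid varies the first dimension fastest, like as.vector(x)
long = as.data.table(expand.grid(dimnames(x), KEEP.OUT.ATTRS = FALSE,
                                 stringsAsFactors = FALSE))
long[, value := as.vector(x)]

# wide format: spread the 'country' dimension into one column per level
wide = dcast(long, color + year ~ country, value.var = "value")
```

The wide table keeps color and year as identifier columns and gets one value column per country, so no information is lost compared to the long form.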
@jangorecki I was also sharing it because it might be faster on larger arrays. Here's the rough version of the function I'm proposing:

am_adt <- function(inarray) {
  if (!is.array(inarray)) stop("input must be an array")
  dims <- dim(inarray)
  if (is.null(dimnames(inarray))) {
    inarray <- provideDimnames(inarray, base = list(as.character(seq_len(max(dims)))))
  }
  FT <- if (any(class(inarray) %in% "ftable")) inarray else ftable(inarray)
  out <- data.table(as.table(ftable(FT)))
  nam <- names(out)[seq_along(dims)]
  setorderv(out[, (nam) := lapply(.SD, type.convert), .SDcols = nam], nam)[]
}

Here are a couple of large-ish arrays to test against.

dims <- c(10, 20, 50, 10, 10)
set.seed(1)
M <- `dim<-`(sample(100, prod(dims), TRUE), dims)
N <- `dimnames<-`(M, lapply(dims, function(x) c(letters, LETTERS)[seq_len(x)]))

Wrapping your approach in Certainly, there is room for improvement. I'm not sure that I should always use By the way, regarding your comment about |
@mrdwab the previous implementation was unnecessarily complex and inefficient. The one just pushed is much faster. Looking at |
new as.data.table.array, closes #1418
FR for multidimensional array conversion to data.table.
The logic behind the conversion is to look up the value from the array for each combination of dimensions. The rationale is not only the similar subsetting API of arrays and data.tables (see the examples below) but also the underlying organization of the data. It basically reduces the array's dimensions to a tabular structure, keeping all the relations between the dimensions and the corresponding value of a measure, so the conversion is lossless.
The solution below is likely to be inefficient because it looks up a value from the array for each group. The j argument may look scary, but it simply builds the following call: .(value = x[color, year, country]) to subset the x array for each group.

update after merge: http://stackoverflow.com/questions/11141406/reshaping-an-array-to-data-frame
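The per-group lookup can probably be avoided. Below is a minimal sketch, relying only on the fact that R stores arrays in column-major order (first dimension varying fastest): expand.grid enumerates the dimension combinations in that same order, so the values can be attached in a single vectorized step. The name array_to_long is hypothetical, the sketch assumes the array has either complete dimnames or none at all, and it is not necessarily what was eventually merged.

```r
library(data.table)

# Hypothetical helper (not part of data.table): long-format conversion
# without a per-group lookup into the array.
array_to_long = function(x) {
  stopifnot(is.array(x))
  dn = dimnames(x)
  # fall back to integer indices when the array has no dimnames
  if (is.null(dn)) dn = lapply(dim(x), seq_len)
  # name unnamed dimensions V1, V2, ...
  if (is.null(names(dn))) names(dn) = paste0("V", seq_along(dn))
  # expand.grid varies its first factor fastest, matching as.vector(x)
  r = as.data.table(expand.grid(dn, KEEP.OUT.ATTRS = FALSE,
                                stringsAsFactors = FALSE))
  r[, value := as.vector(x)]
  setkeyv(r, names(dn))  # sort and key by the dimension columns
  r[]
}

x = array(c(1, 0, 0, 2, 0, 0, 0, 3), dim = c(2, 2, 2))
dt = array_to_long(x)
```

Because the result is keyed by the dimension columns, a lookup such as dt[.(2L, 2L, 1L)] plays the role of x[2, 2, 1] on the original array.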