[R-Forge #1611] Allow 2 column i matrix to return a list() (or vector if same type) #657

Open
arunsrinivasan opened this issue Jun 8, 2014 · 22 comments
Labels: feature request, top request

Comments

@arunsrinivasan
Member

arunsrinivasan commented Jun 8, 2014

Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link

Realised as a result of fixing bug #1593 in 1.6.7. See the error message in [.data.table, which refers to this FR.

@jangorecki
Member

Related(?) to #826, where #826 is just the special case in which the second dimension has length 1.

@cmdcolin

I was getting an error that referred here and I also looked at #826

I get this error with the sample code

x=data.table(a=1:10,b=2:11)
subset(x,(x[,1]>5))

The workaround subset(x,(x[,1]>5)[,1]) does work, but I'm curious whether it is necessary. I'm using data.table 1.10.4.
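A minimal sketch of why the error appears and how it can be avoided, assuming a data.table version in which x[, 1] returns a one-column data.table (as the error reported above implies). Comparing that one-column table with 5 yields a logical matrix, which is the i type this issue is about; filtering on the column itself yields a plain logical vector instead.

```r
library(data.table)

x <- data.table(a = 1:10, b = 2:11)

# x[, 1] is a one-column data.table here, so the comparison gives a matrix
is.matrix(x[, 1] > 5)

# Filtering on the column vector avoids the matrix entirely
by_name     <- x[a > 5]        # refer to the column by name inside i
by_position <- x[x[[1]] > 5]   # or extract it by position with [[
by_name
```

Both forms return the five rows where a exceeds 5, without the trailing [,1] workaround.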

@kippjohnson

Huge data.table fan, thank you so much for all of your work.

Would just like to add that this feature would be great (subsetting a DT by a 2 column matrix). It's a nice feature because you can then easily do something like the following:

dt[which( your_condition, arr.ind = T)]

and get back a data table with the desired rows/columns.

@jaapwalhout

jaapwalhout commented Sep 21, 2018

Another use case with .SD. Now I need to resort to as.matrix or as.data.frame to be able to perform subsetting with 2 column matrix objects.

@MichaelChirico
Member

Not sure I understand the use case for this feature? Any simple example?

@kippjohnson

Not sure I understand the use case for this feature? Any simple example?

dt[which( your_condition, arr.ind = T)]

@MichaelChirico
Member

MichaelChirico commented Sep 25, 2018

@kippjohnson I saw that... not really helping... How is this different from dt[(your_condition)]?

The use case I requested would flesh out a full reproducible example :)

@franknarf1
Contributor

@kippjohnson Fyi, you said above "and get back a data table with the desired rows/columns.", however, the proposal is for a list or vector of values from the selected positions, not a data.table (since the latter does not make sense for this sort of indexing, just as it doesn't in matrices, which extract a vector when using X[Y])
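For reference, the base R matrix behaviour described here can be illustrated with a small sketch. The object names m and idx are made up for illustration: each row of the 2-column index matrix names one (row, column) position, and the result is a plain vector with one value per position.

```r
# A 2 x 3 integer matrix: columns are (1, 2), (3, 4), (5, 6)
m <- matrix(1:6, nrow = 2)

# Each row of idx is one (row, column) position to extract
idx <- rbind(c(1, 2),   # row 1, column 2
             c(2, 3))   # row 2, column 3

picked <- m[idx]
picked              # a vector of length nrow(idx), not a matrix
is.matrix(picked)   # FALSE
```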

@kippjohnson

kippjohnson commented Sep 25, 2018

How is this different from dt[(your_condition)]?

This doesn't work if (your_condition) yields a matrix of rows x column indices.

Sure, sometimes you can re-do it. But why does data.table allow you to select by row index if we could just use conditions? I don't see why a reproducible code example is necessary here. Sometimes you have computed matrix positions in another step. This doesn't exist in base R for no reason :-)

Frank: yes, my mistake, return a list or a vector as in base R. Maybe I was thinking a 1-column DT.

@franknarf1
Contributor

Fwiw, yeah, I have no idea what an example would look like here. Re "easy filling of NA values after a join", I don't see how this helps. Current usage...

library(data.table)
DT1 = data.table(id = 1:2, a = 1, b = 2)
DT2 = data.table(id = 3:4, x = 1, y = 2)

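# all=TRUE keeps unmatched ids; without it the ids 1:2 and 3:4 share no rows
# and the merge result is empty, so there would be no NA values to fill
DT = merge(DT1, DT2, by="id", all=TRUE)
DT[is.na(DT)] <- 99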

There's no sensible LHS to put into DT[which(is.na(DT), arr.ind=TRUE), (LHS) := 99] here.
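One way to do the same NA fill by reference, shown only as a sketch, is to loop over the columns and call data.table's set() on the rows that are NA in each one. This sidesteps the need for a matrix-shaped i. The merge uses all = TRUE here, which is an assumption made so that the example actually contains NA values.

```r
library(data.table)

DT1 <- data.table(id = 1:2, a = 1, b = 2)
DT2 <- data.table(id = 3:4, x = 1, y = 2)

# Full outer join so that unmatched ids produce NA in the other table's columns
DT <- merge(DT1, DT2, by = "id", all = TRUE)

# Fill NA values column by column, by reference
for (j in names(DT)) {
  if (anyNA(DT[[j]])) {
    set(DT, i = which(is.na(DT[[j]])), j = j, value = 99)
  }
}
DT
```

Columns without NA values (here id) are skipped, so their type is left untouched.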

@kippjohnson Reproducible examples are recommended for any sort of post. In the context of a feature request like this, I guess you want to provide it so the developers can make sure they understand the problem and make a fix that resolves it (eg, they might take your example and put it into a unit test to make sure it doesn't break later).

@kippjohnson

kippjohnson commented Sep 25, 2018

Sure, a minimal reproducible example.

# Base R
set.seed(123)
x0 <- matrix(rnorm(100), 10, 10)
y0 <- which(x0>1.5, arr.ind=TRUE)
x0[y0]

# data.table
library(data.table)
x1 <- as.data.table(x0)
x1[y0] # not implemented

Of course there are other ways to do this pretty trivial example. I'm not sure what would be the fastest way in data.table.

However, when you have a matrix with i,j positions that may have been more expensive to compute, if you want to do this you need to change the data.table back to a matrix or dataframe.
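As a rough sketch of what the requested behaviour could look like without converting the whole table, the helper below extracts values column by column. The name extract_by_matrix is hypothetical and not part of data.table; it only relies on [[ to read one column at a time, and it returns a vector when all selected values share a type and a list otherwise, as the issue title suggests.

```r
library(data.table)

# Hypothetical helper, not a data.table function
extract_by_matrix <- function(dt, idx) {
  stopifnot(is.matrix(idx), ncol(idx) == 2L)
  out <- vector("list", nrow(idx))
  # Visit each referenced column once and pull only the requested rows
  for (j in unique(idx[, 2L])) {
    pos <- which(idx[, 2L] == j)
    out[pos] <- as.list(dt[[j]][idx[pos, 1L]])
  }
  # Collapse to an atomic vector only when every value has the same type
  types <- vapply(out, typeof, character(1))
  if (length(unique(types)) == 1L) unlist(out) else out
}

set.seed(123)
x0 <- matrix(rnorm(100), 10, 10)
y0 <- which(x0 > 1.5, arr.ind = TRUE)
x1 <- as.data.table(x0)

res <- extract_by_matrix(x1, y0)
res
```

On this example the result matches x0[y0] from base R. Factor and other classed columns are not handled specially in this sketch.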

@MichaelChirico
Member

@kippjohnson thanks, makes sense now. Basically, these are situations where some subset of data.table columns can be considered as a matrix (but are stored as a data.table alongside other useful non-matrix columns). Then @jaapwalhout's example is to declare which are the "matrix columns" inside .SDcols.

I'm not sure about efficiency (since data.table/data.frame are columnar storage) but an implementation that simply does matrix conversion under the hood would be relatively easy to cobble together...

@tungttnguyen

Can anyone explain to me the reason why dt[dt < 0] throws an error while dt[dt < 0] <- 1 works in the example below? Thanks!

### goal: replace any value smaller than 0 with 1

library(data.table)

txt <- "V1   V2  V3
0   -999  0
-999   0  -999"

dt <- fread(txt)
dt
#>      V1   V2   V3
#> 1:    0 -999    0
#> 2: -999    0 -999

# error
dt[dt < 0]
#> Error in `[.data.table`(dt, dt < 0): i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please report to data.table issue tracker if you'd like this, or add your comments to FR #657.

# works
dt[dt < 0] <-  1
dt
#>    V1 V2 V3
#> 1:  0  1  0
#> 2:  1  0  1
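A likely explanation, offered with some hedging: reading with dt[dt < 0] goes through the [.data.table method, which rejects a matrix i (that is the error quoted above), while dt[dt < 0] <- 1 goes through the separate [<-.data.table assignment method, which accepts it. Until matrix i is supported for reads, a sketch of two alternatives is to read through as.matrix() and to replace by reference with set(). The table is rebuilt with data.table() here rather than fread() purely to keep the example self-contained.

```r
library(data.table)

dt <- data.table(V1 = c(0L, -999L), V2 = c(-999L, 0L), V3 = c(0L, -999L))

# Reading: convert to a matrix first, then index with the logical matrix
m <- as.matrix(dt)
negatives <- m[m < 0]
negatives

# Replacing by reference, one column at a time
for (j in seq_along(dt)) {
  rows <- which(dt[[j]] < 0)
  if (length(rows)) set(dt, i = rows, j = j, value = 1L)
}
dt
```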

@MichaelChirico
Member

@drewgendreau @elad663 @shaabhishek @lbearup @Laluke91 please note that we are tracking popular issues through the use of 👍 on the issue itself, please do add such reaction to increase visibility of this issue as we prioritize our time for clearing the issue stack 😄

jangorecki added the top request label and removed the High label on Jun 8, 2020