Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[.data.table which argument could accept integer #3736

Open
jangorecki opened this issue Jul 31, 2019 · 7 comments
Open

[.data.table which argument could accept integer #3736

jangorecki opened this issue Jul 31, 2019 · 7 comments

Comments

@jangorecki
Copy link
Member

jangorecki commented Jul 31, 2019

Currently which is guaranteed to be logical length 1.

if (!is.logical(which) || length(which)>1L) stop("which= must be a logical vector length 1. Either FALSE, TRUE or NA.")

If we will accept integer to be supplied to which argument (AND missing(i)) we could easily invert its functionality to subset data.table by integer row index. That would efficiently address #3735, see benchmark there (12.7s vs 3.5s).
Example usage

d = data.table(a=6:10, b=11:15)
d[which=2:3]
# faster equivalent to
d[2:3]
#       a     b
#1:     7    12
#2:     8    13

I am aware it is better to optimise i=2:3 but because of NSE processing it is very tricky to optimise that, and probably will never be as fast as non-NSE which argument.

@st-pasha
Copy link
Contributor

But can't you first check the value of i (before calling substitute/deparse on any arguments), and if it is integer-like, and no other arguments present then quickly return with a subsetDT?

@franknarf1
Copy link
Contributor

I don't understand the use case. For the benchmarks in #3735 I see a large improvement with dt1[, do(.SD), by=1:n].

If NSE overhead is too high, maybe let the user explicitly indicate how they expect i to be interpreted:

DT[ .row(2:3) ]

# similarly
DT[ .join(DT2) ]
DT[ .use_index(x %in% 5:6 & y == TRUE & z >= 3) ]

The explicitly named functions might also help with folks learning DT[...] and having trouble with the NSE behind the terse syntax and/or not realizing that the multiple uses of i are available.

I suggested a helper DT[ subset.join(DT2) ] in #2158 . That would differ from these in that it would extend functionality rather than just removing NSE ambiguity.

@chnynf
Copy link

chnynf commented Aug 1, 2019

Currently when users write "with = FALSE", they're able to select columns named in variable, making the data.table behave similar to a data.frame.
Is it possible to extend this feature, so that whenever users put in "with = FALSE", disable all the overhead checking, and restore the syntax to data.frame mode.
Downside is that the benefits of DT on i and j must be on or off together.

@franknarf1
Copy link
Contributor

@chnynf Also about changes to with: #2826

Re "on or off together", the syntax could be changed to have a length-two vector (first for i, then for j), with=c(TRUE, FALSE) maybe (not sure about the internals).

@jangorecki
Copy link
Member Author

jangorecki commented Aug 6, 2019

@st-pasha

But can't you first check the value of i (before calling substitute/deparse on any arguments), and if it is integer-like, and no other arguments present then quickly return with a subsetDT?

Expression passed to i might result in error when not being pre-processed, for example substitute .() with list() in DT[.(a=5L), on="a"]. It might also be expensive to compute, like DT[order(a)], where we want to substitute order to forder. We could eventually wrap it in try... but it only complicates the code for not such a big gain. Ideally we want to go into more modular code, where such separation seems to fit.

@jangorecki
Copy link
Member Author

jangorecki commented Aug 6, 2019

@chnynf note this issue is about which argument, not with argument. Your comment seems unrelated to discussion here (aside from an NSE overhead), please open new issue if needed.

@jangorecki
Copy link
Member Author

this request could be superseded by #4485

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants