nomatch=0 fires incorrect (?) warning #2399

Closed
kodonnell opened this issue Oct 4, 2017 · 3 comments

Comments

@kodonnell

Consider the following:

> dt <- data.table(a=1:4, b=LETTERS[1:4])
> setkey(dt, a, b)
> dt[.(1:2, LETTERS[1:3]), nomatch=NA]
   a b
1: 1 A
2: 2 B
3: 1 C
Warning message:
In as.data.table.list(i) :
  Item 1 is of size 2 but maximum size is 3 (recycled leaving a remainder of 1 items)

This is a reasonable warning - as we can see, the first column has been recycled. However:

> dt[.(1:2, LETTERS[1:3]), nomatch=0]
   a b
1: 1 A
2: 2 B
Warning message:
In as.data.table.list(i) :
  Item 1 is of size 2 but maximum size is 3 (recycled leaving a remainder of 1 items)

I don't think this is reasonable - since we've specified nomatch=0 there should be no issue. This is a toy example, but the use case is to return all rows which contain a value in some candidate vector (here 1:2 and LETTERS[1:3]) i.e. a %in% 1:2 & b %in% LETTERS[1:3].

As an aside, dt[.(1:2, LETTERS[1:4]), nomatch=0] doesn't raise a warning, because (I guess) recycling is considered expected when one length is a multiple of the other (as per normal R behaviour). However, this means the appearance of this bug depends on the lengths of the candidate vectors I provide (specifically, whether the shorter length divides the longer), which is not intuitive for this use case: if I know column a only contains 1:2 then I should be able to specify 1:2 or 1:12345 as my candidate vector, and get the same result. This isn't the case here.

Again - it makes sense with nomatch=NA, but not nomatch=0. I'm not sure how easy it is to fix this ... I suspect it's just internal reordering of code so that nomatch=0 gets implemented before passing to as.data.table.list.

@franknarf1
Contributor

This is a toy example, but the use case is to return all rows which contain a value in some candidate vector (here 1:2 and LETTERS[1:3]) i.e. a %in% 1:2 & b %in% LETTERS[1:3].

No, that is not what the join is meant to do. A join x[i] will take each row of i and look up corresponding rows in x. While it is possible to pass i as a list instead of a table, this is just a syntactic convenience and doesn't imply doing a different join than you'd get if you actually did:

m = as.data.table(list(1:2, LETTERS[1:3]))
dt[m, nomatch=0]

Currently, there is no join/index/"bmerge" functionality for dt[a %in% 1:2 & b %in% LETTERS[1:3]], but Arun has filed it: #1453 (if I understand correctly).
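For the use case quoted above, one possible workaround (a sketch only, not something proposed in this thread) is to build every combination of the candidate values explicitly with data.table's CJ() and join on that, so that no column of i has to be recycled. It assumes the dt from the first comment; the names candidates, joined and filtered are illustrative.

```r
library(data.table)

dt <- data.table(a = 1:4, b = LETTERS[1:4])
setkey(dt, a, b)

# CJ() builds the cross product of the candidate vectors (2 x 3 = 6 rows),
# so nothing is recycled when i is constructed. unique = TRUE drops
# duplicated candidate values, which would otherwise duplicate result rows.
candidates <- CJ(a = 1:2, b = LETTERS[1:3], unique = TRUE)

# Inner join: keep only the candidate combinations that exist in dt.
joined <- dt[candidates, nomatch = 0L]

# The plain vector-scan filter, for comparison.
filtered <- dt[a %in% 1:2 & b %in% LETTERS[1:3]]
```

Both joined and filtered should contain the same rows here. The trade-off is that the cross product grows multiplicatively with the candidate vectors, so this is only reasonable when they are small.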

@kodonnell
Author

A join x[i] will take each row of i and look up corresponding rows in x

But in the case of nomatch=0L, the resulting table should not involve any recycling, and hence the warning message should not appear. Correct?

@jangorecki
Member

The warning comes not from nomatch=0 but from the i argument being evaluated. If you copy the content of i and replace .() with data.table(), you will get the same warning.

> library(data.table)
data.table 1.11.5 IN DEVELOPMENT built 2018-09-10 04:40:14 UTC; jan  Latest news: r-datatable.com
> dt <- data.table(a=1:4, b=LETTERS[1:4])
> setkey(dt, a, b)
> data.table(1:2, LETTERS[1:3])
   V1 V2
1:  1  A
2:  2  B
3:  1  C
Warning message:
In data.table(1:2, LETTERS[1:3]) :
  Item 1 is of size 2 but maximum size is 3 (recycled leaving remainder of 1 items)

Thus the recycling occurs not in the resulting table but in the input table i that you are joining with. Closing.
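This can be checked without performing any join. The sketch below builds i on its own and captures whatever condition is raised; the helper capture_condition is hypothetical, not part of data.table. Depending on the data.table version, the recycling-with-remainder condition may be signalled as a warning (as in the output above) or as an error, so the helper handles both.

```r
library(data.table)

# Hypothetical helper: returns the message of the first warning or error
# raised while evaluating expr, or NULL if none is raised.
capture_condition <- function(expr) {
  tryCatch(
    {
      expr  # force the lazily evaluated argument inside tryCatch
      NULL
    },
    warning = function(w) conditionMessage(w),
    error = function(e) conditionMessage(e)
  )
}

# Lengths 2 and 3: the condition is raised while building the table itself,
# before nomatch could play any role.
msg_i <- capture_condition(data.table(1:2, LETTERS[1:3]))

# A cross join of the same vectors involves no recycling at all.
msg_cj <- capture_condition(CJ(1:2, LETTERS[1:3]))
```

Here msg_i should hold the recycling message and msg_cj should be NULL, which is consistent with the explanation that the condition belongs to the construction of i rather than to the join.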
