Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Foverlaps doesn't catch duplicate column names. Fails with hard to understand error. #1730

Closed
rodonn opened this issue Jun 7, 2016 · 2 comments
Assignees

Comments

@rodonn
Copy link

rodonn commented Jun 7, 2016

Code to reproduce:

require(data.table)
a = data.table(start = 1:5, end = 2:6, c2 = rnorm(100), c2 = rnorm(100))
b = data.table(start = 1:5, end = 2:6, c3 = rnorm(10))

setkey(a, start, end)
setkey(b, start, end)

foverlaps(a, b)

Error is Error in setcolorder(ans, c(xcols1, ycols, xcols2)) : neworder is length 6 but x has 7 columns.

I think the problem is that foverlap checks

if (anyDuplicated(by.x) || anyDuplicated(by.y)) 
    stop("Duplicate columns are not allowed in overlap joins. This may change in the future.")

whereas merge checks

if (any(duplicated(names(x)))) stop("x has some duplicated column name(s): ",paste(names(x)[duplicated(names(x))],collapse=","),". Please remove or rename the duplicate(s) and try again.")
if (any(duplicated(names(y)))) stop("y has some duplicated column name(s): ",paste(names(y)[duplicated(names(y))],collapse=","),". Please remove or rename the duplicate(s) and try again.")

Foverlaps only throws an error when by.x or by.y have duplicates, but will throw an error if any of the columns in x are duplicated (i.e. regardless of whether or not they are in by.x)

@rodonn
Copy link
Author

rodonn commented Jun 7, 2016

The error only seems to happen if x has duplicated column names, duplicates in y don't seem to cause the same problem. The following does not throw any errors.

a = data.table(start = 1:5, end = 2:6, c2 = rnorm(100), c4 = rnorm(100))
b = data.table(start = 1:5, end = 2:6, c3 = rnorm(10), c3 = rnorm(10))

setkey(a, start, end)
setkey(b, start, end)

foverlaps(a, b)

@jangorecki jangorecki self-assigned this Jun 7, 2016
@jangorecki
Copy link
Member

jangorecki commented Jun 7, 2016

Thanks for reporting. Join is performed correctly, just reordering columns in the results failed to handle duplicates. Workarounds of setcolorder (in multiple places here) would likely results into more issues in future, so meaningful error looks to be better solution here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants