indices can now return list of character vectors, closes #1589
jangorecki committed Nov 30, 2016
1 parent 5f9aec7 commit 3aa65e2
Showing 4 changed files with 28 additions and 3 deletions.
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,6 +1,10 @@

### Changes in v1.9.9 ( in development on GitHub )

+#### NEW FEATURES
+
+1. `indices()` function gain new argument `vectors` default `FALSE`, when `TRUE` provided then list of vector is returned, single vector refers to single index. Closes #1589.
+
#### BUG FIXES

1. `fwrite(..., quote='auto')` already quoted a field if it contained a `sep` or `\n`, or `sep2[2]` when `list` columns are present. Now it also quotes a field if it contains a double quote (`"`) as documented, [#1925](https://github.com/Rdatatable/data.table/issues/1925). Thanks to Aki Matsuo for reporting. Tests added. The `qmethod` tests did test escaping embedded double quotes, but only when `sep` or `\n` was present in the field as well to trigger the quoting of the field.
7 changes: 5 additions & 2 deletions R/setkey.R
@@ -86,10 +86,13 @@ key2 <- function(x) {
if (is.null(ans)) return(ans) # otherwise character() gets returned by next line
gsub("^__","",ans)
}
-indices <- function(x) {
+indices <- function(x, vectors = FALSE) {
 ans = names(attributes(attr(x,"index",exact=TRUE)))
 if (is.null(ans)) return(ans) # otherwise character() gets returned by next line
-gsub("^__","",ans)
+ans <- gsub("^__","",ans)
+if (isTRUE(vectors))
+ans <- strsplit(ans, "__", fixed = TRUE)
+ans
}

get2key <- function(x, col) attr(attr(x,"index",exact=TRUE),paste("__",col,sep=""),exact=TRUE) # work in progress, not yet exported
9 changes: 9 additions & 0 deletions inst/tests/tests.Rraw
@@ -9737,6 +9737,15 @@ test(1746.2, DT[ids, print(A), by=.EACHI, nomatch=0],
test(1746.3, DT[ids, {print(A);A}, by=.EACHI, nomatch=0], # reliable crash in v1.9.6 and v1.9.8
data.table(A=c("c","d"),V1=c("c","d")), output="\"c\".*\"d\"")

+# indices() can return list of vectors, #1589
+DT = data.table(A=5:1,B=letters[5:1])
+setindex(DT)
+setindex(DT, A)
+setindex(DT, B)
+indices(DT, vectors = TRUE)
+test(1747.1, indices(DT), c("A__B","A","B"))
+test(1747.2, indices(DT, vectors = TRUE), list(c("A","B"),"A","B"))
+

##########################

11 changes: 10 additions & 1 deletion man/setkey.Rd
@@ -36,7 +36,7 @@ setkeyv(x, cols, verbose=getOption("datatable.verbose"), physical = TRUE)
setindex(...)
setindexv(...)
key(x)
-indices(x)
+indices(x, vectors = FALSE)
haskey(x)
key(x) <- value # DEPRECATED, please use setkey or setkeyv instead.
}
@@ -51,6 +51,8 @@ names.}
\item{verbose}{ Output status and information. }
\item{physical}{ TRUE changes the order of the data in RAM. FALSE adds a
secondary key a.k.a. index. }
+\item{vectors}{ logical scalar default \code{FALSE}, when set to \code{TRUE}
+then list of character vectors is returned, each vector refers to one index. }
}
\details{
\code{setkey} reorders (or sorts) the rows of a data.table by the columns
@@ -155,6 +157,13 @@ DT = data.table(A=5:1,B=letters[5:1])
DT2 = copy(DT) # explicit copy() needed to copy a data.table
setkey(DT2,B) # now just changes DT2
identical(DT,DT2) # FALSE. DT and DT2 are now different tables
+
+DT = data.table(A=5:1,B=letters[5:1])
+setindex(DT) # set indices
+setindex(DT, A)
+setindex(DT, B)
+indices(DT) # get indices single vector
+indices(DT, vectors = TRUE) # get indices list
}
\keyword{ data }

2 comments on commit 3aa65e2

@mattdowle
Member


The change makes sense, but the NEWS item could be better.
I suggest, instead of:

indices() function gain new argument vectors default FALSE, when TRUE provided then list of vector is returned, single vector refers to single index. Closes #1589.

this:

indices() now returns a list of vectors of column names rather than a vector of internal squashed string ids, #1589.

The item number #1589 needs to be a link.
I'd lean towards not complicating the function with the 'vectors' argument. If anyone wants to squash the vectors they can themselves using sapply and paste.
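
The squashing mentioned above can be sketched in base R. This is only an illustration, not code from the commit: the variable names `squashed`, `vecs` and `resquashed` are hypothetical, and the input is copied from test 1747.1 rather than taken from a live `indices()` call. It assumes no column name itself contains `__`.

```r
# Squashed index ids, copied from test 1747.1 in this commit
squashed <- c("A__B", "A", "B")

# Split on the "__" separator, mirroring the strsplit() call added to indices()
vecs <- strsplit(squashed, "__", fixed = TRUE)

# Squash each vector of column names back into one string with sapply and paste
resquashed <- sapply(vecs, paste, collapse = "__")
```

With a list-returning `indices()`, the last line is all a caller would need to recover the old string form.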

@jangorecki
Member Author


@mattdowle I did not want to change the default behaviour, but if you are OK with it then I think we can change it. The function was exported, so this will potentially be a breaking change for 1.10.2.
