Skip to content

Commit

Permalink
Closes #768. fread() gains col.names argument.
Browse files Browse the repository at this point in the history
  • Loading branch information
arunsrinivasan committed Sep 16, 2015
1 parent b84527d commit e3fd580
Show file tree
Hide file tree
Showing 4 changed files with 17 additions and 3 deletions.
5 changes: 4 additions & 1 deletion R/fread.R
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@

fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.strings="NA",stringsAsFactors=FALSE,verbose=getOption("datatable.verbose"),autostart=1L,skip=0L,select=NULL,drop=NULL,colClasses=NULL,integer64=getOption("datatable.integer64"),dec=if (sep!=".") "." else ",", check.names=FALSE, encoding="unknown", strip.white=TRUE, showProgress=getOption("datatable.showProgress"),data.table=getOption("datatable.fread.datatable")) {
fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.strings="NA",stringsAsFactors=FALSE,verbose=getOption("datatable.verbose"),autostart=1L,skip=0L,select=NULL,drop=NULL,colClasses=NULL,integer64=getOption("datatable.integer64"),dec=if (sep!=".") "." else ",", col.names, check.names=FALSE, encoding="unknown", strip.white=TRUE, showProgress=getOption("datatable.showProgress"),data.table=getOption("datatable.fread.datatable")) {
if (!is.character(dec) || length(dec)!=1L || nchar(dec)!=1) stop("dec must be a single character e.g. '.' or ','")
# handle encoding, #563
if (!encoding %in% c("unknown", "UTF-8", "Latin-1")) {
Expand Down Expand Up @@ -114,5 +114,8 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
set(ans, j = j, value = as_factor(.subset2(ans, j)))
}
}
# FR #768
if (!missing(col.names))
setnames(ans, col.names) # setnames checks and errors automatically
ans
}
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,8 @@

30. `[.data.table` now accepts single column numeric matrix in `i` argument the same way as `data.frame`. Closes [#826](https://github.com/Rdatatable/data.table/issues/826). Thanks to @jangorecki for the PR.

31. `fread` gains `col.names` argument, and is similar to `base::read.table()`. Closes [#768](https://github.com/Rdatatable/data.table/issues/768). Thanks to @dardesta for filing the FR.

#### BUG FIXES

1. `if (TRUE) DT[,LHS:=RHS]` no longer prints, [#869](https://github.com/Rdatatable/data.table/issues/869) and [#1122](https://github.com/Rdatatable/data.table/issues/1122). Tests added. To get this to work we've had to live with one downside: if a `:=` is used inside a function with no `DT[]` before the end of the function, then the next time `DT` or `print(DT)` is typed at the prompt, nothing will be printed. A repeated `DT` or `print(DT)` will print. To avoid this: include a `DT[]` after the last `:=` in your function. If that is not possible (e.g., it's not a function you can change) then `DT[]` at the prompt is guaranteed to print. As before, adding an extra `[]` on the end of a `:=` query is a recommended idiom to update and then print; e.g. `> DT[,foo:=3L][]`. Thanks to Jureiss and Jan Gorecki for reporting.
Expand Down
8 changes: 8 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -6931,6 +6931,14 @@ test(1555.13, fread(str1), fread(str2))
test(1556.1, fread("issue_1330_fread.txt", nrow=2), data.table(a=1:2, b=1:2))
test(1556.2, fread("issue_1330_fread.txt", nrow=4), data.table(a=1:2, b=1:2), warning="Stopped reading at empty line 4")

# FR #768
str="1,2\n3,4\n"
test(1557.1, names(fread(str)), c("V1", "V2")) # autonamed
test(1557.2, names(fread(str, col.names=letters[1:2])), letters[1:2])
test(1557.3, names(fread(str, col.names=letters[1])), error="Can't assign 1 names to")
test(1557.4, names(fread(str, col.names=letters[1:3])), error="Can't assign 3 names to")
test(1557.5, names(fread(str, col.names=1:2)), error="Passed a vector of type")

##########################


Expand Down
5 changes: 3 additions & 2 deletions man/fread.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ fread(input, sep="auto", sep2="auto", nrows=-1L, header="auto", na.strings="NA",
stringsAsFactors=FALSE, verbose=getOption("datatable.verbose"), autostart=1L,
skip=0L, select=NULL, drop=NULL, colClasses=NULL,
integer64=getOption("datatable.integer64"), # default: "integer64"
dec=if (sep!=".") "." else ",",
dec=if (sep!=".") "." else ",", col.names,
check.names=FALSE, encoding="unknown", strip.white=TRUE,
showProgress=getOption("datatable.showProgress"), # default: TRUE
data.table=getOption("datatable.fread.datatable") # default: TRUE
Expand All @@ -35,6 +35,7 @@ data.table=getOption("datatable.fread.datatable") # default: TRUE
\item{colClasses}{ A character vector of classes (named or unnamed), as read.csv. Or a named list of vectors of column names or numbers, see examples. colClasses in fread is intended for rare overrides, not for routine use. fread will only promote a column to a higher type if colClasses requests it. It won't downgrade a column to a lower type since NAs would result. You have to coerce such columns afterwards yourself, if you really require data loss. }
\item{integer64}{ "integer64" (default) reads columns detected as containing integers larger than 2^31 as type \code{bit64::integer64}. Alternatively, \code{"double"|"numeric"} reads as \code{base::read.csv} does; i.e., possibly with loss of precision and if so silently. Or, "character". }
\item{dec}{ The decimal separator as in \code{base::read.csv}. If not "." (default) then usually ",". See details. }
\item{col.names}{ A vector of optional names for the variables (columns). The default is to use the header column if present or detected, or if not "V" followed by the column number. }
\item{check.names}{ default is \code{FALSE}. If \code{TRUE}, it uses the base function \code{\link{make.unique}} to ensure that column names are all unique.}
\item{encoding}{ default is \code{"unknown"}. Other possible options are \code{"UTF-8"} and \code{"Latin-1"}. }
\item{strip.white}{ default is \code{TRUE}. Strips leading and trailing whitespaces of unquoted fields. If \code{FALSE}, only header trailing spaces are removed. }
Expand Down Expand Up @@ -71,7 +72,7 @@ On Windows, "French_France.1252" is tried which should be available as standard
If an embedded quote is followed by the separator inside a quoted field, the embedded quotes up to that point in that field must be balanced; e.g. \code{...,2,"www.blah?x="one",y="two"",3.14,...}.
Quoting may be used to signify that numeric data should be read as text.
On those fields that do not satisfy these conditions, e.g., fields with unbalanced quotes, \code{fread} re-attempts that field as if it isn't quoted. This is quite useful in reading files that contains fields with unbalanced quotes as well, automatically.}
}

Expand Down

0 comments on commit e3fd580

Please sign in to comment.