FR for fread: if select is used, colClasses need only correspond to the columns in select #1426

Closed
MichaelChirico opened this issue Nov 10, 2015 · 4 comments · Fixed by #3547


@MichaelChirico
Member

I never filed an FR for a question I raised a year ago on Stack Overflow.

The current canon for using select and colClasses simultaneously is (IMO) unwieldy.

Consider:

#file to read
ffile <- paste0(paste(paste0("V", 1:20), collapse = ","),
              "\na,b,c,d,e,1,2,3,4,5,1.1,1.2,1.3,1.4,1.5,",
              "TRUE,FALSE,TRUE,FALSE,TRUE")

#columns to take
sel <- c(2, 10, 13:15, 20)
#types of all columns
tps <- rep(c("character", "integer",
           "numeric", "logical"),
         rep(5, 4))

Here's the best I could come up with as a programmatic way to use fread:

DT <- fread(ffile, select = paste0("V", sel),
            colClasses =
              sapply(unique(tps[sel]),
                     function(x) paste0("V", sel[which(tps[sel] == x)])))

(Gross. It could be spelled out explicitly as follows, but that is generally unsatisfying:)

DT <- fread(ffile, select = paste0("V", sel),
            colClasses =
              list(character = "V2", integer = "V10",
                   numeric = c("V13","V14","V15"),
                   logical = c("V20")))
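
For comparison, the same named list can be built with base R's `split()`, grouping the selected column names by their type. This is only a sketch of the workaround, not a change to `fread`. The names `sel_names`, `sel_types` and `cls` are introduced here for illustration, and the `fread` call is skipped if data.table is not installed.

```r
# Same inputs as in the example above.
ffile <- paste0(paste(paste0("V", 1:20), collapse = ","),
                "\na,b,c,d,e,1,2,3,4,5,1.1,1.2,1.3,1.4,1.5,",
                "TRUE,FALSE,TRUE,FALSE,TRUE")
sel <- c(2, 10, 13:15, 20)
tps <- rep(c("character", "integer", "numeric", "logical"), rep(5, 4))

# Illustrative names (not from the original post).
sel_names <- paste0("V", sel)
sel_types <- tps[sel]

# Group the selected column names by type; fixing the factor levels
# keeps the types in order of first appearance.
cls <- split(sel_names, factor(sel_types, levels = unique(sel_types)))

# Only attempt the read when data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  DT <- data.table::fread(ffile, select = sel_names, colClasses = cls)
}
```

This is shorter than the `sapply` version, but it still needs the named-list form of `colClasses`, which is the inconvenience this FR is about.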

To me it would make much more sense to be able to simply write:

DT <- fread(ffile, select = paste0("V", sel), colClasses = tps[sel])

But this currently produces the error:

Error in fread(ffile, select = paste0("V", sel), colClasses = tps[sel]) :
colClasses is unnamed and length 6 but there are 20 columns. See ?data.table for colClasses usage.

The shorter call is parsimonious and, as far as I can tell, unambiguous. Any reason why this wouldn't work?

@MichaelChirico
Member Author

What about just reversing the order of dealing with select vs. colClasses -- switch Lines #1141 - 1157 with Lines #1059 - 1112, and reset ncol to length(select) within the select condition?
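
To make the intended semantics concrete, here is a rough base R sketch of matching an unnamed `colClasses` against `select` instead of against every column in the file. The function `expand_col_classes` is hypothetical and is not data.table's internal code. It assumes unselected columns can simply be left as `NA`.

```r
# Hypothetical helper, not part of data.table: take one class per selected
# column and expand it to one entry per file column (NA where unselected).
expand_col_classes <- function(all_cols, select, colClasses) {
  # The length check is against select, not the full column count.
  if (length(colClasses) != length(select)) {
    stop("colClasses must have one entry per selected column")
  }
  idx <- match(select, all_cols)
  if (anyNA(idx)) stop("select names a column that is not in the file")
  full <- rep(NA_character_, length(all_cols))
  full[idx] <- colClasses
  names(full) <- all_cols
  full
}

all_cols <- paste0("V", 1:20)
sel <- c(2, 10, 13:15, 20)
tps <- rep(c("character", "integer", "numeric", "logical"), rep(5, 4))

full <- expand_col_classes(all_cols, paste0("V", sel), tps[sel])
```

Here `full` has 20 entries, with a class set only for the six selected columns.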

@dselivanov

+1 for this, also annoying me. Will try to check.

@arunsrinivasan
Member

Quite a few issues marked for v1.9.8 already. Can't take a look anytime soon. Glad if you could look into this. Thanks.

@renkun-ken
Member

Really need this.

@mattdowle mattdowle removed this from the Candidate milestone May 10, 2018
@mattdowle mattdowle added this to the 1.12.4 milestone May 1, 2019