fread parse standard data.table print output #2720

Closed
jangorecki opened this issue Apr 1, 2018 · 3 comments

Comments

@jangorecki
Member

I would find it convenient to copy-paste a data.table exactly as it is printed in the output.

      ID  time nTest
 1: 1650 6.714    91
 2: 1652 6.376    91
 3: 1648 6.338    91
 4: 1509 5.446     1
 5: 1779 4.541    13
 6: 1646 4.060    91
 7: 1644 3.572    91
 8: 1642 3.416    91
 9: 1437 2.261   528
10: 1151 1.828     2

This text, copy-pasted from the output, could simply be put in fread("[ctrl+v]"). fread could recognise the format by something like checking for 1: in the first column. This would make it easier to get a small data.table copy-pasted from an Rout log.
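A minimal sketch of that idea, assuming the pasted text has a header line followed by rows labelled 1:, 2:, and so on. The helper name fread_printed is hypothetical and not part of data.table; it only strips the row labels and passes the rest to fread via its text argument. Because it splits on whitespace, it would break on character columns containing spaces.

```r
library(data.table)

# Hypothetical helper (not part of data.table): detect printed data.table
# output by its "1:" row label, strip the labels, and let fread parse the rest.
fread_printed = function(txt) {
  lines = strsplit(txt, '\n', fixed = TRUE)[[1L]]
  # drop blank lines, e.g. from a leading or trailing newline in the paste
  lines = lines[nzchar(trimws(lines))]
  # only proceed if the first data row starts with the "1:" row label
  if (length(lines) < 2L || !grepl('^\\s*1:', lines[2L])) {
    stop('input does not look like printed data.table output')
  }
  header = trimws(lines[1L])
  # remove the leading "N:" row label from every data row
  body = trimws(sub('^\\s*[0-9]+:', '', lines[-1L]))
  # collapse runs of whitespace to a single tab so the separator is unambiguous
  clean = gsub('\\s+', '\t', c(header, body))
  fread(text = clean, sep = '\t')
}

txt = "      ID  time nTest
 1: 1650 6.714    91
 2: 1652 6.376    91
10: 1151 1.828     2"
DT = fread_printed(txt)
print(DT)
```

Column types are left to fread's own detection here, so ID and nTest should come back as integer and time as numeric.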

@MichaelChirico
Member

MichaelChirico commented Apr 2, 2018

How sophisticated do you want it to be?

An extremely rinky-dink version is:

library(data.table)

fread_rout = function(x) {
  # split the copy-pasted string into a character vector of lines
  x = strsplit(x, '\n', fixed = TRUE)[[1L]]
  # trim leading and trailing whitespace
  x = sub('^[ \t]+', '', sub('[ \t]+$', '', x))
  # drop empty lines, e.g. from a leading or trailing newline in the paste
  x = x[nzchar(x)]
  # if output was truncated, drop the '---' row
  x = grep('---', x, fixed = TRUE, invert = TRUE, value = TRUE)
  # get column names
  coln = strsplit(x[1L], '\\s+')[[1L]]
  # split rows on whitespace (fails with character columns containing spaces),
  # turn into columns, and drop the first one (the "1:" row labels)
  x = transpose(strsplit(x[-1L], '\\s+'))[-1L]
  setDT(x)
  setnames(x, coln)
  # convert each column to numeric unless that would turn a non-'NA' value into NA
  x[ , (coln) := lapply(.SD, function(col) {
    num = suppressWarnings(as.numeric(col))
    if (any(is.na(num[col != 'NA']))) return(col)
    return(num)
  })]
  # force print
  x[]
}

It won't be able to handle anything weird (embedded commas, spaces in strings, etc.), but I guess that's not the use case here.

@MichaelChirico
Member

Perhaps we could just add this to the functionality of the text argument?

@jangorecki
Member Author

I will close this issue because the idea was not that smart: the printed format depends on many factors. It is easy to write a function like this for the most common examples, but difficult to make it work more widely, and it could bring many reports about something not being parsed well.
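As a small illustration of how much the printed format can vary, the sketch below prints the same table three ways using the class and row.names arguments of print.data.table. The table itself is made up for the example. A parser keyed on the 1: row label would already fail on the third variant, and would have to skip an extra line in the second.

```r
library(data.table)

# Made-up two-row table, only for illustration
DT = data.table(ID = c(1650L, 1652L), time = c(6.714, 6.376))

# row labels shown, no column-class row
out_plain = capture.output(print(DT, class = FALSE))
# an extra row with the column classes is printed under the header
out_class = capture.output(print(DT, class = TRUE))
# no "1:" row labels at all
out_norow = capture.output(print(DT, class = FALSE, row.names = FALSE))

writeLines(c(out_plain, '', out_class, '', out_norow))
```

The same switches can also be set globally through options such as datatable.print.class and datatable.print.rownames, so a pasted log gives no guarantee about which variant it contains.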
