fread fails if whitespace before first character #1035

Closed
dpastoor opened this issue Feb 7, 2015 · 4 comments
@dpastoor

dpastoor commented Feb 7, 2015

The program I use by default gives output like so:

" ITERATION    THETA1       THETA2"                
"            0  3.95527E+01  2.10651E+01"

I first tried with the stable version (1.9.4) and got the following error:

Expected sep (',') but new line, EOF (or other non printing character) ends field 1 on line 4 when detecting types:             0  3.95527E+01  2.10651E+01 

With the development version (1.9.5) I get the following error:

Not positioned correctly after testing format of header row. ch=' '

This seems like it would be resolved by #758 or #558 (variable whitespace delimiter), but I wanted to give another example of a situation where fread currently fails and I need to fall back to read.table(sep="") to read the table in properly.
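
For reference, the fallback mentioned above can be sketched with base R only. This is a minimal illustration, not the reporter's actual script: the temporary file and the two sample rows are assumptions modelled on the output shown earlier in this comment. With sep = "", read.table() treats any run of whitespace as a single delimiter and ignores leading blanks.

```r
# Write a tiny file whose rows start with whitespace (sample rows are
# illustrative, modelled on the output quoted in this comment).
tmp <- tempfile(fileext = ".txt")
writeLines(c(" ITERATION    THETA1       THETA2",
             "            0  3.95527E+01  2.10651E+01"), tmp)

# sep = "" means "one or more whitespace characters", so the leading
# blanks and the variable-width gaps are both handled.
df <- read.table(tmp, header = TRUE, sep = "")
str(df)

unlink(tmp)  # clean up the temporary file
```

This is slower than fread on large files, but it reads the layout without preprocessing.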

@GHarmata

GHarmata commented Jun 9, 2015

I am experiencing a similar problem; however, the R session completely aborts in 1.9.4, instead of giving a warning message.

Here is an example illustrating the format of dataset with which I am working:

"  22 4 6 4" 
"  34 22 34 5"
"  6 2 1 4"

Each row begins with two white spaces, but the separator for the rest of the values is a single white space.
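
One possible workaround for this layout, sketched in base R under the assumption that the three sample rows above are representative: strip the leading spaces from each line first, then parse with a single space as the separator. The variable names here are illustrative only.

```r
# Sample rows copied from the example above (two leading spaces each).
lines <- c("  22 4 6 4",
           "  34 22 34 5",
           "  6 2 1 4")

# Remove leading spaces so every row starts at its first value.
cleaned <- sub("^ +", "", lines)

# Parse the cleaned text; no header row, single-space separator.
m <- read.table(text = cleaned, header = FALSE, sep = " ")
print(m)
```

For a real file, the lines would come from readLines() on the file path instead of the inline vector.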

In version 1.9.4, when I attempted to use fread() in RStudio with my much-larger dataset, I received the following pop-up warning (and the session had to be restarted):

R Session Aborted
R encountered a fatal error. The session was terminated.

Using version 1.9.5, I instead received this error message:

Not positioned correctly after testing format of header row. ch=' '

I found this output very confusing, since I had specified that there was no header row (header = FALSE). The verbose=TRUE version of this output is below.

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.024641 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... ' '
Detected 561 columns. Longest stretch was from line 1 to line 30
Starting data input on line 1 (either column names or first row of data). First 10 characters:   2.571777
Error in fread("UCI HAR Dataset/test/X_test.txt", header = FALSE, verbose = TRUE) : 
  Not positioned correctly after testing format of header row. ch=' '

I am using R version 3.2.0 ("Full of Ingredients"), with RStudio version 0.98.1103.

@jaapwalhout

I had the same problem. This works:

 df <- read.table("./data/train/X_train.txt")

but reading the same file with fread does not work and throws an error:

> dt <- fread("./data/train/X_train.txt")
Error in fread("./data/train/X_train.txt") : 
  Not positioned correctly after testing format of header row. ch=' '

The file in question can be downloaded from here (66 MB).

@arunsrinivasan
Member

Fixed in devel. Please upgrade and test.

@jaapwalhout

@arunsrinivasan Tested. It now works on the file I specified above.
