fread() warning message: Read less rows than were allocated #1239
The file is read completely and correctly. The issue is that the row count used for allocation is estimated incorrectly. Here's an example:
require(data.table)
text="a,b\nqq,rr\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n\"bla,\n,\n,\nbla\",bla\n"
fread(text, verbose=TRUE)
Input contains a \n (or is ""). Taking this to be text input (not a filename)
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... ','
Detected 2 columns. Longest stretch was from line 1 to line 27
Starting data input on line 1 (either column names or first row of data). First 10 characters: a,b
All the fields on line 1 are character fields. Treating as the column names.
Count of eol: 102 (including 1 at the end)
Count of sep: 101
nrow = MIN( nsep [101] / ncol [2] -1, neol [102] - nblank [1] ) = 101 # <~~~~~~
Type codes ( first 5 rows): 44
Type codes (+ middle 5 rows): 44
Type codes (+ last 5 rows): 44
Type codes: 44 (after applying colClasses and integer64)
Type codes: 44 (after applying drop or select (if supplied)
Allocating 2 column slots (2 - 0 dropped)
Read slightly fewer rows (26) than were allocated (101). # <~~~~~~~
0.000s ( 6%) Memory map (rerun may be quicker)
0.000s ( 23%) sep and header detection
0.000s ( 10%) Count rows (wc -l)
0.000s ( 53%) Column type detection (first, middle and last 5 rows)
0.000s ( 3%) Allocation of 26x2 result (xMB) in RAM
0.000s ( 1%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.000s ( 4%) Changing na.strings to NA
0.000s Total
a b
1: qq rr
2: bla,\n,\n,\nbla bla
3: bla,\n,\n,\nbla bla
4: bla,\n,\n,\nbla bla
5: bla,\n,\n,\nbla bla
6: bla,\n,\n,\nbla bla
7: bla,\n,\n,\nbla bla
8: bla,\n,\n,\nbla bla
9: bla,\n,\n,\nbla bla
10: bla,\n,\n,\nbla bla
11: bla,\n,\n,\nbla bla
12: bla,\n,\n,\nbla bla
13: bla,\n,\n,\nbla bla
14: bla,\n,\n,\nbla bla
15: bla,\n,\n,\nbla bla
16: bla,\n,\n,\nbla bla
17: bla,\n,\n,\nbla bla
18: bla,\n,\n,\nbla bla
19: bla,\n,\n,\nbla bla
20: bla,\n,\n,\nbla bla
21: bla,\n,\n,\nbla bla
22: bla,\n,\n,\nbla bla
23: bla,\n,\n,\nbla bla
24: bla,\n,\n,\nbla bla
25: bla,\n,\n,\nbla bla
26: bla,\n,\n,\nbla bla
a b
Note the lines marked with arrows: nrow is estimated as 101, but only 26 rows are actually read.
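A minimal base-R sketch of why the estimate overshoots. It rebuilds the same input with `strrep()`, counts every raw ',' and '\n' without regard to quoting, and compares those counts with the number of rows a quote-aware parser returns. The variable names are illustrative. `read.csv()` stands in for the quote-aware parse only so the sketch does not depend on a particular data.table version. This does not reproduce fread's exact nrow formula.

```r
# Rebuild the input from the example: a header, one plain row, and
# 25 rows whose first (quoted) field contains embedded ',' and '\n'.
quoted_row <- "\"bla,\n,\n,\nbla\",bla\n"
text <- paste0("a,b\nqq,rr\n", strrep(quoted_row, 25))

# Naive counts that ignore quoting: every ',' and '\n' is counted,
# including the ones that sit inside the quoted fields.
n_eol <- lengths(regmatches(text, gregexpr("\n", text, fixed = TRUE)))
n_sep <- lengths(regmatches(text, gregexpr(",", text, fixed = TRUE)))

# Quote-aware parse: embedded separators and newlines stay inside the field.
parsed <- read.csv(text = text, stringsAsFactors = FALSE)

cat("raw newlines:  ", n_eol, "\n")
cat("raw separators:", n_sep, "\n")
cat("rows parsed:   ", nrow(parsed), "\n")
```

Each of the 25 quoted rows contributes three extra separators and three extra newlines beyond its one real record terminator, so any estimate built on raw counts lands far above the 26 rows actually present. Over-allocating is harmless for correctness, which is why the data are still read in full and only the warning appears.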
Closing as duplicate of #1116.
The data that produced the following bug can be found here: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. I've tried creating smaller subsets of this data, but my attempts result in different errors that stem from my methods of creating the subsets, not from the data.table package (as best I can tell).
I switched to the development version 1.9.5 because I was having trouble using fread from 1.9.4 on a data set that contained embedded quotes in one of the columns. The development version produced the correct data.table, but it also gave the following warning message:
Here is the code and output with verbose = TRUE: