I've got a file that I created and kept appending to. Unfortunately, at some point I switched from writing 15 columns to 14 and kept using the same file.
I expect two things from this file: 1) when I use fread on it, it should fail, but with an error that reports the inconsistent number of columns; 2) when I use fill = TRUE, the read should succeed.
Unfortunately, neither is true:
library(data.table)
# data.table 1.10.5 IN DEVELOPMENT built 2017-07-11 18:43:20 UTC; travis
URL = paste0('https://gist.githubusercontent.com/MichaelChirico/',
             '0f1a9ae0d419160ad8ef5b7ac5469336/raw/',
             'db7936fafaf2602e03e657bbfc9e49dd526260af/bad_fill.csv')
x = fread(URL, verbose = TRUE)
# Input contains no \n. Taking this to be a filename to open
# [1] Check arguments
# Using 2 threads (omp_get_max_threads()=2, nth=2)
# NAstrings = [<<NA>>]
# None of the NAstrings look like numbers.
# [2] Opening the file
# Opening file /tmp/RtmpMqWBHa/filee366e213235
# File opened, size = 34.88MB (36578984 bytes).
# Memory mapping ... ok
# [3] Detect and skip BOM
# [4] Detect end-of-line character(s)
# Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
# [6] Skipping initial rows if needed
# Positioned on line 1 starting: <<train_set,delx,dely,alpha,eta,>>
# [7] Detect separator, quoting rule, and ncolumns
# Detecting sep ...
# sep=',' with 100 lines of 15 fields using quote rule 0
# Detected 15 columns on line 1. This line is either column names or first data row. Line starts as: <<train_set,delx,dely,alpha,eta,>>
# Quote rule picked = 0
# [8] Determine column names
# All the fields on line 1 are character fields. Treating as the column names.
# [9] Detect column types
# Number of sampling jump points = 101 because (36578905 bytes from row 1 to eof) / (2 * 15974 jump0size) == 1144
# Type codes (jump 000) : 655552525552255 Quote rule 0
Error in fread(URL, verbose = TRUE) : Could not find first good line start after jump point 73 when sampling.
x = fread(URL, fill = TRUE, verbose = TRUE)
# Input contains no \n. Taking this to be a filename to open
# [1] Check arguments
# Using 2 threads (omp_get_max_threads()=2, nth=2)
# NAstrings = [<<NA>>]
# None of the NAstrings look like numbers.
# [2] Opening the file
# Opening file /tmp/RtmpMqWBHa/filee361f04c03
# File opened, size = 34.88MB (36578984 bytes).
# Memory mapping ... ok
# [3] Detect and skip BOM
# [4] Detect end-of-line character(s)
# Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
# [6] Skipping initial rows if needed
# Positioned on line 1 starting: <<train_set,delx,dely,alpha,eta,>>
# [7] Detect separator, quoting rule, and ncolumns
# Detecting sep ...
# sep=',' with 100 lines of 15 fields using quote rule 0
# Detected 15 columns on line 1. This line is either column names or first data row. Line starts as: <<train_set,delx,dely,alpha,eta,>>
# Quote rule picked = 0
# fill=true and the most number of columns found is 15
# [8] Determine column names
# All the fields on line 1 are character fields. Treating as the column names.
# [9] Detect column types
# Number of sampling jump points = 101 because (36578905 bytes from row 1 to eof) / (2 * 15974 jump0size) == 1144
# Type codes (jump 000) : 655552525552255 Quote rule 0
Error in fread(URL, fill = TRUE, verbose = TRUE) : Could not find first good line start after jump point 73 when sampling.
I was able to work around the problem and fix my file by identifying the exact row where the switch occurred and running:
x = fread('head -n 164161 ~/Desktop/fire_random_search.csv')
y = fread('tail -n +164162 ~/Desktop/fire_random_search.csv',
          col.names = names(x)[-ncol(x)])
z = rbind(x, y, fill = TRUE)
fwrite(z, '~/Desktop/fire_random_search.csv')
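For anyone hitting the same thing, one way to locate the row where the column count changes is base R's `count.fields()` from `utils`. This is only a minimal sketch on a small stand-in file (the data and the names `tmp`, `n_fields` and `switch_line` are made up for illustration), and it assumes the file has no quoted fields containing embedded newlines:

```r
# Build a small stand-in file: 3 lines with 4 fields, then 2 lines with 3 fields.
# (Hypothetical data; the real file switches from 15 to 14 columns.)
tmp = tempfile(fileext = '.csv')
writeLines(c('a,b,c,d',
             '1,2,3,4',
             '5,6,7,8',
             '9,10,11',
             '12,13,14'), tmp)

# count.fields() returns the number of fields found on each line of the file;
# comment.char = '' stops '#' from being treated as a comment marker
n_fields = count.fields(tmp, sep = ',', comment.char = '')

# Physical line numbers (header included) of the first line after each change
switch_line = which(diff(n_fields) != 0L) + 1L
switch_line
```

The resulting line number is the first line with the new column count, so the split would be `head -n` with `switch_line - 1` and `tail -n +` with `switch_line`, as in the workaround above.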
Also, this worked as expected in 1.10.4: