Currently, the parameter sep in function fread defaults to the set [,\t |;:].
I suggest including "\n" as the final separator in the default set, as this might improve backward compatibility of existing code with previous versions of data.table.
An example would be a file where only a single string is written on each line, but occasionally some of the default sep characters are part of the string. This produces an error in 1.9.5 due to the string "c:4" on line 3 (but not in 1.9.4) when sep = "\n" is not specified explicitly.
myfile = "/net/ifs1/san_projekte/projekte/genstat/09_nutzer/holger/39_dt_request//ex_150309.txt" # available at https://www.dropbox.com/s/y6cmkcza36c1qjn/ex_150309.txt?dl=0
aa = fread(myfile, verbose = T)
## Input contains no \n. Taking this to be a filename to open
## File opened, filesize is 0.000000 GB.
## Memory mapping ... ok
## Detected eol as \r\n (CRLF) in that order, the Windows standard.
## Positioned on line 1 after skip or autostart
## This line is the autostart and not blank so searching up for the last non-blank ... line 1
## Detecting sep ... ':'
## Detected 2 columns. Longest stretch was from line 3 to line 3
## Starting data input on line 3 (either column names or first row of data). First 10 characters: c:4
## Warning in fread(myfile, verbose = T): Starting data input on line 3 and
## discarded previous non-empty line: b
## Some fields on line 3 are not type character (or are empty). Treating as a data row and using default column names.
## Count of eol: 3 (including 1 at the end)
## Count of sep: 1
## nrow = MIN( nsep [1] / ncol [2] -1, neol [3] - nblank [1] ) = 1
## Error in fread(myfile, verbose = T): Expected sep (':') but new line, EOF (or other non printing character) ends field 0 when detecting types ( first): d
aa = fread(myfile, verbose = T, sep = "\n")
## Input contains no \n. Taking this to be a filename to open
## File opened, filesize is 0.000000 GB.
## Memory mapping ... ok
## Detected eol as \r\n (CRLF) in that order, the Windows standard.
## Positioned on line 1 after skip or autostart
## This line is the autostart and not blank so searching up for the last non-blank ... line 1
## Using supplied sep '
## ' ... Deducing this is a single column input.
## Starting data input on line 1 (either column names or first row of data). First 10 characters: a
## All the fields on line 1 are character fields. Treating as the column names.
## Count of eol: 4 (including 1 at the end)
## Count of sep: 3
## ncol==1 so sep count ignored
## Type codes ( first 5 rows): 4
## Type codes: 4 (after applying colClasses and integer64)
## Type codes: 4 (after applying drop or select (if supplied)
## Allocating 1 column slots (1 - 0 dropped)
## Read 3 rows. Exactly what was estimated and allocated up front
## 0.000s ( 71%) Memory map (rerun may be quicker)
## 0.000s ( 13%) sep and header detection
## 0.000s ( 3%) Count rows (wc -l)
## 0.000s ( 6%) Column type detection (first, middle and last 5 rows)
## 0.000s ( 3%) Allocation of 3x1 result (xMB) in RAM
## 0.000s ( 2%) Reading data
## 0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
## 0.000s ( 0%) Coercing data already read in type bumps (if any)
## 0.000s ( 2%) Changing na.strings to NA
## 0.000s Total
aa
## a
## 1: b
## 2: c:4
## 3: d
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-suse-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] data.table_1.9.5 knitr_1.9
##
## loaded via a namespace (and not attached):
## [1] chron_2.3-45 evaluate_0.5.5 formatR_1.0 stringr_0.6.2
## [5] tools_3.1.2
This should be fixed in dev once 7357a3a is merged. Recent work has significantly improved automatic format detection since this issue was raised 2 years ago. I tried the Dropbox link, but it no longer works.
Please try dev and reattach the file if it still doesn't work. Thanks!
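For anyone retrying this without the original file, here is a minimal sketch of a self-contained reproduction. The file contents are an assumption reconstructed from the verbose log above (a header line a followed by the rows b, c:4 and d); the temporary file uses the platform's default line endings rather than the CRLF endings of the original, and header = TRUE is set explicitly so the result does not depend on header auto-detection.

```r
library(data.table)

# Hypothetical stand-in for ex_150309.txt, reconstructed from the verbose log:
# a header line "a" followed by the rows "b", "c:4" and "d".
path <- tempfile(fileext = ".txt")
writeLines(c("a", "b", "c:4", "d"), path)

# Default separator detection. Wrapped in try() because the outcome depends on
# the installed data.table version (1.9.5 raised an error at this point).
res_default <- try(fread(path), silent = TRUE)

# Workaround from the report: pass sep = "\n" so each line is read as one field.
aa <- fread(path, sep = "\n", header = TRUE)
print(aa)
```

If res_default inherits from "try-error" on the version being tested, the problem described in this issue is still present there.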
Note from the original report: data.table 1.9.5 devel from 8.3.2015 was used; the txt file was posted at https://www.dropbox.com/s/y6cmkcza36c1qjn/ex_150309.txt?dl=0