[Request] Provide an option to override fread's jumpiness #2157

Closed
franknarf1 opened this issue May 10, 2017 · 1 comment

@franknarf1
Contributor

Reading this file with fread, I get ...

Error in fread("bah.csv") :
Internal error: Sampling jump point 10 is before the last jump ended

It would be nice to have an option that tells fread to traverse the file naively (or something similar), so I don't have to write a wrapper that catches this error and falls back to read.csv.
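
For reference, the kind of wrapper mentioned above might look roughly like the sketch below. The helper name `read_with_fallback` and its `primary`/`fallback` arguments are made up for illustration and are not part of data.table; it only assumes that `fread` signals an ordinary R error, as in the message above.

```r
# Hypothetical helper (not part of data.table): try a fast reader first and
# fall back to a slower, more forgiving one if the fast reader raises an error.
read_with_fallback <- function(path,
                               primary = data.table::fread,
                               fallback = utils::read.csv,
                               ...) {
  tryCatch(
    # Extra arguments in ... are passed to the primary reader only.
    primary(path, ...),
    error = function(e) {
      # Report why the primary reader failed, then use the fallback.
      warning("primary reader failed (", conditionMessage(e),
              "); using fallback reader", call. = FALSE)
      fallback(path)
    }
  )
}
```

Calling `read_with_fallback("bah.csv")` would then return fread's result when it succeeds and read.csv's otherwise. Note that the two paths return different classes (a data.table versus a plain data.frame), so callers may want to convert the result, and any fread-specific arguments are ignored on the fallback path.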

Tested with...

data.table 1.10.5 IN DEVELOPMENT built 2017-05-09 05:01:26 UTC; travis

R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.5

loaded via a namespace (and not attached):
[1] tools_3.3.3

@artemklevtsov

artemklevtsov commented May 11, 2017

Same here with this file: test2.zip

R> fread("/tmp/test2.csv", verbose = TRUE)
Input contains no \n. Taking this to be a filename to open
NAstrings = [<<NA>>]
None of the NAstrings are numeric (such as '-9999').
`filename` argument given, attempting to open a file with such name
File opened, size 0.000100 GB.
Memory mapping ... ok
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Positioned on line 1 starting: <<X1,X2,X3,X4,X5,X6,X7,X8,X9,X10>>
Detecting sep ...
  sep==','(ascii 44)  with 100 lines of 11 fields using quote rule 0
Detected 11 columns on line 1. This line is either column names or first data row (first 30 chars): <<X1,X2,X3,X4,X5,X6,X7,X8,X9,X10>>
All the fields on line 1 are character fields. Treating as the column names.
Number of sampling jump points = 11 because 107192 bytes from row 1 to eof / (2 * 3065 jump0size) == 17
Type codes (jump 000)    : 62222222222  Quote rule 0
Type codes (jump 001)    : 65555555555  Quote rule 0
Error in fread("/tmp/test2.csv", verbose = TRUE) :
  Internal error: Sampling jump point 10 is before the last jump ended
R> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=ru_RU.UTF-8       LC_NUMERIC=C               LC_TIME=ru_RU.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=ru_RU.UTF-8    LC_MESSAGES=ru_RU.UTF-8    LC_PAPER=ru_RU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.5

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0    parallel_3.4.0 yaml_2.1.14   
