This issue is similar to several previous issues about loading a file containing the NUL ASCII character (as.raw(0) in bytes), except that I have a minimal reproducible example which appears to cause a segfault at fread.R@146.
This example is based on simulation software output where very rarely there can be NUL characters in the body of the file (issue #2485 has already resolved NUL characters at the end of a file). It appears NUL characters at the beginning of a file are acceptable as well.
The header field is key=value pairs, and the data field is to be read into a data.table. In the example, NUL characters have been inserted into the body; a single one is enough to cause an error which cannot be caught with error handling.
Verbose tracelog is provided using a file test1.R;
Running test1.R;
$ Rscript test1.R
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /software/statistical/R-3.5.2/lib64/R/lib/libRblas.so
LAPACK: /software/statistical/R-3.5.2/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_NZ.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_NZ.UTF-8 LC_COLLATE=en_NZ.UTF-8
[5] LC_MONETARY=en_NZ.UTF-8 LC_MESSAGES=en_NZ.UTF-8
[7] LC_PAPER=en_NZ.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.1 RLinuxModules_0.3
loaded via a namespace (and not attached):
[1] compiler_3.5.2 R.methodsS3_1.7.1 R.utils_2.7.0 R.oo_1.22.0
omp_get_max_threads() = 16
omp_get_thread_limit() = 2147483647
DTthreads = 0
RestoreAfterFork = true
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
Using 16 threads (omp_get_max_threads()=16, nth=16)
NAstrings = [<<NA>>]
None of the NAstrings look like numbers.
skip num lines = 1
show progress = 0
0/1 column will be read as integer
[02] Opening the file
Opening file test.txt
File opened, size = 29 bytes.
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\nin one file). This is co
[05] Skipping initial rows if needed
Skipped to line 2 in the file
Positioned on line 2 starting: <<A B C>>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
sep=' ' with 2 lines of 3 fields using quote rule 0
Detected 3 columns on line 2. This line is either column names or first data row. Line starts as: <<A B C>>
Quote rule picked = 0
fill=false and the most number of columns found is 3
[07] Detect column types, good nrow estimate and whether first row is column names
'header' changed by user from 'auto' to true
Number of sampling jump points = 1 because (24 bytes from row 1 to eof) / (2 * 16 jump0size) == 0
A line with too-few fields (1/3) was found on line 2 of sample jump 0.
Type codes (jump 000) : 555 Quote rule 0
All rows were sampled since file is small so we know nrow=1 exactly
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 555
[10] Allocate memory for the datatable
Allocating 3 column slots (3 - 0 dropped) with 1 rows
[11] Read the data
jumps=[0..1), chunk_size=1048576, total_size=16
## suspending with control-z
## Remove unwanted suspended jobs
$ jobs -l | cut -d' ' -f2 | xargs -I{} kill -9 {}
A different error can be achieved by inserting NUL characters at the beginning of the data field (after the header field) in test2.R;
## bash here doc...
cat - > test2.R << 'EOF'
library(data.table)
## example #2
n <- 1
bytes <- c(charToRaw("a=b\n"), rep(as.raw(0), n), charToRaw("A B C\n1 2 3\n4 5 6\n"))
writeBin(bytes, "test.txt")
## fread
try(fread("test.txt", skip=1, header=TRUE, verbose=FALSE))
EOF
Running test2.R;
$ Rscript test2.R
Empty data.table (0 rows and 1 cols): V1
Warning message:
In fread("test.txt", skip = 1, header = TRUE, verbose = FALSE) :
Stopped early on line 3. Expected 1 fields but found 1. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<>>
The data.table above loads with a warning, but it is not the correct size: it has 0 rows and 1 col, whereas the data field written by test2.R has 2 rows and 3 cols.
If you change the inserted byte to any value other than 0 (that is, 1 to 255), fread works fine (by including the byte in one of the data.table elements).
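As a possible stopgap, the NUL bytes can be detected and stripped outside R before fread ever sees the file. The sketch below is only an illustration and not part of data.table: the file names nul_demo.txt and nul_demo_clean.txt are hypothetical, it assumes a POSIX shell with printf, tr and wc, and it assumes the NUL bytes carry no data that needs to be kept. It recreates the same byte layout as test2.R, counts the NUL bytes, and writes a cleaned copy.

```shell
# Recreate the test2.R input: header line, one NUL byte, then the data field.
# (Hypothetical file names, chosen so the original test.txt is not overwritten.)
printf 'a=b\n\000A B C\n1 2 3\n4 5 6\n' > nul_demo.txt

# Count NUL bytes: delete every byte except \000, then count what remains.
nul_count=$(tr -cd '\000' < nul_demo.txt | wc -c | tr -d ' ')
echo "NUL bytes found: $nul_count"

# Write a copy with every NUL byte removed (assumes NULs carry no data).
tr -d '\000' < nul_demo.txt > nul_demo_clean.txt
clean_size=$(wc -c < nul_demo_clean.txt | tr -d ' ')
echo "Cleaned size in bytes: $clean_size"
```

The cleaned file could then be read with the same call used in test2.R, for example fread("nul_demo_clean.txt", skip=1, header=TRUE).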