This FR was suggested by @HughParsonage in #2524. Reposting it here so that it doesn't get lost.
Currently fill can be either TRUE or FALSE (default). When true, all incomplete lines in the input will be padded with NAs. If false, any incomplete line in the input will cause a warning to be shown (previously it was an error).
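For illustration, the current padding behavior can be sketched on a small ragged input. The sample text and the variable names `txt` and `padded` are made up for this example; only `fread(text=, fill=)` comes from data.table itself.

```r
library(data.table)

# Hypothetical ragged input: the second data row is missing its last field.
txt <- "a,b,c\n1,2,3\n4,5\n7,8,9\n"

# With fill = TRUE the short row is padded with NA instead of triggering a warning.
padded <- fread(text = txt, fill = TRUE)

print(padded)
```

All three data rows are kept, and the missing field in column `c` is read as NA.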
Sometimes, however, users might want to fill with something other than NA. We could consider the following:
fill=c(class1=value1, class2=value2, ...): fill columns of type class1 with value1, columns of type class2 with value2, ..., all other columns are still filled with NAs.
fill=value: same as fill=c(class(value) = value).
fill=c(col1=value1, col2=value2, ...): fill column named col1 with values value1, column named col2 with values value2, ..., and all other columns with NAs.
fill=c(value1, value2, ...): fill first column with value1, second column with value2, etc.
fill=c(col1=c(value1=repl1, value2=repl2, ...), ...): in column col1 replace value1 with repl1, value2 with repl2, etc. This variant merges na.strings with fill.
We might also want to consider a different parameter name here. Right now fill controls the behavior of fread when the rows are ragged (i.e. rows have different numbers of values). A more natural extension of this functionality (though not of the name) would be to allow more choices for what to do when rows are ragged: fill-with-NAs, fill-with-NAs-and-warn, error, warn-and-stop, etc.
On the other hand, the question of what to replace the missing values with seems to be orthogonal to the treatment of ragged rows. In particular, it is perfectly reasonable to ask for strict behavior (i.e. current fill=TRUE) but to fill all NAs in integer columns with, say, -999.
#5119 extended fill= so that fill=integer means "I know there are at most integer columns in the table". I haven't read the OP carefully enough to say for sure, but at a high level it looks like setnafill() works well to achieve what it's after. @ben-schwen could you PTAL and rule on whether this can be closed as out-of-scope? Or is there more functionality worth exploring here?
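A minimal sketch of the setnafill() route mentioned above, assuming the goal is the one from the original post (replace NAs in integer columns with -999 after reading). The sample input and the names `txt`, `dt` and `int_cols` are hypothetical. Note that this replaces every NA in the selected columns, not only the ones introduced by padding.

```r
library(data.table)

# Hypothetical ragged input: the second data row is missing its last field.
txt <- "a,b,c\n1,2,3\n4,5\n7,8,9\n"

# Step 1: read with fill = TRUE so the short row is padded with NA.
dt <- fread(text = txt, fill = TRUE)

# Step 2: pick the integer columns by name.
int_cols <- names(dt)[vapply(dt, is.integer, logical(1))]

# Step 3: replace NAs in those columns by reference with a constant sentinel.
setnafill(dt, type = "const", fill = -999L, cols = int_cols)

print(dt)
```

Because setnafill() works by reference, `dt` is updated in place and no reassignment is needed.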
I think the only corner case where this would offer more than fread followed by setnafill is when I want to distinguish between a missing field and another na.string.
As Michael mentioned, this interferes with fill=integer providing a user guess, which was an often-requested feature and was implemented in #5119.
I would keep it as an FR but with a different parameter name, which might be put into ... and then get evaluated. Definitely only worth implementing if requested by more users.