Irreversible empty string handling by fread() and fwrite() #2214
Comments
I believe you simply want to set the `na` argument to `fwrite` to, e.g.,
`NA`.
On Jun 22, 2017 7:14 AM, "Ethan Welty" ***@***.***> wrote:
First off, thank you for this fantastic package. It effortlessly powers
many of my data caving adventures.
Sometimes, it's necessary to distinguish between null (NA) and empty ("")
strings, and I'm trying to establish a pipeline that preserves this
distinction with minimal markup. This doesn't currently work. To work,
fread() would need to distinguish between an unquoted empty field and a
quoted one (""), and fwrite() ideally would quote empty strings when
quote = "auto".
Consider this data.table:
dt <- data.table::data.table(chr = c(NA, "", "a"), num = c(NA, NA, 1))
Here is the fwrite output with quote="auto":
csv <- paste(
capture.output(
data.table::fwrite(dt, quote = "auto")
),
collapse = "\n"
)
cat(csv)
chr,num
,
,
a,1
The empty string is not quoted, and thus indistinguishable from the null
string. They are both read back in as empty strings:
data.table::fread(csv)
chr num
1: NA
2: NA
3: a 1
If instead we force quotes, the distinction is kept between the null and
empty strings:
csv_quoted <- paste(
capture.output(
data.table::fwrite(dt, quote = TRUE)
),
collapse = "\n"
)
cat(csv_quoted)
"chr","num"
,
"",
"a",1
However, there is no way to read them back in as such. Either they are
both empty:
data.table::fread(csv_quoted)
chr num
1: NA
2: NA
3: a 1
Or both null:
data.table::fread(csv_quoted, na.strings = "")
chr num
1: NA NA
2: NA NA
3: a 1
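The convention requested above (NA written as an unquoted empty field, "" written as a quoted empty field) can be sketched in base R for a single character column. This is only an illustration under stated assumptions: `encode_field()` and `decode_field()` are hypothetical helper names, not part of data.table, and they do not reflect how fread() or fwrite() behave in any version.

```r
# Hypothetical helpers (not data.table API) illustrating the convention:
#   NA -> unquoted empty field
#   "" -> quoted empty field ("")
encode_field <- function(x) {
  stopifnot(is.character(x))
  # Quote non-NA values that are empty or contain a comma, quote, or line break
  needs_quote <- !is.na(x) & (x == "" | grepl('[",\r\n]', x))
  out <- ifelse(is.na(x), "", x)
  # Double any embedded quotes, then wrap the value in quotes
  out[needs_quote] <- paste0(
    '"', gsub('"', '""', out[needs_quote], fixed = TRUE), '"'
  )
  out
}

decode_field <- function(field) {
  stopifnot(is.character(field))
  quoted <- grepl('^".*"$', field)
  out <- field
  # An unquoted empty field is read back as NA
  out[field == ""] <- NA_character_
  # A quoted field is stripped of its outer quotes and un-doubled
  inner <- substr(field[quoted], 2, nchar(field[quoted]) - 1)
  out[quoted] <- gsub('""', '"', inner, fixed = TRUE)
  out
}

chr <- c(NA, "", "a")
fields <- encode_field(chr)      # one encoded CSV field per element
restored <- decode_field(fields) # round trip back to the original values
```

With this encoding the null and empty strings stay distinct on disk, so `identical(restored, chr)` holds for the example column.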
@MichaelChirico Sorry, I should have added that:
For context, this is in an attempt to implement the Tabular Data Package specification, which requires a distinction to be made between empty and null strings. This is easily done in JSON with So can I read this file:
in as:
and back out again without data loss? At least not currently, although Is this a current limitation or somehow anti-csv and thus by-design?
So assuming that the correct representation of the file is
-- there are 2 issues here: one with In order for In order for Since we need a separate Issue per each feature request, I'm splitting this into two.
@st-pasha That's an excellent overview of the issue(s), thank you. My uninformed opinion would favor only force-quoting empty strings with