This repository has been archived by the owner on Nov 18, 2021. It is now read-only.

.tsv file with "Illegal quoting" #848

Open
julianladisch opened this issue Dec 23, 2016 · 6 comments

Comments

@julianladisch

A .tsv file that contains quotes triggers this error message:
"We can make this file beautiful and searchable if this error is
corrected: Illegal quoting in line 2."
Example: https://github.com/julianladisch/tsv/blob/master/example.tsv

However, quotes in .tsv files do not have any special meaning so there cannot be any illegal quoting.
See these references for .tsv files:
https://en.wikipedia.org/wiki/Tab-separated_values
https://www.iana.org/assignments/media-types/text/tab-separated-values
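A minimal sketch of the quote-agnostic reading argued for above, assuming a file is simply newline-separated records of tab-separated fields. The function name `parse_tsv` and the sample data are hypothetical and are not taken from the linked example file.

```python
def parse_tsv(text: str) -> list:
    """Split TSV text into rows of fields; quote characters are ordinary data."""
    rows = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # tolerate Windows line endings
        if not line:
            continue  # assumption: blank lines carry no record
        rows.append(line.split("\t"))
    return rows


# Hypothetical sample: quotes appear inside fields and are kept as-is.
sample = 'name\tnote\nscrew\t5" long\nbolt\tsaid "hi" twice\n'
rows = parse_tsv(sample)
```

Under this reading there is no quoting state at all, so no input can be rejected for "illegal quoting".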

I would be very pleased if you could change the .tsv rendering accordingly.

@julianladisch
Author

Reply from GitHub Support:

Thanks for reaching out! We currently use a CSV parser with a different delimiter for TSV rendering, and that does lead to this type of response to double quotes. I'll pass along your suggestion that we respect the (lack of) meaning of double quotes in TSV files to the team.

I can't promise if or when it will be implemented, but your suggestion is definitely in the right hands!

Regards,
Laura
@lecoursen
GitHub Support
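The effect described in this reply can be reproduced in miniature with Python's standard `csv` module. This is only a stand-in: the thread does not say which parser GitHub uses, and the sample line is made up. With nothing changed but the delimiter, a leading double quote is still treated as CSV quoting; turning quoting off keeps the field verbatim.

```python
import csv
import io

# Hypothetical TSV line whose second field begins with a double quote.
line = 'id\t"big" screw\t3\n'


def read_row(**options):
    """Parse the sample line with a tab delimiter plus extra csv options."""
    return next(csv.reader(io.StringIO(line), delimiter="\t", **options))


# CSV rules with a tab delimiter: the quotes are consumed as syntax.
lenient = read_row()  # ['id', 'big screw', '3']

# Same rules in strict mode: text after the closing quote is rejected.
try:
    read_row(strict=True)
    strict_error = None
except csv.Error as exc:
    strict_error = str(exc)

# Quoting disabled: every character between tabs is data.
verbatim = read_row(quoting=csv.QUOTE_NONE)  # ['id', '"big" screw', '3']
```

Only the last call returns the field exactly as it appears in the file.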

@cirosantilli
Collaborator

cirosantilli commented Dec 28, 2016

The problem is that CSV / TSV are under-specified / have multiple incompatible implementations.

Quotes could be a way to escape tabs that are part of the content, as in some CSV implementations.
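To illustrate the ambiguity, the sketch below reads one made-up line two ways. Python's standard `csv` module is used as an example of a CSV-style implementation: with quoting enabled, a quoted field may contain a tab, while a plain split on tabs treats the same characters as three fields.

```python
import csv
import io

# Hypothetical line: is the tab inside the quotes data or a separator?
line = 'a\t"b\tc"\n'

# CSV-style reading: the quotes protect the embedded tab.
quoted = next(csv.reader(io.StringIO(line), delimiter="\t"))
# ['a', 'b\tc']

# Plain tab splitting: quotes are data, every tab separates fields.
plain = line.rstrip("\n").split("\t")
# ['a', '"b', 'c"']
```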

@julianladisch
Author

TSV is not under-specified but has a well defined formal BNF definition in the IANA standard: https://www.iana.org/assignments/media-types/text/tab-separated-values
Both Microsoft Excel and Google Sheets use this IANA standard when saving as tab-delimited text.

There are implementations that do not follow the IANA standard. However, I could not find a different published formal specification. Can you provide a pointer?

@cirosantilli
Collaborator

cirosantilli commented Jan 6, 2017

I did not know about the IANA standard.

By under-specified I meant I expected there to be "several incompatible popular implementations", and in the end what matters are the de-facto implementations.

In particular, I expect some implementations to allow for a method of escaping the tabs, which is not allowed in that standard, but I don't know if that is actually the case. If you find that out, I recommend adding it to the description of this issue.

BTW, I have now learned that the CSV RFC does, however, allow commas to be escaped: https://tools.ietf.org/html/rfc4180
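For comparison, RFC 4180 escapes a comma by enclosing the whole field in double quotes, and escapes a double quote inside such a field by doubling it. The sketch below uses Python's standard `csv` module, whose default dialect follows the same two conventions, although it is not a formal RFC 4180 implementation; the sample values are made up.

```python
import csv
import io

# Write one record whose fields contain a comma and double quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow(["1", "Doe, Jane", 'said "hi"'])
encoded = buffer.getvalue()
# '1,"Doe, Jane","said ""hi"""\r\n'

# Reading the line back recovers the original field values.
decoded = next(csv.reader(io.StringIO(encoded)))
# ['1', 'Doe, Jane', 'said "hi"']
```

This is why double quotes are syntax in CSV, whereas the TSV definition discussed above gives them no role.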

julianladisch added a commit to folio-org/cql2pgjson-java that referenced this issue Jan 9, 2017
drop tsv support because it is broken on GitHub:
isaacs/github#848
@joewiz

joewiz commented Mar 14, 2019

I have stumbled upon this error in a gist of mine, https://gist.github.com/joewiz/194e86f6f7d4e64ca21145cff630eaee. I'd be grateful if the TSV parsing could be updated to handle quotes, which are valid in TSV cells. (The IETF link above is about CSV, whereas the IANA link about TSV says nothing about quotes needing special handling.)

@eric-whil

I also ran into this issue with a Google-Sheets-exported TSV file. The response above helps us understand the current situation:

"We currently use a CSV parser with a different delimiter for TSV rendering"

Using a CSV parser to read TSV files is the problem. Please provide a TSV parser that accepts quotes and double-quotes in the fields.
