Description
It might be a good idea to think about directly supporting rust-encoding, which is a great library that converts between Unicode strings and raw bytes for a variety of different encodings.
Currently, the CSV parser operates at the byte level, which means it assumes the source text is ASCII compatible. That is, any byte in the ASCII range always stands for the corresponding ASCII character and never appears as part of some other character. This is OK for regular ASCII text, UTF-8 and Latin-1, I think. But for other encodings this will fail, and it will probably fail silently by producing an incorrect parse. For this reason, decoding the text can't be deferred until after the CSV parser has had its way, because by then the parser may already have corrupted it.
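To make the failure mode concrete, here is a minimal sketch using only the standard library. The helper names `utf16le_bytes` and `naive_field_count` are hypothetical and are not part of the CSV parser; the second one only imitates a byte-level parser by treating every 0x2C byte as a delimiter.

```rust
/// Encode a string as UTF-16LE bytes. UTF-16 is not ASCII compatible:
/// a byte in the ASCII range can be one half of an unrelated code unit.
fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

/// Count the fields a naive byte-level parser would see in one record
/// if it treats every 0x2C byte as a field delimiter.
fn naive_field_count(record: &[u8]) -> usize {
    record.split(|&b| b == b',').count()
}
```

With these, the single field "Ĭ" (U+012C) encodes to the bytes 0x2C 0x01 in UTF-16LE, so the naive count is 2 even though the text contains no comma. The same field in UTF-8 is counted as 1.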
I think the obvious implementation path is to implement std::io::Reader for all types that satisfy encoding::Encoding. Then it can be trivially used with the CSV parser, because the bytes the reader yields will be UTF-8 encoded. Come to think of it, @lifthrasiir, does an impl for Reader (and Writer) sound like something that belongs in encoding proper? One possible downside of this approach is that the caller will pay a second time, for checking that the string is valid Unicode, when they call records.
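A rough sketch of what such an adapter could look like, under several assumptions: it uses the present-day std::io::Read trait (the successor of std::io::Reader), it hard-codes UTF-16LE instead of going through encoding::Encoding, and the type name `Utf16LeToUtf8` is made up for illustration. It does not strip a byte order mark.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Hypothetical adapter: wraps a reader of UTF-16LE bytes and yields UTF-8.
struct Utf16LeToUtf8<R> {
    inner: R,
    raw: Vec<u8>,      // input bytes not yet decoded (odd byte, split surrogate)
    out: VecDeque<u8>, // decoded UTF-8 bytes not yet handed to the caller
    eof: bool,
}

impl<R: Read> Utf16LeToUtf8<R> {
    fn new(inner: R) -> Self {
        Utf16LeToUtf8 { inner, raw: Vec::new(), out: VecDeque::new(), eof: false }
    }

    /// Read one chunk from the inner reader and decode what is complete.
    fn fill(&mut self) -> io::Result<()> {
        let mut chunk = [0u8; 4096];
        let n = self.inner.read(&mut chunk)?;
        if n == 0 {
            self.eof = true;
        }
        self.raw.extend_from_slice(&chunk[..n]);

        // Assemble complete 16-bit code units from the buffered bytes.
        let mut units: Vec<u16> = self
            .raw
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        // Hold back a trailing high surrogate until its partner arrives.
        if !self.eof {
            if let Some(&last) = units.last() {
                if (0xD800..0xDC00).contains(&last) {
                    units.pop();
                }
            }
        }
        self.raw.drain(..units.len() * 2);

        for decoded in char::decode_utf16(units) {
            let c = decoded.map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            let mut buf = [0u8; 4];
            self.out.extend(c.encode_utf8(&mut buf).as_bytes());
        }
        if self.eof && !self.raw.is_empty() {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated UTF-16 input"));
        }
        Ok(())
    }
}

impl<R: Read> Read for Utf16LeToUtf8<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Decode more input only when nothing is pending.
        while self.out.is_empty() && !self.eof {
            self.fill()?;
        }
        let n = buf.len().min(self.out.len());
        for (slot, byte) in buf.iter_mut().zip(self.out.drain(..n)) {
            *slot = byte;
        }
        Ok(n)
    }
}
```

Because the adapter is itself a reader, anything that accepts a byte reader can consume it without knowing the original encoding. The transcoding step already validates the input, which is why the later Unicode check on the parser side is redundant work.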
An alternative is to demand that users run their CSV data through iconv or something similar.