rust-encoding support #13
Comments
Text-oriented streams are indeed a big concern of rust-encoding (lifthrasiir/rust-encoding#20, rust-lang/rfcs#57). I'm still figuring out how to do that (for months :S), but I basically agree with your points.
With the changes to |
@ehiggs I don't understand what you mean? What is |
I had some code that I was struggling with. I asked on IRC and was told the library might not be up to date with regard to rustc-encodable:
To be honest, I didn't understand the situation myself, but I was aware that there were changes in Rust 1.0 alpha that affected the encoding. I have since fixed my code, so it's working. Sorry for the confusion!
I was the mystery IRC person here.
I think it's probably beyond the scope of this crate to explicitly support encodings other than ASCII-compatible encodings. It's more likely that, as an ecosystem, we'll want generic adapters that allow us to plug in transcoders anywhere
It might be a good idea to think about directly supporting rust-encoding, which is a great library that converts between Unicode strings and raw bytes for a variety of different encodings.
Currently, the CSV parser operates at the byte level, which means that it assumes the source text is ASCII-compatible. That is, every byte in the ASCII range is assumed to stand for the corresponding ASCII character. This is OK for regular ASCII text, UTF-8 and latin-1, I think. But for other encodings, this will fail (and it will probably fail silently by producing an incorrect parse). For this reason, decoding the text can't be deferred until after the CSV parser has had its way, because by then the parser may already have corrupted the text.
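To make the silent failure concrete, here is a minimal sketch using only the standard library. It does not use this crate's parser: `split_fields` and `utf16le_bytes` are hypothetical helpers written for illustration, and the splitter only mimics "split on the ASCII comma byte". The input is a single field holding the character U+022C, whose UTF-16LE encoding happens to begin with the byte `0x2C`, the same byte as an ASCII comma.

```rust
/// Hypothetical helper: encode a string as UTF-16LE bytes.
fn utf16le_bytes(s: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for unit in s.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}

/// Hypothetical stand-in for a byte-level parser: split on the ASCII comma byte.
fn split_fields(bytes: &[u8]) -> Vec<&[u8]> {
    bytes.split(|&b| b == b',').collect()
}

fn main() {
    // One field, no commas: the single character U+022C.
    let text = "\u{022C}";

    // UTF-8 is ASCII-compatible: no byte of a multi-byte sequence is < 0x80,
    // so the splitter sees exactly one field.
    assert_eq!(split_fields(text.as_bytes()).len(), 1);

    // UTF-16LE is not: U+022C is encoded as [0x2C, 0x02], and 0x2C is b','.
    // The splitter reports two fields and never signals an error.
    let utf16 = utf16le_bytes(text);
    assert_eq!(utf16, vec![0x2C, 0x02]);
    assert_eq!(split_fields(&utf16).len(), 2);
}
```

Neither resulting field is valid UTF-16 on its own, so decoding after the split cannot recover the original text.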
I think the obvious implementation path is to implement `std::io::Reader` for all types that satisfy `encoding::Encoding`. Then it can be trivially used with the CSV parser because the raw bytes will be UTF-8 encoded. Come to think of it, @lifthrasiir, does an impl for `Reader` (and `Writer`) sound like something that belongs in `encoding` proper? One possible downside of this approach is that the caller will pay for checking that the string is Unicode when they call `records`.

An alternative is to demand that users run their CSV data through `iconv` or something similar.
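The adapter idea can be sketched with the standard library alone. This is not rust-encoding's API and not anything this crate provides: `Utf16LeReader` and `decode_all` are hypothetical names, the sketch targets today's `std::io::Read` (the successor of the `Reader` trait discussed here), and it hard-codes UTF-16LE as the source encoding instead of being generic over an `Encoding`.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: reads UTF-16LE from `inner`, yields UTF-8 bytes.
struct Utf16LeReader<R> {
    inner: R,
    out: Vec<u8>,   // decoded UTF-8 bytes not yet handed to the caller
    pos: usize,     // read position inside `out`
    carry: Vec<u8>, // input bytes that cannot be decoded yet
}

impl<R: Read> Utf16LeReader<R> {
    fn new(inner: R) -> Self {
        Utf16LeReader { inner, out: Vec::new(), pos: 0, carry: Vec::new() }
    }
}

impl<R: Read> Read for Utf16LeReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Refill the output buffer until it holds something or input ends.
        while self.pos == self.out.len() {
            self.out.clear();
            self.pos = 0;
            let mut chunk = [0u8; 4096];
            let n = self.inner.read(&mut chunk)?;
            if n == 0 {
                if self.carry.is_empty() {
                    return Ok(0); // clean end of input
                }
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "truncated UTF-16 input",
                ));
            }
            self.carry.extend_from_slice(&chunk[..n]);
            // Collect complete 16-bit units; an odd trailing byte stays in `carry`.
            let mut units: Vec<u16> = self
                .carry
                .chunks_exact(2)
                .map(|p| u16::from_le_bytes([p[0], p[1]]))
                .collect();
            // Hold back a trailing high surrogate until its partner arrives.
            if let Some(&last) = units.last() {
                if (0xD800..0xDC00).contains(&last) {
                    units.pop();
                }
            }
            let consumed = units.len() * 2;
            for decoded in std::char::decode_utf16(units.iter().copied()) {
                let c = decoded
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
                let mut tmp = [0u8; 4];
                self.out.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
            }
            self.carry.drain(..consumed);
        }
        let n = buf.len().min(self.out.len() - self.pos);
        buf[..n].copy_from_slice(&self.out[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Hypothetical helper: transcode a whole UTF-16LE buffer to a `String`.
fn decode_all(bytes: &[u8]) -> io::Result<String> {
    let mut s = String::new();
    Utf16LeReader::new(bytes).read_to_string(&mut s)?;
    Ok(s)
}

/// Hypothetical helper: encode a string as UTF-16LE bytes for the demo.
fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|u| u.to_le_bytes().to_vec()).collect()
}

fn main() -> io::Result<()> {
    let original = "name,city\n\u{022C},\u{1F600}\n";
    let transcoded = decode_all(&utf16le_bytes(original))?;
    assert_eq!(transcoded, original);
    Ok(())
}
```

Because the adapter emits UTF-8, a byte-level parser that accepts any reader could consume it without knowing the source encoding. The `read_to_string` call in `decode_all` re-validates the bytes as UTF-8, which is the same kind of redundant check mentioned above as a downside.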