cc @sanmai-NL
So can you describe a motivating use case or example where this is needed?
I think @sanmai-NL is a better person to ask than me, but apparently this can happen frequently with Exchange servers. For some motivation, mattnenterprise/rust-imap#11 has several users wanting this, namely @chshawkn-pub, @vandenoever, and @dario23. @sanmai-NL has also been a strong proponent of not requiring server UTF-8 support (e.g., outlook.com doesn't support it: mattnenterprise/rust-imap#11 (comment)).
Also, as a side note, it'd be nice to run
@jonhoo so I'm very opinionated on code formatting, and I think
Perhaps this was a misunderstanding. The old IMAP Response parser in … If not, then we may want to trace those original issues. Reading through various RFCs, however, I think that UTF-8 may not appear anywhere outside of message bodies when the server doesn't advertise the capability to handle UTF-8. I conclude that from the applicable RFC, RFC 6855 (note that an obsolete version of it was discussed earlier). Moreover, I see no indication that byte sequences that are invalid UTF-8 can appear at all. I've tested creating folders with UTF-8 characters, and messages with UTF-8 envelope senders, subjects, and bodies, on both an Exchange and a Dovecot IMAP server using Mozilla Thunderbird, and in all cases these strings were encoded in an ASCII-compatible way. The only potential issue I see is that NUL bytes aren't allowed, and the like. The strings that appear in IMAP protocol messages are a restricted subset of ASCII. We should test the
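The argument above rests on IMAP protocol strings being 7-bit ASCII without NUL. Below is a minimal sketch of that check, assuming the ABNF `CHAR` rule (%x01-7F) that RFC 3501's grammar builds on; `is_imap_char_only` is a hypothetical helper, not part of rust-imap or any other crate. It also illustrates why such input can never trip a UTF-8 decoder: 7-bit ASCII is a subset of UTF-8.

```rust
/// Hypothetical helper (not from any crate): true if every byte lies in
/// the ABNF `CHAR` range %x01-7F, i.e. 7-bit ASCII excluding NUL.
fn is_imap_char_only(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| (0x01..=0x7F).contains(&b))
}

fn main() {
    let samples: [&[u8]; 4] = [
        b"INBOX",
        // "été" as a mailbox name in modified UTF-7 (RFC 3501, section 5.1.3):
        // non-ASCII text carried in an ASCII-only form.
        b"&AOk-t&AOk-",
        // Raw UTF-8 "café": per the reading of RFC 6855 above, only expected
        // once UTF-8 support is in effect.
        b"caf\xC3\xA9",
        // Embedded NUL: not a valid CHAR.
        b"a\x00b",
    ];
    for s in samples {
        // Anything that passes the CHAR check is also valid UTF-8
        // (the converse does not hold).
        println!(
            "{:?}: CHAR-only = {}, valid UTF-8 = {}",
            String::from_utf8_lossy(s),
            is_imap_char_only(s),
            std::str::from_utf8(s).is_ok()
        );
    }
}
```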
This fixes djc#5 as proposed in that issue. Specifically, it introduces a trait, `FromByteResponse`, and provides a generic `parse_response` function that parses input as raw byte strings and then maps them using some implementation of that trait. This in turn allows users to choose how they want to parse byte sequences in an ergonomic way.

Under the hood, this is *slightly* less efficient than it could be. Specifically, it parses everything as `&[u8]` first, and then maps everything using calls to `FromByteResponse`. This works fine, but causes unnecessary reallocation of vectors inside a bunch of structs. The way to work around this would be to have `nom` directly use a generic return value in all its parsers, but unfortunately this doesn't seem to be supported at the time of writing.
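The two-stage design described above (parse to `&[u8]`, then map through the trait) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the `from_bytes` method signature, the `MailboxList` struct, and the space-splitting `parse_raw` (standing in for the real `nom` parsers) are all hypothetical, and the lossy `Cow<str>` impl is just one representation a user might choose.

```rust
use std::borrow::Cow;

/// Hypothetical signature for the `FromByteResponse` trait: build `Self`
/// from a byte slice borrowed from the raw server response.
pub trait FromByteResponse<'a>: Sized {
    fn from_bytes(bytes: &'a [u8]) -> Self;
}

// Keep the raw bytes untouched (no decoding, no allocation).
impl<'a> FromByteResponse<'a> for &'a [u8] {
    fn from_bytes(bytes: &'a [u8]) -> Self {
        bytes
    }
}

// Lossy UTF-8: invalid sequences become U+FFFD instead of a parse error.
// Borrows when the input is already valid UTF-8, allocates otherwise.
impl<'a> FromByteResponse<'a> for Cow<'a, str> {
    fn from_bytes(bytes: &'a [u8]) -> Self {
        String::from_utf8_lossy(bytes)
    }
}

/// Hypothetical, simplified response struct: flags plus a mailbox name.
#[derive(Debug, PartialEq)]
pub struct MailboxList<T> {
    pub flags: Vec<T>,
    pub name: T,
}

/// Toy stand-in for the nom-based byte parser: space-separated tokens,
/// the last one being the mailbox name.
fn parse_raw(input: &[u8]) -> MailboxList<&[u8]> {
    let mut tokens: Vec<&[u8]> = input
        .split(|&b| b == b' ')
        .filter(|t| !t.is_empty())
        .collect();
    let name = tokens.pop().unwrap_or_default();
    MailboxList { flags: tokens, name }
}

/// Generic entry point: parse as bytes first, then map each field through
/// `FromByteResponse`. The `collect()` builds a second Vec, which is the
/// extra allocation mentioned in the description.
pub fn parse_response<'a, T: FromByteResponse<'a>>(input: &'a [u8]) -> MailboxList<T> {
    let raw = parse_raw(input);
    MailboxList {
        flags: raw.flags.into_iter().map(T::from_bytes).collect(),
        name: T::from_bytes(raw.name),
    }
}

fn main() {
    // Not valid UTF-8: 0xE9 is a Latin-1 "é" byte.
    let line: &[u8] = b"\\HasNoChildren caf\xE9";
    // Same input, two representations chosen purely by the caller's type.
    let raw: MailboxList<&[u8]> = parse_response(line);
    let text: MailboxList<Cow<str>> = parse_response(line);
    println!("{:?}", raw);
    println!("{:?}", text);
}
```

The caller selects the representation through the type annotation alone. Avoiding the second `Vec` would require the byte-level parsers to produce `T` directly, which is the `nom` limitation noted above.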
Interesting... Well, if we never need to worry about
Those are somewhat orthogonal, though, and may not be worth the extra code.
Yeah, so based on all this I don't think this is the right approach at this time. It feels too complex, even more so given that we don't really have a solid use case for this yet. Jon, sorry you spent time on something that won't be merged in the short term! I hope you did get something out of it. As for the
@djc: Rushing to a solution would be a bit premature. Not to say this was definitely rushing, just that I can understand how you would perceive it that way, @djc. I hope to help @jonhoo get to the bottom of the UTF-8 decoding issue that was reported. In that sense, closing the PR may also be a bit premature, in case my latest analysis (that there shouldn't be invalid UTF-8 byte sequences anywhere outside of e-mails) turns out to be wrong again.
Reopening the PR will be cheap, and I am more than happy to do so if it turns out I'm totally wrong about all this!
Given that I personally have no use-case for non-UTF8 servers, this all seems fine by me. I mostly just did this because I want to close mattnenterprise/rust-imap#54 :p