This repository has been archived by the owner on Jul 22, 2019. It is now read-only.

Parse FETCH ... BODYSTRUCTURE response #15

Open
sanmai-NL opened this issue Nov 25, 2017 · 6 comments

Comments

@sanmai-NL
Contributor

sanmai-NL commented Nov 25, 2017

See RFC 3501, 6.4.5, BODYSTRUCTURE. Also, MIME-IMB/MIME document series in RFC 2045.

For a motivation, see RFC 2683 3.2.1.4.

It would be great if some mentoring were provided on this (or, ideally, documentation).

@djc
Owner

djc commented Nov 25, 2017

I'm happy to provide some mentoring. The best place to start is probably to just look at 5b9dc9b and copy that approach for the needs of BODYSTRUCTURE; then ask me questions if anything is unclear.

I could maybe write some documentation, but as I already stated in djc/tokio-imap#2 it would help if you could state more concretely what you're looking for. In my mind the parser code in imap-proto is just applying nom macros to the formal syntax from the RFCs ideally as directly as possible, which is pretty straightforward.

@sanmai-NL
Contributor Author

I’m on it. 🙂

A remotely related question to cement my understanding. In RFC 3501, FLAGS (store command data item) takes a list, unlike FLAGS (fetch item), judging by the fetch-att production. This seems incongruous with the parser code in src/parser.rs#L452-L456, which seems to imply that a list follows when "FLAGS" is used as a fetch item. Could you explain?

@djc
Owner

djc commented Dec 9, 2017

Not sure I fully understand your question, but in general there's no parsing code for commands, only for server responses. Does that explain what you are seeing?
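To illustrate the response side: in RFC 3501 the server's FETCH response carries flags through the msg-att-dynamic production, `"FLAGS" SP "(" [flag-fetch *(SP flag-fetch)] ")"`, which is a parenthesized list. The following is a minimal sketch of that production in plain Rust, not the nom-based code in imap-proto. The function name `parse_flags_item` is hypothetical, the keyword is matched in upper case only, and individual flags are not validated.

```rust
/// Parses a `FLAGS` item from a FETCH response, loosely following
/// msg-att-dynamic = "FLAGS" SP "(" [flag-fetch *(SP flag-fetch)] ")".
/// Returns the flags and the unparsed remainder, or `None` on mismatch.
/// Hypothetical helper for illustration; not part of imap-proto.
fn parse_flags_item(input: &str) -> Option<(Vec<&str>, &str)> {
    // The keyword, the single space and the opening parenthesis.
    let rest = input.strip_prefix("FLAGS (")?;
    // Flags cannot contain ")", so the first one closes the list.
    let end = rest.find(')')?;
    // Simplification: empty pieces are dropped, so repeated spaces are
    // tolerated even though the grammar allows exactly one.
    let flags: Vec<&str> = rest[..end].split(' ').filter(|f| !f.is_empty()).collect();
    Some((flags, &rest[end + 1..]))
}
```

A command parser would need the separate store-att-flags production, which this sketch does not cover.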

@sanmai-NL
Contributor Author

Yes, of course. Sorry.

@sanmai-NL
Contributor Author

sanmai-NL commented Dec 10, 2017

Your msg_att_list appears to be the equivalent of msg-att, correct?

I’m interested to learn, how did you come to the decision to not follow the ABNF ‘strictly’? Are you, in principle, okay with rewriting your code to match the ABNF strictly, e.g. distinguishing msg-att-dynamic and msg-att-static?

In the msg-att-static production, this alternative

`"BODY" ["STRUCTURE"] SP body /`

covers BODYSTRUCTURE responses. This implies that the response to both a BODY and a BODYSTRUCTURE FETCH command can be handled by a single combinator.
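A rough sketch of what such a single entry point could look like, written in plain Rust rather than with nom macros. The names `BodyKeyword` and `parse_body_keyword` are hypothetical and do not exist in imap-proto; the sketch only consumes the keyword and the space, and leaves the `body` production to a separate parser.

```rust
/// Which keyword introduced the body data (hypothetical type).
#[derive(Debug, PartialEq)]
enum BodyKeyword {
    Body,
    BodyStructure,
}

/// Matches `"BODY" ["STRUCTURE"] SP` and returns the keyword together
/// with the remaining input, which would then go to a `body` parser.
fn parse_body_keyword(input: &str) -> Option<(BodyKeyword, &str)> {
    let rest = input.strip_prefix("BODY")?;
    // The optional "STRUCTURE" suffix selects the variant.
    let (keyword, rest) = match rest.strip_prefix("STRUCTURE") {
        Some(after) => (BodyKeyword::BodyStructure, after),
        None => (BodyKeyword::Body, rest),
    };
    // This alternative requires a space before the body itself, so
    // input such as `BODY[TEXT]` (the section alternative) is rejected.
    let rest = rest.strip_prefix(' ')?;
    Some((keyword, rest))
}
```

Returning the keyword alongside the remainder lets the caller keep track of whether the data came from a BODY or a BODYSTRUCTURE item.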

Somewhat puzzled, I searched online a bit, and found this analysis. I conclude that some responses are in the IMAP formal language, but are invalid per the protocol description. For example, RFC 3501 7.4.2:

Extension data is never returned with the BODY fetch, but can be returned with a BODYSTRUCTURE fetch.

Regrettably, here the authors are using the language ‘is never’ rather than clearer requirement key words (RFC 8174). Anyway, it seems as if the parser, if implemented based on the grammar, would accept responses that make no sense from an implementation standpoint. Could you weigh in on this?
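One way to picture the gap between grammar and prose is a check that runs after parsing. This is only a sketch under assumptions: `ParsedBody` and `is_consistent_with_prose` are hypothetical names, not imap-proto types, and the two booleans stand in for whatever the real parser would record.

```rust
/// Hypothetical summary of a parsed body item; not an imap-proto type.
struct ParsedBody {
    /// True if the item was introduced by BODYSTRUCTURE rather than BODY.
    from_bodystructure: bool,
    /// True if the parsed body carried any extension data fields.
    has_extension_data: bool,
}

/// Returns false only when extension data appears on a plain BODY item,
/// which the grammar permits but RFC 3501 7.4.2 says never happens.
fn is_consistent_with_prose(body: &ParsedBody) -> bool {
    body.from_bodystructure || !body.has_extension_data
}
```

A parser built purely from the grammar would accept all four combinations of the two fields; a check like this would reject one of them.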

I propose to modify the existing msg_att_body_section combinator into a msg_att_body_or_bodystructure.

I’ve opened WIP PR #16 for you to look at, to give context to my comment, @djc. Let’s continue the discussion there.

sanmai-NL changed the title from "Recognize Internet Message Body FETCH ... BODYSTRUCTURE response" to "Parse FETCH ... BODYSTRUCTURE response" on Dec 10, 2017
@djc
Owner

djc commented Dec 10, 2017

In writing the parser, there are two concerns that might somewhat compete. One is to make the parser easy to read and follow in code. The other is to make the parser resemble the formal syntax in the RFC, so that it's easy to match entry points to the standard. If you have proposals to rename some parsers to make them easier to match to the RFC, I'm probably okay with that. On the other hand, I might not be okay with combining or separating parsers just because that's how it is in the formal syntax, if I feel it makes the parser harder to follow. I'd be fine with adding comments to point out the disparity, though!

For this case, to me the msg-att-static and msg-att-dynamic separation doesn't make sense in the context of the parser, even if it might make sense in the context of understanding the protocol somehow. I feel like separating these would make the parser harder to follow. Feel free to prove me wrong with a PR, but I'm probably going to be disinclined to merge it.

If the parser would accept responses that make no sense from an implementation standpoint, I think my guiding principle would be which implementation is the least complex. The primary goal is for the parser to accept all commonly used IMAP vocabulary. The secondary goal is to minimize complexity.

Modifying msg_att_body_section into msg_att_body_or_bodystructure sounds okay to me.
