-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move HTTP base Parser to namespace Http #894
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct a design flaw placing the HTTP parsing logics all under HTTP/1 specific namespace
The existing Http::One::Parser class hierarchy has serious design flaws, but I do not see why its location in Http::One is wrong. I wonder whether those other flaws has led to the above (incorrect AFAICT) conclusion.
To make progress, I suggest that we try to establish what the class currently known as Http::One::Parser
should be and then come back to the question of where that class should be located. I can bootstrap that discussion with a proposal echoing the current Http::One::Parser documentation and API:
- Http::One::Parser should be a class responsible for parsing an HTTP/1 message prefix (i.e. start-line followed by field-lines).
I would not be violently against a more broad definition (in view of theoretically possible future code changes):
- Http::One::Parser should be a class responsible for parsing an HTTP/1 message (i.e. start-line followed by field-lines and message-body).
The above are rough definitions that omit some details and need polishing, but I think they are usable in this discussion. Both imply that the class is in the right namespace.
Clearly, this PR implies that Http::One::Parser should be something else instead. What should it be, in your opinion?
FWIW, I could not extract the answer from the PR changes because the proposed HTTP protocol parser
definition is too broad to describe any class -- there are just too many very different parsing needs related to HTTP (e.g., reason-phrase parsing, Cache-Control header field parsing, HPACK parsing, chunked extension parsing, etc., etc.). One class simply cannot meaningfully/usefully represent all those needs, no matter how abstract that class is, and the existing Http::One::Parser overall API does not really try to do that.
P.S. There are some other problems with the PR, but I think we should focus on this key issue first before diving into details.
When this Parser class was designed and implemented there was an assumption that HTTP/2+ being binary would need an entirely different set of Parser internals, thus a different Parser class. In practice that turned out to be wrong and HTTP/2 needs to share these pieces of the basic parsing for version selection and field value parsing. PR #893 has been published as example. |
"Pieces of basic HTTP parsing" reusable across HTTP/1 and HTTP/2 message parsers do not belong to a single C++ class. For example, they do not belong to the Http::One::Parser class (as defined in my review and for any other reasonable definition I can imagine). This misunderstanding is exactly why I suggested that we define what Http::One::Parser class is (or should be) before we discuss whether that class should be moved. |
HTTP/1 and HTTP/2 syntax and use of TCP are still highly intertwined. HTTP/2 only redefines the framing structure and transport encoding. It does not change the HTTP header parsing and validation needs. HTTP/1 parser needs to handle sudden arrival of the HTTP/2 magic prefix instead of any HTTP/1 request message. You can see this by how PR #893 does not change the Http::Parser API methods or how they are used by most of the Squid code - just slight changes their results depending on protocol version parsed. |
Regardless of the accuracy of that assertion (which I hope we do not need to perfect and debate here), my concerns are not directly related to its validity. Even if we assume, for the sake of the argument, that "HTTP/1 and HTTP/2 syntax and use of TCP are highly intertwined", this PR is moving things in the wrong direction (due to largely unrelated concerns detailed earlier, along with a specific suggestion on how to resolve them).
I am not sure it needs to do that, but even if it does, it should use HTTP/2 parsing code for that (and vice versa). And, again, this assertion is largely unrelated to the suggested move of the C++ class in question AFAICT: The best place for the HTTP/1 message (prefix) parser class does not depend on whether that parser needs to know about other protocol magic prefixes.
I do not think that PR 893 is a good example or uses case, but I suggest to (continue to) discuss problems with that PR in that PR. |
Since you keep referring to your initial comment as your reasoning I shall address each point individually...
It is wrong location because Http::One and src/http/one are the wrong places to be defining code that parses HTTP/2 and later versions messages. That location and namespace should be restricted to the logic parsing syntax defined only for HTTP/1 messages (eg chunked encoding, request-line, response-line, HTTP/1 message framing).
The only information I have on these alluded "design flaws" is that a) you have a history of saying the same thing about all features not designed by yourself, and b) the Parser we have now has been highly modified from original design model to meet your imposed design changes to make in "perfect". So "what flaws?"
No. There are 3 distinct message semantics and framing in HTTP/1. They are parsed by Http1::RequestParser, Http1::ResponseParser, and Http1::TeChunkedParser. The class being touched by this PR provides the logic for handling the generic HTTP semantic sections "first-line", magic version detection, "mime-header", commonly used delimiters, etc. Those parts are shared with HTTP/2+ versions even while those versions have distinctly different Parser (eg Http2::FrameParser) handling how the generic HTTP message parts are framed. None of the version-specific parsing is changed by this PR.
As mentioned, HTTP/1 has three syntax: A) request := first-line CRLF mime-header CRLF payload (A) is Http1::RequestParser, (B) is Http1::ResponseParser, and (C) is Http1::TeChunkedParser. HTTP/2 has one syntax: header-frame contains two HPACK encoded blobs parsed by mime-header parser. The first of those blobs provide the semantic details for first-line and status-line. The second provided mime-header. data-frame contains raw octets. Which is the generic HTTP "payload". Thus you can see HTTP has generic semantic structures which are presented by both versions. The API to fetch these semantic details are what Http::Parser provides. The inherited child classes perform version-specific parsing to fetch those details.
The primary "detail" I see being ignored to reach that implication is the existence of HTTP/2+ versions. FYI, HTTP is being updated to separate HTTP/1.1 messaging[1] behaviour, HTTP/2 messaging[2] and HTTP semantics[3] syntax. Both [1] and [2] depend on shared details defined by [3]. [1] https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-messaging-19.txt (Http1::RequestParser and Http1::ResponseParser) This PR acknowledges those distinctions by moving the generic logic and API of Http1::Parser to Http::Parser.
We should not have a Http::One::Parser, or at most it should be the logic shared by the various HTTP/1 messages and not needed by other HTTP versions. I have not found any such HTTP/1 specific thing though - even the CRLF delimiter is needed by HTTP/2 parser at times. The Http::Parser moved by this PR is implemented as a parent class such that Squid ports I/O can be handled by either Http1::RequestParser, Http1::ResponseParser, Http1::TeChunkedParser, Http2::FrameParser or Http3::FrameParser.
I disagree. As demonstrated by the HTTP WG production of [3] above "HTTP protocol" definition has semantic structures which are completely agnostic of version. Those parts can absolutely be represented by an Http::Parser class, most of them already are. With C++ inheritence we can share that code into version-specific or message-specific Parser classes. That is how/what "good OO design" looks like.
Strawman argument there. Nobody is attempting to implement a single class that does everything. Notice how all the parsers for the features you mention are left untouched by both this PR and the associated one this supports. Even the Http1::Request/ResponseParser only change in minor ways from the symbol renaming.
Indeed. I don't know why you keep bringing up or insisting on designs using that model of "single class which does everything". This PR is simply moving the logic needed by both HTTP/1 and HTTP/2 to be outside the HTTP/1-specific namespace. Leaving the version-specific logic in their own HttpN::FooParser classes, untouched. |
TLDR: // Amos proposal (as far as I can tell; with an empty One::Parser I added)
namespace Http {
/// The API to fetch HTTP semantic details.
class Parser {...};
namespace One {
/// The API to fetch HTTP/1 semantic details.
using Parser = ::Http::Parser;
/// HTTP/1 request prefix (i.e. request-line plus headers) parser.
class RequestParser: public Parser {...};
/// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
class ResponseParser: public Parser {...};
/// HTTP/1 chunked-encoding parser
class TeChunkedParser: public Parser {...};
}
} Please adjust the above sketch to accurately represent what you are proposing, but keep that ("empty" if you wish) Http::One::Parser there to help us make progress and compare apples to apples. See further below for details. // Alex view of the existing code after bug fixes and One::Parser renaming
namespace Http {
namespace One {
/// HTTP/1 message prefix (i.e. first-line plus headers) parser.
class MessageParser {...};
/// HTTP/1 request prefix (i.e. request-line plus headers) parser.
class RequestParser: public MessageParser {...};
/// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
class ResponseParser: public MessageParser {...};
/// HTTP/1 chunked-encoding parser
class TeChunkedParser {...}; // not a MessageParser child!
}
} I doubt HTTP/2 will follow exactly the same pattern due to messages encapsulated inside frames, but, if it does, we will eventually have something like this: // Alex view of a possible distant future
namespace Http {
/// HTTP message prefix (i.e. stuff before the body) parser.
class MessageParser {...};
namespace One {
/// HTTP/1 message prefix (i.e. first-line plus headers) parser.
class MessageParser: public Http::MessageParser {...};
/// HTTP/1 request prefix (i.e. request-line plus headers) parser.
class RequestParser: public MessageParser {...};
/// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
class ResponseParser: public MessageParser {...};
/// HTTP/1 chunked-encoding parser
class TeChunkedParser {...}; // not a MessageParser child!
}
namespace Two {
/// HTTP/2 message prefix (i.e. HEADERS payloads) parser.
class MessageParser: public Http::MessageParser {...};
}
}
You are saying that the Http::One::Parser location is wrong because of Y. I agree with Y, but see no valid connection between the Http::One::Parser location and Y. To me, that implies that your definition of Http::One::Parser differs from mine:
We are less likely to make progress here without a shared understanding of what Http::One::Parser is (and should become if we want to change it). I still think that we should focus on developing that understanding. The next step in that direction, IMO, is for you to provide a definition so that I do not have to discuss something based on my guesses of what you have in mind. The discussion below may help with that, but we should try not to lose that focus. I have also provided design sketches above in hope to provide a common framework for developing those definitions.
I agree that Http::One namespace should be used for HTTP/1-specific things.
A comprehensive list does not really matter right now, but I will mention two flaws that may be relevant:
Sorry, I see no connection between "No" and the sentence following that word. The two classes you mention are correctly derived from Http::One::Parser, so the essence of any Http::One::Parser definition ought to apply to them as well -- they only detail/specialize what their parent class does! The current TeChunkedParser parent is one of the design flaws, so it has no bearing on what the parent class should be.
I disagree that Http::One::Parser is such a class today. The existing code is clearly specific to HTTP/1. That does not mean that some bits and pieces cannot be reused elsewhere, of course, but the class is essentially an HTTP/1 message prefix parser today. I am not sure we will need a "generic HTTP semantic sections parser" at all. If we develop such a need in the future, that class can be created, of course, but this existing-class-moving PR is not about that.
The concern is not the unchanged parsing code internals, but the changed location of their class.
I am not sure what you mean exactly here, but I see HTTP/1-specific code being moved into version-agnostic Http namespace, and that is wrong.
Even if we ignore its oversimplifications and bugs, the above summary does not imply it is a good idea to introduce a single C++ parent class for the above component parsers. The problems with the existing chunked handling code illustrate what happens when the wrong parent class is forced on a parser. And even if such a single parent class will become needed in the future, it is not what Http::One::Parser is today.
HTTP/2+ versions existence has no effect on my definition. It is not an omitted detail; it is an irrelevant one (for my definition)! You are probably thinking about your "The API to fetch HTTP semantic details" definition (if that is what you are proposing as a definition -- I am not sure, please clarify). HTTP/2 would be relevant to that hypothetical API, of course. If so, please note that I am not even trying to define that Http::Parser. I am defining Http::One::Parser and asking you to do the same. It would be easier to make progress starting from something we (hopefully) agree on -- Http::One::RequestParser and Http::One::ResponseParser -- and then moving up, one layer at a time. Even if a layer does not add anything new, it can still be defined, of course.
Both the official code and I disagree that Http::One::Parser represents generic HTTP parsing logic. It does not even represent version-agnostic message sectioning logic well. Thus, that class should not be moved.
I doubt that, but I would be happy if it is gone without duplicating Http::One::RequestParser and Http::One::ResponseParser code. To make progress, you can assume (for the sake of agreeing on one Http::One::Parser definition as a starting point) that Http::One::Parser exists but is essentially empty. See TLDR sketches.
Yes, and not needed by other HTTP/1 components that do not work on the "message" level. For example, a chunk size parser is not a message-level thing and should not have an Http::One::MessageParser as a parent.
There is a big difference between needing X to parse some protocol component and having a parent class member X for parsing that component. In many cases, specific parsers do not need a common parent class because the set of Xs they need differ too much. They are much better off reusing stand-alone functions or low-level classes dedicated to parsing specific things. HTTP/1 chunked encoding and HTTP/2 HPACK parsers are good examples of two HTTP parsers that are unlikely to benefit from a common parent class.
The existing Http::One::Parser (being moved by this PR) does not represent an "HTTP port traffic parser" API. It represents a message (prefix) parsing API. Moving that class will not create the API you envision, of course.
That fact does not mean "HTTP protocol parser" is specific enough to describe a class. Sharing of certain concepts across protocol versions is very natural, but does not imply that there should be a single C++ class that represents all those concepts (and especially a single C++ class that represents a parser of all those shared concepts). However, we do not need to agree on this, I think. When two protocol versions have two parsers that share code we can move that code into a reusable routine or a parent class (whichever works best for that code). If that process results in the creation of an Http::Parser class, then we will have an Http::Parser class.
Good design uses inheritance to implement useful abstractions. Just because two things are described in the same protocol document does not mean they are based on the same abstraction useful in Squid context. Yes, all protocol element parsers are parsers, but that is not a useful abstraction in Squid context (yet?). Good abstractions do not result in "has no meaning" virtual method implementations, for example.
It is not a strawman argument because the "HTTP protocol parser" definition this argument rejects does not exclude any of the parsers I listed. The argument matches that definition! If you want to narrow down the list, then we need to find a more specific definition. And when we are done, I bet it will not match the existing Http::One::Parser class well (because of its flaws), but will be similar to the "HTTP/1 message prefix parser" I have proposed.
I do not. This PR proposes such a model -- "HTTP protocol parser" -- not me.
This PR moves a problematic API (that we cannot even agree on how to define!) and HTTP/1-specific code into the version-agnostic namespace. We should not do any of that, regardless of how "simple" that change looks.
I am ignoring both insults as factually incorrect and irrelevant to this PR. |
Responding to your TLDR pseudo-code examples (only). You came close, however the "headers parser" is a separate "class HttpHeader" in the current code and not touched by this PR. // Alex view of the existing code after bug fixes and One::Parser renaming
namespace Http {
namespace One {
/// HTTP/1 message prefix (i.e. first-line plus headers) parser.
class MessageParser {...};
/// HTTP/1 request prefix (i.e. request-line plus headers) parser.
class RequestParser: public MessageParser {...};
/// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
class ResponseParser: public MessageParser {...};
/// HTTP/1 chunked-encoding parser
class TeChunkedParser {...}; // not a MessageParser child!
}
} This has several issues:
HTTP/2 introduces the term "frame". However it is fully backward compatible to describe the parts of an HTTP/1 and even HTTP/0 message structure as a series of frames in ASCII coding that cannot be interleaved. When I refer to a "frame" of HTTP/1 etc I mean one of those parts, not the whole message. Latest IETF specification which is about to be assigned RFC number is https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-semantics-19.txt#section-6
HTTP/0 has just two frames (framing-control-data payload), previously described as (request-line CRLF payload). HTTP/1 calls "framing-control-data" a "first-line" and puts the payload framing data into the headers frame Transfer-Encoding header.
// Alex view of a possible distant future
namespace Http {
/// HTTP message prefix (i.e. stuff before the body) parser.
class MessageParser {...};
namespace One {
/// HTTP/1 message prefix (i.e. first-line plus headers) parser.
class MessageParser: public Http::MessageParser {...};
/// HTTP/1 request prefix (i.e. request-line plus headers) parser.
class RequestParser: public MessageParser {...};
/// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
class ResponseParser: public MessageParser {...};
/// HTTP/1 chunked-encoding parser
class TeChunkedParser {...}; // not a MessageParser child!
}
namespace Two {
/// HTTP/2 message prefix (i.e. HEADERS payloads) parser.
class MessageParser: public Http::MessageParser {...};
}
} The above has all the same mistakes from your view of current code. Worse, it compounds the "message" naming to conflate "message" with a single HTTP/2 "frame". We have previously had a discussion about how "byte" could be as small as 7 bits or as large as 128 depending on CPU designs - such that "char == 1 byte" is trouble and "octet" more accurate descriptor term. Similarly an HTTP "message" is semantically invariant but the I/O representation of message varies between versions from ASCII to binary and also can be spread out across packets or entire UDP connections - thus very much trouble. "frame" (like "octet") is a more precise term that describes "a sequence of adjacent octets" with shared semantic ordering in the protocol message. Amos view of a possible not-so-distant future (HTTP/2 parts already implemented in PR #893, HTTP/3 parts undergoing testing in experimental branch not yet submitted) namespace Http {
/// The API to fetch HTTP semantic details.
class Parser {...};
/// HTTP header and trailer (i.e. mime) parser.
class HeaderParser {...};
namespace One {
/// The API to fetch HTTP semantic details.
using Parser = ::Http::Parser;
/// The API to fetch MIME/1.0 semantic details.
using HeaderParser = ::Http::HeaderParser;
/// HTTP/1 request prefix (i.e. request-line plus headers) parser.
class RequestParser: public Parser {...};
/// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
class ResponseParser: public Parser {...};
/// HTTP/1 payload encoding parser (i.e. chunk-size plus headers plus payload plus trailer)
class TeChunkedParser: public Parser, public HeaderParser {...};
}
namespace Two {
/// The API to fetch HTTP semantic details.
using Parser = ::Http::Parser;
/// The API to fetch MIME/1.0 semantic details.
using HeaderParser = ::Http::HeaderParser;
/// HTTP/2 frame (i.e. request-line/status-line/pseudo-headers, control-message, mime-header, payload, trailer) parser.
class FrameParser: public Parser, public HeaderParser {...};
/// HPACK decoder/parser.
class HpackDecoder {...};
}
namespace Three {
/// The API to fetch HTTP semantic details.
using Parser = ::Http::Parser;
/// The API to fetch MIME/1.0 semantic details.
using HeaderParser = ::Http::HeaderParser;
/// HTTP/3 frame (ie QUIC payload) parser.
class FrameParser: public Parser, public HeaderParser {...};
// work is incomplete, maybe parsers for HTTP payload encodings etc needed.
}
} |
I think this is just a minor terminology misunderstanding. Today, Http::One::Parser isolates the message prefix (the start line plus the headers), parses the start line, and also does some preprocessing of the isolated message headers (e.g., unfoldMime()). There is also getHostHeaderField() that actually parses the headers, but we can ignore that oddity if you wish. I call those actions "parsing", but there are probably other words that can describe them. Tomorrow, these classes will do even more to extract information from the isolated message headers (probably using other helper classes), so "parsing" would be even more appropriate. Whatever we end up calling those actions does not really change anything as far as this PR change request is concerned. The PR problem this change request is pointing out is not about naming things. It is about placing things.
This is a similar minor and irrelevant naming disagreement. I see no problem with calling a message parser a MessageParser, even if that parser parses that message incrementally rather than "a whole message at a time". These parses do not do much with message bodies, so we could name them MessagePrefixParser to be a bit more precise. I was following the current Squid naming tradition of ignoring the body aspect when naming message (prefix) parsers. Again, this naming disagreement is not relevant to the problem this change request is discussing.
I believe I have fully covered this naming misunderstanding at the beginning of this response.
I did not notice any significant disagreements or conflicts in the above section of your response compared to what I have said. The However, it may be important to point out explicitly that the "message abstraction" draft-ietf-httpbis-semantics-19.txt is talking about is represented by the Http::Message class (or Http::MessagePrefix or whatever name you prefer -- we are not discussing naming right now), not the proposed Http::Parser class (or Http::HeaderParser or whatever name you prefer)! // Alex view of a possible distant future
namespace Http {
...
namespace Two {
/// HTTP/2 message prefix (i.e. HEADERS payloads) parser.
class MessageParser: public Http::MessageParser {...};
}
}
Just to correct the misunderstanding in the second sentence: The API sketches I posted are not about parsing HTTP/2 frames. As far as HTTP/2 is concerned, the above Two::MessageParser sketch is about what parsing HEADERS frames payload. That parser receives HEADERS payload and extracts/creates the corresponding Http::Two::Message object (or equivalent). We will also parse the frames, of course, but that is a different story. I hope I have addressed the other "mistakes" earlier in this response.
I do not know why you are saying these well-known facts. You can safely assume that I know what HTTP message is, and how it is transferred. If something appears to suggest the opposite, there is probably another (correct) way to interpret what I said.
I (still) do not know what "The API to fetch HTTP semantic details" is so I am ignoring the Http::Parser class for now. Hopefully, that problem will be resolved once we resolve the HeaderParser problem. HTTP/1 header syntax differs from HTTP/2 header syntax. Thus, we will need a distinct header parsing class for each protocol version (and not We may be making progress though: The above C++ sketch is a good illustration for some of the key problems this change request is about! And I hope my response will remove the naming issues out of the way (for now). |
Waiting on PR #1908 |
Correct a design flaw placing the HTTP parsing logics all under HTTP/1
specific namespace. Syntax details are shared with future HTTP
versions and will be needed by their parsers when added.