Merge pull request from GHSA-q36x-r5x4-h4q6
Motivation

The HTTP2FrameDecoder is a complex object that was written early in the development of swift-nio-http2. Its logical flow is complex, and it hasn't been meaningfully rewritten in quite some time, so it's difficult to work with and understand.

Annoyingly, some bugs have crept in over the years. Because of the structure of the code it can be quite difficult to understand how the parser actually works, and fixing a given issue can be difficult.

This patch aims to produce a substantial change to the HTTP2FrameDecoder to make it easier to understand and maintain in the long run.

Modifications

This patch provides a complete rewrite of HTTP2FrameDecoder. It doesn't do this by having a ground-up rewrite: instead, it's more like a renovation, with the general scaffolding kept. The rewrite was performed incrementally, keeping the existing test suite passing and writing new tests when necessary.

The following major changes were made:

1. New states and edges were added to the state machine to handle padding.

   Prior to this change, padding was handled as part of frame payload decoding. This is not totally unreasonable, but it dispersed padding management widely and made it easy to have bugs slip in. This patch replaces this with a smaller set of locations.

   Padding is now handled in two distinct ways. For HEADERS and PUSH_PROMISE frames, trailing padding is still stripped as part of frame payload decode, but it's done so generically, and the padding bytes are never exposed to the individual frame parser. For DATA, there is a new state to handle trailing padding removal, which simplifies the highly complex logic around synthesised data frames.

   For all frames, the leading padding byte is handled by a new dedicated state which is used unconditionally, instead of attempting to opportunistically strip it. This simplifies the code flow.

   As a side benefit, this change means we can now accurately report the padding used on HEADERS and PUSH_PROMISE frames, even when they are part of a CONTINUATION sequence.

2. The synthesised DATA frame logic has been substantially reworked.

   With the removal of the padding logic from the state, we now know that so long as we have either got a byte of data to emit _or_ the DATA frame is zero length, we will always emit a frame. This has made it simpler to understand the control flow when synthesising DATA frames.

3. The monolithic state switch has been refactored into per-state methods.

   This helps manage the amount of state that each method can see, as well as to logically split them up. In addition, it allows us to recast state transformations as (fairly) pure functions.

   Additionally, this allowed the larger methods to be refactored with smaller helpers that are more obviously correct.

4. The frame payload parsers have been rewritten.

   The main goal here was to remove preflight length checks and unsafe code. The preflight length checks cause trouble when they disagree with the parsing code, so we now rely on the parsing code being correct with regard to length.

   Relatedly, we previously had two separate places where we communicated length: a frame header length and a ByteBuffer length. This was unnecessary duplication of information, so we instead use a ByteBuffer slice to manage the length. This ensures that we cannot over-parse a message.

   Finally, in places that used unsafe code or separate integer reads, we have refactored to stop using that unsafe code and to use combined integer reads.

5. Extraneous serialization code has been extracted.

   The HTTP2FrameEncoder was unnecessarily in this file, which took a large file and made it larger. I moved this out.

Result

The resulting parser is clearer and safer. Complex logic has been broken out into smaller methods with less access to global data. The code should be generally clearer.
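Illustration

As a rough illustration of changes 1 and 4 (not the code in this patch), the sketch below slices off exactly the number of bytes the frame header declares, strips the pad length byte and trailing padding generically, and only then hands the remainder to a frame-specific parser. Every name here (SketchError, FrameHeader, readFrameHeader, readPayload) is hypothetical, and ArraySlice<UInt8> stands in for a ByteBuffer slice so that the example has no dependencies. It assumes the whole payload is already buffered, which matches the HEADERS and PUSH_PROMISE path described above rather than the streaming DATA path.

```swift
// Illustrative sketch only: these types are hypothetical and are NOT the
// swift-nio-http2 implementation. ArraySlice<UInt8> stands in for a ByteBuffer slice.

enum SketchError: Error, Equatable {
    case needMoreBytes
    case malformedPadding
}

struct FrameHeader: Equatable {
    var length: Int
    var type: UInt8
    var flags: UInt8
    var streamID: UInt32

    // PADDED is flag bit 0x08 on DATA, HEADERS and PUSH_PROMISE frames.
    var isPadded: Bool { (flags & 0x08) != 0 }
}

/// Reads the fixed 9-byte frame header and consumes it from `bytes`.
func readFrameHeader(_ bytes: inout ArraySlice<UInt8>) throws -> FrameHeader {
    guard bytes.count >= 9 else { throw SketchError.needMoreBytes }
    let h = Array(bytes.prefix(9))  // copy so that indices start at zero
    bytes = bytes.dropFirst(9)

    let length = (Int(h[0]) << 16) | (Int(h[1]) << 8) | Int(h[2])
    let high = (UInt32(h[5]) << 24) | (UInt32(h[6]) << 16)
    let low = (UInt32(h[7]) << 8) | UInt32(h[8])
    let streamID = (high | low) & 0x7FFF_FFFF  // mask off the reserved bit
    return FrameHeader(length: length, type: h[3], flags: h[4], streamID: streamID)
}

/// Slices exactly `header.length` bytes off `bytes`, then strips the padding.
/// The returned payload is the only thing a frame-specific parser would see,
/// so it cannot read past the end of the frame or observe padding bytes.
func readPayload(
    _ header: FrameHeader,
    from bytes: inout ArraySlice<UInt8>
) throws -> (payload: ArraySlice<UInt8>, padding: Int) {
    guard bytes.count >= header.length else { throw SketchError.needMoreBytes }
    var payload = bytes.prefix(header.length)
    bytes = bytes.dropFirst(header.length)

    guard header.isPadded else { return (payload, 0) }

    // The leading pad length byte is consumed unconditionally when PADDED is set.
    guard let padLength = payload.popFirst() else { throw SketchError.malformedPadding }
    let padding = Int(padLength)

    // The padding must fit inside what is left of the frame payload.
    guard padding <= payload.count else { throw SketchError.malformedPadding }
    payload = payload.dropLast(padding)
    return (payload, padding)
}

// Usage: one padded HEADERS frame followed by the first byte of another frame.
var input: ArraySlice<UInt8> = [
    0x00, 0x00, 0x06,        // length = 6
    0x01,                    // type = HEADERS
    0x0C,                    // flags = END_HEADERS | PADDED
    0x00, 0x00, 0x00, 0x01,  // stream 1
    0x02,                    // pad length
    0xAA, 0xBB, 0xCC,        // header block fragment
    0x00, 0x00,              // padding
    0xFF,                    // belongs to the next frame
]
let header = try readFrameHeader(&input)
let result = try readPayload(header, from: &input)
```

Because the slice carries the length, the frame-specific parser needs no separate length argument and no preflight check: running out of bytes in the slice is the length check.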