@elsloo elsloo commented Mar 16, 2023

  • The inner loop's `auto spot2` iterator contains a member variable, `_block`, that points to the memory of the response headers, not the cached headers
    • The method called on the `cached_headers` object therefore operates on a memory address rather than a named header, which deletes every response header from the duplicate header onward
    • Once those response headers are deleted, the first duplicate header is written to the cached headers as a new, additional field, which duplicates it in the cached object's headers; the current iteration then ends
    • The outer loop's iterator then tries to advance one step via the overloaded `++` operator on `spot`. Because `MIMEHdrImpl::iterator::step()` checks each header slot with `MIMEField::is_live()`, and every response header from that point onward was deleted in the prior iteration, `_slot` is incremented to `limit` and the outer loop halts
  • This change deletes the header by name instead of by address
    • Using the header name instead of relying on header alignment also fixes a secondary issue that could arise if the response header ordering does not match the cached object's header ordering
    • Using the header name is slightly less efficient because it requires a call to `find_header`, but this is necessary to ensure the cached headers are correctly removed before the response headers are merged into the cached object
    • This extra cost is paid only on successful revalidations whose headers also contain duplicates, which should be acceptable given that the alternative is allowing duplication whenever the response header ordering differs from the cached object's
  • Under the existing logic, a cached header that is not in the response remains in the cached object's headers
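
For readers unfamiliar with the code, the name-based approach can be sketched roughly as follows. This is a simplified model, not the Traffic Server implementation: `Field`, `Headers`, `same_name`, `delete_by_name`, and `merge_response_into_cached` are hypothetical names, and a plain `std::vector` stands in for the real MIME header structures.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a MIME header block: an ordered list of
// (name, value) fields. This is not the Traffic Server data structure.
using Field   = std::pair<std::string, std::string>;
using Headers = std::vector<Field>;

// Header field names are case-insensitive, so compare them that way.
static bool same_name(const std::string &a, const std::string &b)
{
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
           return std::tolower(x) == std::tolower(y);
         });
}

// Remove every field called `name` from `hdrs`. The linear scan models the
// cost of looking a header up by name instead of by address.
static void delete_by_name(Headers &hdrs, const std::string &name)
{
  hdrs.erase(std::remove_if(hdrs.begin(), hdrs.end(),
                            [&](const Field &f) { return same_name(f.first, name); }),
             hdrs.end());
}

// Merge a revalidation response into the cached headers.
// Pass 1 deletes, by name, every cached field that the response also carries.
// Pass 2 appends all response fields, so duplicates in the response are kept
// exactly once each. Only `cached` is modified; `response` is never changed
// while it is being iterated.
void merge_response_into_cached(Headers &cached, const Headers &response)
{
  for (const Field &f : response) {
    delete_by_name(cached, f.first);
  }
  for (const Field &f : response) {
    cached.push_back(f);
  }
}
```

In this model the result does not depend on whether the response and the cached object list their headers in the same order, and a cached header that is absent from the response is left in place, matching the last bullet above.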

Thanks to Masakazu and Leif for providing the unit test and helping with the fix, respectively.

…s upon successful revalidation when duplicate headers are present

Co-Authored-By: Masakazu Kitajo <maskit@apache.org>
Co-Authored-By: Leif Hedstrom <zwoop@apache.org>
@elsloo elsloo added the Bug label Mar 16, 2023

zwoop commented Mar 16, 2023

This is a serious bug introduced in v9.2.0 by #7476. The fix here restores the behavior from before that PR, using the API that ends up calling find_header(). Performance-wise, it should therefore be no worse than versions before v9.2.0.

@zwoop zwoop added this to the 10.0.0 milestone Mar 16, 2023
elsloo and others added 2 commits March 16, 2023 19:07
@elsloo elsloo merged commit c7ed799 into apache:master Mar 22, 2023
zwoop added a commit that referenced this pull request Mar 31, 2023
…ion when duplicate headers are present (#9527)

* Fixes silent header duplication and early loop termination that occurs upon successful revalidation when duplicate headers are present

Co-Authored-By: Masakazu Kitajo <maskit@apache.org>
Co-Authored-By: Leif Hedstrom <zwoop@apache.org>

* Attempt to fix the Debian build failure.

Co-Authored-By: Masakazu Kitajo <maskit@apache.org>

* Added a dependency for the unit test.

---------

Co-authored-by: Masakazu Kitajo <maskit@apache.org>
Co-authored-by: Leif Hedstrom <zwoop@apache.org>
(cherry picked from commit c7ed799)
@zwoop zwoop modified the milestones: 10.0.0, 9.2.1 Mar 31, 2023
zwoop added a commit that referenced this pull request Apr 5, 2023
…evalidation when duplicate headers are present (#9527)"

This reverts commit c7ed799.

Instead, we'll do #9534
@zwoop zwoop modified the milestones: 9.2.1, 10.0.0 Apr 5, 2023

zwoop commented Apr 5, 2023

Reverted from 9.2.x, in favor of #9534
