Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced Trailing Whitespace Handling in HTTP Headers #3429

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

Nirab123456
Copy link

Screenshot of Test Results

Updated Code

new_11_test_with_time

Original Code

original_fail

Test: test_parse_line_with_trailing_spaces

def test_parse_line_with_trailing_spaces(self):
    headers = HTTPHeaders()

    # Test header with multiple trailing spaces
    headers.parse_line("Content-Length: 0   ")
    self.assertEqual(headers.get('content-length'), '0')

    # Test header with leading and trailing spaces
    headers.parse_line(" Content-Length :  123  ")
    self.assertEqual(headers.get('content-length'), '123')  # Ensure value is overwritten

    # Test continuation line with trailing spaces
    headers.parse_line("Content-Length: 42")
    headers.parse_line("  ")  # Test multi-line continuation
    self.assertEqual(headers.get('content-length'), '42')  # Ensure spaces don't affect value

Description:

The modifications made to the parse_line method in the HTTPHeaders class effectively resolve the issue of improper handling of trailing and leading whitespace in HTTP headers, specifically addressing a critical edge case involving the Content-Length header. This issue was highlighted in a GitHub discussion (#3321), where it was noted that trailing spaces in the header value could lead to errors during processing, especially when Content-Length is the last header in a request.

How the Code Solves the Issue:

  1. Robust Whitespace Management:

    • The new implementation of parse_line removes leading and trailing whitespace from the header line using line.strip(). This ensures that any extra spaces do not affect the stored value, thus preventing potential formatting issues when accessing the header later.
    • The use of strip() prevents cases like Content-Length: 0 from storing an unintended value with trailing spaces.
  2. Specific Handling of Content-Length:

    • The code explicitly checks for the Content-Length header and ensures that the value after the colon is correctly extracted and stripped of whitespace. For example, parsing Content-Length : 123 would result in the correct value of '123', completely ignoring any spaces.
    • This targeted handling avoids common pitfalls such as retaining leading spaces (e.g., 42 from a continuation line) or appending unintended spaces due to continuation line logic.
  3. Continuation Line Handling:

    • The revised implementation also addresses the issue of continuation lines. If a continuation line is encountered (indicating an extension of a previous header), the new code checks for non-empty values before appending them. This prevents scenarios where an empty continuation line could erroneously append whitespace to the previous value, maintaining the integrity of the header data.
    • For example, when parsing a continuation line like " ", the new logic ensures that this line is effectively ignored, thus maintaining the integrity of the previously stored header value.
  4. Error Prevention:

    • By ensuring that values are stripped of whitespace and validating conditions before updating header values, the code reduces the risk of ValueError occurrences that arise from malformed headers. This makes the header processing more resilient and less prone to user errors in HTTP requests.

Conclusion:

The enhancements made in the parse_line method significantly improve the handling of HTTP headers, specifically addressing the issues related to trailing and leading whitespace in header values. By implementing a robust approach to whitespace management and providing specific handling for critical headers like Content-Length, the code mitigates potential parsing errors and ensures adherence to HTTP standards, thereby improving overall functionality and reliability. This change effectively resolves the edge cases discussed in the issue, enhancing the usability of the HTTPHeaders class in real-world applications.

@Nirab123456
Copy link
Author

@bdarnell can you please review my tests ??

Copy link
Member

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a different approach than I described in #3321 (comment). Why?

My plan for testing this was to start with HTTPHeadersTest.test_multi_line to add more continuation line cases and ensuring that the final \r\n\r\n is right. Unit tests for parse_line are good too, though.

tornado/test/httputil_test.py Outdated Show resolved Hide resolved
tornado/httputil.py Show resolved Hide resolved
tornado/httputil.py Outdated Show resolved Hide resolved
@Nirab123456
Copy link
Author

Nirab123456 commented Oct 25, 2024

@bdarnell In response to: Discussion & Pull Request Review

I have updated the handling of the Content-Length header to enforce stricter validation rules, ensuring compliance with RFC 7230 guidelines. The prior approach in parse_line allowed headers to be updated without rigorous validation, which could lead to compliance issues. Given this, I have reverted the previous changes to parse_line and introduced stricter validation.

In this revised implementation:

  • Any malformed, negative, or conflicting Content-Length headers will now raise an exception. This prevents incorrect or potentially unsafe values from being processed, ensuring the header's value complies with HTTP/1.1 standards.

This update resolves several shortcomings in Tornado’s prior implementation by correctly handling edge cases that were previously unaddressed. Specifically, the original code had the following issues in tests:

  • test_multiple_content_length_headers: Allowed multiple Content-Length headers with differing values without raising an error.
  • test_invalid_content_length: Failed to raise an exception for non-integer values in Content-Length.
  • test_negative_content_length: Did not handle negative Content-Length values appropriately.

The updated code now adheres to RFC 7230 specifications, ensuring these cases are handled correctly and improving Tornado's robustness.

Errors in the Original Code:

  1. test_multiple_content_length_headers:

    • The test expected a single value "123", but the code returned "123,123".
    • Error message:
      AssertionError: '123,123' != '123'
      - 123,123
      + 123
      
  2. test_invalid_content_length:

    • The code did not raise an exception for an invalid Content-Length value like "abc".
    • Error message:
      AssertionError: HTTPInputError not raised
      
  3. test_negative_content_length:

    • The code failed to raise an error for a negative Content-Length value.
    • Error message:
      AssertionError: HTTPInputError not raised
      

Updated Test Code:

    def test_multiple_content_length_headers(self):
        headers = HTTPHeaders()
        headers.parse_line("Content-Length: 123")
        headers.parse_line("Content-Length: 123")
        self.assertEqual(headers.get("content-length"), "123")
        with self.assertRaises(HTTPInputError):
            headers.parse_line("Content-Length: 456")  # Should raise an error due to conflicting values

    def test_invalid_content_length(self):
        headers = HTTPHeaders()
        with self.assertRaises(HTTPInputError):
            headers.parse_line("Content-Length: abc")  # Should raise an error due to non-integer value

    def test_negative_content_length(self):
        headers = HTTPHeaders()
        with self.assertRaises(HTTPInputError):
            headers.parse_line("Content-Length: -123")  # Should raise an error due to negative value

    def test_leading_trailing_whitespace(self):
        headers = HTTPHeaders()
        headers.parse_line("Content-Length: 123   ")
        self.assertEqual(headers.get('content-length'), '123')  # Should handle trailing whitespace correctly

    def test_zero_content_length(self):
        headers = HTTPHeaders()
        headers.parse_line("Content-Length: 0")
        self.assertEqual(headers.get('content-length'), '0')  # Should correctly handle zero

Context for Content-Length Validation

According to RFC 7230, HTTP/1.1 specifies that:

  1. Header Field Name Syntax: Field names must consist of valid tokens without any leading, trailing, or internal whitespace:

    field-name = token
    token      = 1*tchar
    
  2. Whitespace in Headers: Whitespace is allowed only after the colon separating the field name and value. Leading whitespace in field names (e.g., " Content-Length: 123") renders the header invalid under HTTP/1.1.

  3. Specific Rules for Content-Length: Content-Length values must be numeric and must not have any whitespace in the field name itself.

This means Tornado's current behavior in parse_line—which raises an error on encountering leading whitespace in the field name—is both expected and correctly handled:

    def test_space_in_content_length_key(self):
        headers = HTTPHeaders()
        with self.assertRaises(HTTPInputError):
            headers.parse_line(" Content-Length: 1")  # Invalid due to leading space in the key

Updated Code

re-test-tornado-mycode

Original Code

re-test-tornado-original

@Nirab123456 Nirab123456 requested a review from bdarnell October 25, 2024 20:24
@bdarnell
Copy link
Member

bdarnell commented Dec 7, 2024

Instead, in the add method, I will not allow the addition of a Content-Length header unless it meets the necessary criteria.

I don't think HTTPHeaders.add is the right place to enforce this - HTTPHeaders is more of a dumb container, and semantic issues are covered in HTTP1Connection (If we do start enforcing this in HTTPHeaders, the redundant checks in HTTP1Connection should probably be removed).

In any case, I may be getting mixed up but I don't know if modifying add() without parse_line() actually solves the problem from #3321. When continuation lines are used, parse_line calls add() on the first line (which contains only digits), then modifies the internal dictionary directly on the second line (which contains only whitespace). This bypasses the checks in add().

This is a different approach than I described in #3321 (comment). Why?

I reiterate my earlier comment, which you didn't answer. This change leaves the problematic handling of whitespace in continuation lines alone, which can be a problem for more headers than content-length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants