Handle HTTPX UTF-8 decoding errors #882

Open: wants to merge 2 commits into master

@evantahler commented Nov 14, 2024

Hello! Thank you for the vcrpy library; it makes testing our multi-service application possible.

In one of our tests, we want to ensure that application A can upload a file to application B and get some data back. We do something like this in our code:

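import httpx


async def upload(request_payload: dict) -> httpx.Response:
    # With files=, httpx builds a multipart body from data= and files=;
    # json= is not combined with files=, so extra form fields go in data=.
    with open("my_file.pdf", "rb") as f:  # closes the file after the upload
        files = {"file": ("my_file.pdf", f)}
        async with httpx.AsyncClient() as client:
            return await client.post(
                "https://my-upload-service.com/api/post",
                data=request_payload,
                files=files,
            )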

Recording this interaction with VCR raises an error: the multipart request body contains the raw bytes of the PDF, and because a PDF is a binary file, the body can't be decoded as UTF-8:

httpx_request = <Request('POST', 'http://parser:changeme123@localhost:8200/parser/api/v1/parse')>, kwargs = {}

    def _make_vcr_request(httpx_request, **kwargs):
>       body = httpx_request.read().decode("utf-8")
E       UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 154: invalid continuation byte

All the existing VCR filters operate on the already-built VCR request (so that we can inspect the body, headers, etc.), and the error above is raised while building that request, so filters can't help here. Assuming that most request bodies decode as UTF-8 makes perfect sense, and this is a bit of an edge case. So I'd like to keep the existing behavior as much as possible, but on a `UnicodeDecodeError`, decode the body again and drop any bytes that can't be decoded. In our case, this didn't make a meaningful difference to the cassette recording.
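A minimal sketch of the proposed fallback using only the standard library. The sample body is a hypothetical stand-in for a multipart upload whose file part is not valid UTF-8 (its first bad byte is 0xc7, as in the traceback); it also counts how many bytes the lossy decode drops.

```python
def decode_body(raw: bytes) -> str:
    """Decode a request body as UTF-8, dropping undecodable bytes on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback proposed in this PR: skip bytes that are not valid UTF-8.
        return raw.decode("utf-8", errors="ignore")


# Hypothetical multipart body; the file part contains non-UTF-8 bytes.
raw = (
    b"--boundary\r\n"
    b'Content-Disposition: form-data; name="file"; filename="my_file.pdf"\r\n'
    b"Content-Type: application/pdf\r\n\r\n"
    b"%PDF-1.4\n\xc7\xe2\xcf\xd3\n"
    b"\r\n--boundary--\r\n"
)

text = decode_body(raw)
# Everything else is ASCII, so re-encoding shows exactly how many bytes were lost.
lost = len(raw) - len(text.encode("utf-8"))
print(f"dropped {lost} bytes")  # prints: dropped 4 bytes
```

The multipart headers and boundaries survive intact; only bytes inside the binary part are lost, which is why the recording can still be usable when matching does not depend on the exact file contents.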

@evantahler (Author)

For the moment, we've gone with a monkeypatching approach:

import warnings

# Importing the submodule explicitly makes vcr.stubs.httpx_stubs available;
# `import vcr` alone is not guaranteed to load it.
import vcr.stubs.httpx_stubs as httpx_stubs  # type: ignore[import-untyped]
from vcr.request import Request as VcrRequest  # type: ignore[import-untyped]


def _fixed__make_vcr_request(  # type: ignore
    httpx_request,
    **kwargs,  # noqa: ARG001
) -> VcrRequest:
    # Read the body once, then try a strict UTF-8 decode first.
    raw_body = httpx_request.read()
    try:
        body = raw_body.decode("utf-8")
    except UnicodeDecodeError as e:
        # Binary payloads (e.g. a PDF in a multipart upload) are not valid UTF-8:
        # fall back to a lossy decode that drops the offending bytes.
        body = raw_body.decode("utf-8", errors="ignore")
        warnings.warn(
            f"Could not decode full request payload as UTF-8; recording may have lost bytes. {e}",
            stacklevel=2,
        )
    uri = str(httpx_request.url)
    headers = dict(httpx_request.headers)
    return VcrRequest(httpx_request.method, uri, body, headers)


# Callers inside httpx_stubs look up _make_vcr_request at call time,
# so rebinding the module attribute is enough for the patch to take effect.
httpx_stubs._make_vcr_request = _fixed__make_vcr_request
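The patch works on the assumption that the vcrpy httpx stubs call `_make_vcr_request` by name from inside the same module, so the function is resolved through the module's globals on every call. The sketch below demonstrates that mechanism with a hypothetical stand-in module (`fake_httpx_stubs` and `record` are illustrative, not part of vcrpy) and scopes the patch with `unittest.mock.patch.object`, so the original helper is restored afterwards.

```python
import types
import warnings
from unittest import mock

# Hypothetical stand-in for vcr.stubs.httpx_stubs (not the real module): a helper
# plus a caller that resolves the helper through module globals on every call.
fake_stubs = types.ModuleType("fake_httpx_stubs")
exec(
    "def _make_vcr_request(raw):\n"
    "    return raw.decode('utf-8')\n"
    "\n"
    "def record(raw):\n"
    "    return _make_vcr_request(raw)\n",
    fake_stubs.__dict__,
)


def tolerant_make_vcr_request(raw: bytes) -> str:
    """Lossy replacement, mirroring the UTF-8 fallback in the patch above."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as e:
        warnings.warn(f"Could not decode full payload as UTF-8: {e}", stacklevel=2)
        return raw.decode("utf-8", errors="ignore")


binary_body = b"%PDF-1.4\n\xc7\xe2"

try:
    fake_stubs.record(binary_body)
except UnicodeDecodeError as e:
    print("unpatched:", e.reason)  # the original strict helper fails

# patch.object rebinds the module attribute only inside the with-block,
# so record() picks up the tolerant helper there and the original afterwards.
with mock.patch.object(fake_stubs, "_make_vcr_request", tolerant_make_vcr_request):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print("patched:", repr(fake_stubs.record(binary_body)))
```

Scoping the patch this way (for example inside a pytest fixture) limits the lossy decoding to the tests that upload binary files, whereas the module-level assignment applies it for the whole test session.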
