bytecode string returned when page has charset=UTF-8 #147

Closed
j0hnsmith opened this issue Aug 31, 2011 · 1 comment

Comments

@j0hnsmith

I had a situation (with both requests and urllib2) where a page that declared <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> was returned as a bytestring (<type 'str'>) even though it contained non-ASCII characters (due to a server misconfiguration, I assume). When I tried to use it, I got the classic UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 419: ordinal not in range(128)

Is this something that requests could fix? Is this something that requests would want to fix?
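
For reference, the error above can be reproduced in a few lines. This is only a sketch under assumptions: it is written for Python 3, where the ASCII decode has to be requested explicitly, whereas the original report was on Python 2, where it happens implicitly when a bytestring is mixed with unicode. The sample text and variable names are made up for illustration.

```python
# UTF-8 encoded bytes for "café"; the "é" becomes the two bytes 0xc3 0xa9.
raw = "caf\u00e9".encode("utf-8")

try:
    # Treating UTF-8 bytes as ASCII fails on the first byte >= 0x80.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the encoding the page actually declares works.
text = raw.decode("utf-8")

print(message)
print(text)
```

The byte 0xc3 in the traceback is a typical lead byte of a two-byte UTF-8 sequence, which suggests the body really was UTF-8 and simply had not been decoded yet.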

@kennethreitz
Contributor

Requests only attempts to decode charsets specified in HTTP Headers (in the upcoming release).

However, there is a utility function that will attempt to decode based on the HTML tags. If the content isn't actually in the specified encoding, though, there's nothing that can be done (aside from ignoring the invalid characters).
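
A minimal sketch of the approach described above, decoding based on the charset declared in the HTML and falling back to a lossy decode when the bytes do not match it. The thread does not name the utility function, so this uses only the standard library and is not the requests implementation; `META_CHARSET` and `decode_html` are hypothetical names, and the regex is a simplification rather than a full HTML parser. It substitutes U+FFFD for invalid bytes instead of dropping them.

```python
import re

# Hypothetical, simplified pattern: finds charset=... inside a <meta> tag.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode page bytes using the charset declared in a <meta> tag."""
    match = META_CHARSET.search(raw)
    encoding = match.group(1).decode("ascii") if match else fallback
    try:
        return raw.decode(encoding)
    except LookupError:
        # The declared charset is not a codec Python knows about.
        encoding = fallback
    except UnicodeDecodeError:
        # The content is not actually in the declared encoding.
        pass
    # Last resort: keep what decodes and mark invalid bytes with U+FFFD.
    return raw.decode(encoding, errors="replace")


good = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><p>caf\xc3\xa9</p>'
bad = b'<meta charset="utf-8"><p>caf\xe9</p>'  # Latin-1 byte in a page claiming UTF-8

print(decode_html(good))
print(decode_html(bad))
```

Passing errors="ignore" instead of errors="replace" would drop the invalid bytes silently, which matches "ignoring the invalid characters" more literally but hides where data was lost.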

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021