A webpage was detected as ISO-8859-1 encoded, even though it is encoded in UTF-8.
Expected Result
Correct classification as utf-8.
Actual Result
utf-8 page detected as ISO-8859-1.
Reproduction Steps
#!/usr/bin/python
import requests
# example url
url = "https://digitalezivilgesellschaft.org/"
# get the page and print the supposed encoding
response = requests.get(url)
print(response.encoding)
Hi @klartext, if a webpage's Content-Type header doesn't specify a charset, we default to ISO-8859-1 for text/* responses, which is what RFC 2616 specified. RFC 7231 has since removed that default, so this is starting to change, but we're in an inconsistent state for now. We've kept the behavior for backwards compatibility, and it still seems to be the correct choice more often than not.
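A minimal standard-library sketch of the header-only rule described above. The helper name encoding_from_content_type is hypothetical and is modeled on the behavior described in the comment, not copied from the requests source. It also shows what UTF-8 bytes look like when decoded with the ISO-8859-1 fallback:

```python
from email.message import Message
from typing import Optional


def encoding_from_content_type(content_type: str) -> Optional[str]:
    """Hypothetical helper: pick an encoding from a Content-Type header alone."""
    if not content_type:
        return None  # no header at all: nothing to go on
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, unquoted charset param, or None
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"  # RFC 2616 default for text/* without a charset
    return None


body = "Zivilgesellschaft für alle".encode("utf-8")  # UTF-8 bytes on the wire
enc = encoding_from_content_type("text/html")  # server sent no charset
print(enc)  # ISO-8859-1
print(body.decode(enc))  # Zivilgesellschaft fÃ¼r alle
```

Because ISO-8859-1 maps every byte to a character, the decode never fails; each two-byte UTF-8 sequence such as "ü" (0xC3 0xBC) silently becomes two characters ("Ã¼").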
To resolve this, you can either do as @GoddessLuBoYan suggests above, or set the encoding attribute on your Response object before accessing response.text. Requests will then decode the body with the encoding you set.
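The suggestion is to override the encoding before the body is decoded. As an illustration that runs without network access, here is a standard-library sketch of one possible override policy. The name decode_body and the "declared charset, then strict UTF-8, then ISO-8859-1" ordering are assumptions for illustration, not what requests does (its apparent_encoding property relies on a charset detection library instead):

```python
from email.message import Message


def decode_body(body: bytes, content_type: str) -> str:
    """Hypothetical helper: declared charset first, then strict UTF-8,
    then ISO-8859-1 (which can decode any byte sequence)."""
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # None when the header has no charset
    if declared:
        try:
            return body.decode(declared, errors="replace")
        except LookupError:
            pass  # unknown charset name: fall through to the heuristics
    try:
        return body.decode("utf-8")  # strict: raises on invalid UTF-8
    except UnicodeDecodeError:
        return body.decode("iso-8859-1")  # last resort, never fails


print(decode_body("für".encode("utf-8"), "text/html"))  # für
```

ISO-8859-1 has to come last because it never raises; the trade-off is that Latin-1 text which happens to form valid UTF-8 would be misread. With requests itself, the fix from the comment is a single assignment such as response.encoding = "utf-8" (or response.encoding = response.apparent_encoding) before reading response.text.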
Compare that with
System Information
This concrete problem seems to be related to the more general issue #2086.