Encodings from content #1087
Comments
At first I thought this was an issue with charade, but it seems you're right that this is taking place in requests proper. From what I can tell, The Verge might not be setting the encoding in a way requests expects to see it. Could you post the headers from your response for better debugging? (Alternatively, post the URL you used.) For the sake of satiating my curiosity, could you also use:
@sigmavirus24 I noticed the issue with the title of this article: http://www.theverge.com/2013/1/4/3836944/robot-band-compressorhead-plays-motorhead-ace-of-spades a few days ago, it was reporting an encoding of
Hm. I'll try to reproduce it later. The server just could have been Thanks for following up though, it's very helpful
We do not parse HTML. We did in the past. That's why the function exists, for those who feel like they need it.
@kennethreitz understood after reviewing your comments in #156. A mention of the function in the docs would be useful.
Requests has a get_encodings_from_content() function, but it doesn't seem to be used anywhere -- only get_encoding_from_headers() is used.
Any reason why? I'd think on most pages, trying for the meta tag encoding declaration first will produce better results. See The Verge, for one example (the meta tag declares encoding as utf-8, but Requests detects as ISO-8859-1).
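For anyone who wants the meta-tag-first behaviour described here, a minimal standard-library sketch of the idea follows. The helper names (charset_from_header, charset_from_meta, guess_encoding) and the regular expression are hypothetical and written for illustration; this is not how Requests implements either function, and the ISO-8859-1 default is only assumed to mirror the fallback reported in this issue.

```python
import re
from email.message import Message

# Hypothetical helpers for illustration only; not Requests' own code.
# Matches both <meta charset="utf-8"> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> form.
META_CHARSET_RE = re.compile(r'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(html):
    """Return the first charset declared in a <meta> tag, or None."""
    match = META_CHARSET_RE.search(html)
    return match.group(1).lower() if match else None


def guess_encoding(content_type, html, default="iso-8859-1"):
    """Prefer the meta tag, then the header, then the assumed default."""
    return charset_from_meta(html) or charset_from_header(content_type) or default
```

With a response in hand, the result could be assigned to the response's encoding attribute before reading the decoded text. Whether the meta tag should outrank the header is a judgment call; the sketch simply follows the order suggested in this issue.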