Encodings from content #1087

akavlie · 2013-01-05T01:07:11Z

Requests has a get_encodings_from_content() function, but it doesn't seem to be used anywhere -- only get_encoding_from_headers() is used.

Any reason why? I'd expect that on most pages, checking the meta tag's encoding declaration first would produce better results. The Verge is one example: the meta tag declares the encoding as utf-8, but Requests detects it as ISO-8859-1.


sigmavirus24 · 2013-01-05T15:52:21Z

At first I thought this was an issue with charade, but it seems you're right that this is taking place in requests proper. From what I can tell, The Verge might not be setting the encoding in a way requests expects to see it. Could you post the headers from your response for better debugging? (Alternatively, post the URL you used.) To satisfy my curiosity, could you also check apparent_encoding and see what it returns? (It is an attribute, not a method.)

piotr-dobrogost · 2013-01-06T18:17:02Z

Related #156, especially this comment.

akavlie · 2013-01-08T03:50:00Z

@sigmavirus24 I noticed the issue with the title of this article:

http://www.theverge.com/2013/1/4/3836944/robot-band-compressorhead-plays-motorhead-ace-of-spades

A few days ago, it was reporting an encoding of iso-8859-1. I don't see the issue now, however. It would be a little weird if they just happened to fix the issue since I last looked, but I don't have a better explanation. Content-Type is now:

Content-Type: text/html; charset=utf-8
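For readers following along, here is a minimal sketch of how header-based detection can end up at ISO-8859-1 when the server omits the charset parameter. The helper name `encoding_from_content_type` is hypothetical and this is not Requests' own code; it assumes the HTTP/1.1 convention of defaulting `text/*` bodies to ISO-8859-1, which appears to be the behavior described in this thread.

```python
from email.message import Message


def encoding_from_content_type(content_type):
    """Hypothetical helper: pick an encoding from a Content-Type header value.

    Returns the declared charset if present, ISO-8859-1 for text/* types
    without a charset (the assumed HTTP/1.1 default), or None otherwise.
    """
    if not content_type:
        return None

    # Let the stdlib parse the header parameters for us.
    msg = Message()
    msg["Content-Type"] = content_type

    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        return charset.strip("'\"")

    # No charset declared: fall back only for text/* media types.
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"
    return None
```

Under these assumptions, `text/html; charset=utf-8` yields `utf-8`, while a bare `text/html` yields `ISO-8859-1`, which would match the earlier observation if the server was not sending a charset at the time.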

sigmavirus24 · 2013-01-08T03:52:39Z

Hm. I'll try to reproduce it later. The server could just have been misbehaving, and they identified and fixed it. shrug

Thanks for following up though, it's very helpful.

kennethreitz · 2013-01-10T06:56:54Z

We do not parse HTML. We did in the past. That's why the function exists, for those who feel like they need it.
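For those who do want to honor the meta tag themselves, the idea can be sketched with the standard library alone. This is an illustrative approximation, not the implementation of get_encodings_from_content(); the name `sniff_meta_charset` and the regular expression are assumptions made for this example, and a regex is only a rough stand-in for real HTML parsing.

```python
import re

# Matches both <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
_META_CHARSET = re.compile(r'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def sniff_meta_charset(html):
    """Hypothetical helper: return the first charset declared in a meta tag,
    or None if no declaration is found."""
    match = _META_CHARSET.search(html)
    return match.group(1) if match else None
```

With a response object `r`, one could then opt in explicitly, for example `r.encoding = sniff_meta_charset(r.text) or r.encoding`, before reading `r.text` again. That keeps the decision in the caller's hands, in line with the library not parsing HTML on its own.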

akavlie · 2013-01-10T08:21:12Z

@kennethreitz understood after reviewing your comments in #156. A mention of the function in the docs would be useful.

kennethreitz closed this as completed Jan 10, 2013

sigmavirus24 mentioned this issue Jan 29, 2013

On some pages requests detect encoding incorrectly #1150

Closed

itsadok mentioned this issue Nov 14, 2013

[Suggestion] Simplify charset handling #1737

Open

github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021
