Making charset auto-detection strictly opt-in. #2083
Unanswered
tomchristie
asked this question in
Ideas
Replies: 1 comment
-
Looks like this recently landed (in May 2022):
-
So, I've been thinking for a while about making our charset auto-detection strictly opt-in.
Our charset auto-detection is used for cases where `response.text` is accessed, but no `charset` is present in the response `Content-Type`.

Right now we'll fall back to using `charset_normalizer` in that case in order to auto-detect an encoding. Which is a bit of a mixed bag. It's a bit of a fuzzy approach, and I'm not overly keen on it. We've tried `utf-8` only as the fallback in the past, which

So, here's an alternative.
Rather than having an `apparent_encoding` on the `Request` class, which is used as the fallback, I'd suggest the following for charset control...

- `charset_fallback="utf-8"` # Default to `utf-8` as the fallback.
- `charset_errors="replace"` # Default to the lenient "replace" for decoding failures.

We'd have those on the `Request()` model, and on the `Client()` model, so for instance...

Or...
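As a rough sketch of the decoding rule those two options describe (this is not httpx's actual implementation): use the `charset` from `Content-Type` when there is one, otherwise decode with the fallback. The helper names `declared_charset` and `decode_text` are hypothetical; only the `charset_fallback` and `charset_errors` parameter names come from the proposal above.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_text(content, content_type,
                charset_fallback="utf-8", charset_errors="replace"):
    """Hypothetical helper: decode bytes the way the proposal describes."""
    # Prefer the declared charset; otherwise use the explicit fallback.
    encoding = declared_charset(content_type) or charset_fallback
    try:
        return content.decode(encoding, errors=charset_errors)
    except LookupError:
        # The header named a codec Python doesn't know: use the fallback.
        return content.decode(charset_fallback, errors=charset_errors)


# No charset declared, so the utf-8 fallback applies.
print(decode_text(b"caf\xc3\xa9", "text/plain"))
# A declared charset always wins over the fallback.
print(decode_text(b"caf\xe9", "text/plain; charset=iso-8859-1"))
```

With the lenient `"replace"` default, undecodable bytes turn into U+FFFD rather than raising, which is why `charset_errors="strict"` would be the option to reach for when silent corruption is worse than an exception.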
However we would still like to support auto-detection for the fallback, but make it strictly opt-in.
We'd do that by having `charset_normalizer` as a regular installable codec...

Which then allows...
Or...
Related: #1018, #1269, #1657, #1791