Encoding detection can lead to LookupError
#2732
Comments
I think mentioning |
Sometimes, cchardet might detect encodings that Python doesn't know. In such cases, `.text()` function might raise a `LookupError`, and `get_encoding` may return values that are unsafe to pass to `bytes.decode()` or to `.text()` functions. Closes aio-libs#2732
@asvetlov I agree, but that seems inconsistent with the current behavior. If the client passes |
Long story short
Some encodings detected by cchardet are unsupported by Python (e.g. `VISCII`). This makes the `text()` function raise a `LookupError` when such an encoding is detected, even when the `errors` parameter is set to `'ignore'` (which I would have assumed to be safe).

Expected behaviour
Not sure about this. Maybe `get_encoding()` should call `codecs.lookup` for the detected encoding to ensure it is known and, if it isn't, fall back to UTF-8. Or document properly that the `text()` function might raise `LookupError`, or that the `get_encoding()` result is not safe to pass to `text(encoding)` or `.decode(encoding)` directly.

Actual behaviour
`LookupError` is thrown.

Steps to reproduce
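A minimal sketch of the underlying failure, using only the standard library (no aiohttp or cchardet). It assumes the detector reported `VISCII` for the response body. `safe_encoding` is a hypothetical helper illustrating the `codecs.lookup` fallback idea from this report; it is not aiohttp's actual code.

```python
import codecs


def safe_encoding(detected, default="utf-8"):
    """Return `detected` if Python has a codec for it, otherwise `default`.

    Hypothetical helper for illustration only.
    """
    try:
        codecs.lookup(detected)
    except LookupError:
        return default
    return detected


body = b"caf\xc3\xa9"   # some UTF-8 encoded bytes
detected = "VISCII"     # assumed detector output; Python ships no such codec

# errors="ignore" only affects undecodable bytes, not an unknown codec name,
# so this still raises LookupError.
try:
    body.decode(detected, errors="ignore")
except LookupError as exc:
    print("decode failed:", exc)

# Validating the name first avoids the exception.
print(body.decode(safe_encoding(detected), errors="ignore"))
```

Falling back to UTF-8 is only one possible policy; a caller might prefer to surface the unknown encoding name instead of silently decoding with a different codec.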
Your environment