Fix for response with UTF-8 BOM #4976
Codecov Report
@@ Coverage Diff @@
## master #4976 +/- ##
==========================================
+ Coverage 66.89% 66.94% +0.04%
==========================================
Files 15 15
Lines 1577 1579 +2
==========================================
+ Hits 1055 1057 +2
Misses 522 522
So I'm not certain this is actually the right place to fix this. I need to do more research to figure out how backwards compatible this actually is. As a work-around, one can forcibly override |
@sigmavirus24, thanks! We are using the approach you mentioned in the msrest library, but I believe other libraries will experience this issue. The issue being that RFC 7159 does not allow BOM and |
RFC 7159 does not allow BOMs on UTF-8, correct. How does that apply to XML? |
Aha, to quote the codecs documentation:
The important part here is that |
It doesn't apply directly. But, if you take a look into the test log from my branch (https://travis-ci.com/eduardomourar/requests/jobs/176565327), an error will be thrown when running |
Regarding backward compatibility, I believe we are covered too. The change affects as little as possible because we are not replacing |
@sigmavirus24, is there anything needed here from my side? |
@kennethreitz, @sigmavirus24 and @nateprewitt, could you help with this PR, please? |
fantastic work! |
As per the discussion in PR Azure/msrest-for-python/#145, there are some issues with server responses (especially from Microsoft) that contain a BOM. You can see errors in the tests from my branch here reproducing the same behavior when trying to parse both text and JSON.
This has been fixed by forcing the encoding to `utf-8-sig` when the HTTP header has signaled `utf-8`, or leaving it to chardet when no encoding has been identified. That way the parsing works as expected and no errors are thrown.
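
To illustrate the behavior described above, here is a minimal standard-library sketch. It is not the code from this PR: `effective_encoding` is a hypothetical helper and the payload is made up. It only shows why decoding a BOM-prefixed body as `utf-8` breaks JSON parsing while `utf-8-sig` does not.

```python
import codecs
import json

# Hypothetical response body: JSON prefixed with the UTF-8 byte order mark.
body = codecs.BOM_UTF8 + b'{"ok": true}'


def effective_encoding(header_encoding):
    """Hypothetical helper (not part of requests).

    Maps a header-declared UTF-8 to 'utf-8-sig' and returns anything else
    unchanged. None means "no encoding declared", which the caller would
    hand to a detector such as chardet.
    """
    if header_encoding and header_encoding.lower() in ("utf-8", "utf8"):
        return "utf-8-sig"
    return header_encoding


# Decoding as plain UTF-8 keeps the BOM as a leading U+FEFF character,
# which json.loads rejects.
plain = body.decode("utf-8")
try:
    json.loads(plain)
    plain_parsed = True
except json.JSONDecodeError:
    plain_parsed = False

# Decoding as utf-8-sig strips the BOM if present, so parsing succeeds.
text = body.decode(effective_encoding("UTF-8"))
data = json.loads(text)
```

Because `utf-8-sig` decodes BOM-less input the same way as `utf-8`, swapping it in only changes the result for bodies that actually start with a BOM.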