-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Additional fixes for UTF-8 surrogate escapes #629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer.
Fixes to re-raise exceptions with different reasons Removes erroneous bytes decode where bytes are desired
Tests that a surrogate escape sequence is properly escaped with backslashes to produce valid UTF-8.
Use a Unicode Surrogate that properly escapes in both Python2 and Python3
Updating test to pass once surrogatepass is used
This replicates behavior between Python 2 and Python 3
Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency.
This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.
Hi @tylerjharden, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile? |
@karmi I believe your bot is mistaken. The e-mails match, but I added my work e-mail just to be safe. |
Thanks for the patch! |
* Fixes non UTF-8 surrogateescapes Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer. * Removes erroneous bytes decode and reraises Fixes to re-raise exceptions with different reasons Removes erroneous bytes decode where bytes are desired * Adds test for surrogate escapes in body Tests that a surrogate escape sequence is properly escaped with backslashes to produce valid UTF-8. * Use proper byte sequence for surrogate * Use if/else versus pass * Proper Unicode surrogate escape Use a Unicode Surrogate that properly escapes in both Python2 and Python3 * Passing test once surrogatepass is used Updating test to pass once surrogatepass is used * Use surrogatepass instead of backslashreplace This replicates behavior between Python 2 and Python 3 * Fixes whitespace * Simplifies with no exception block Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency. * Fixes Unicode Surrogate Escapes in request logging This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.
* Fixes non UTF-8 surrogateescapes Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer. * Removes erroneous bytes decode and reraises Fixes to re-raise exceptions with different reasons Removes erroneous bytes decode where bytes are desired * Adds test for surrogate escapes in body Tests that a surrogate escape sequence is properly escaped with backslashes to produce valid UTF-8. * Use proper byte sequence for surrogate * Use if/else versus pass * Proper Unicode surrogate escape Use a Unicode Surrogate that properly escapes in both Python2 and Python3 * Passing test once surrogatepass is used Updating test to pass once surrogatepass is used * Use surrogatepass instead of backslashreplace This replicates behavior between Python 2 and Python 3 * Fixes whitespace * Simplifies with no exception block Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency. * Fixes Unicode Surrogate Escapes in request logging This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.
My previously merged PR did not handle some additional spots in the code where similar encoding is done, namely success logging.