Skip to content

Additional fixes for UTF-8 surrogate escapes #629

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Aug 15, 2017

Conversation

tylerjharden
Copy link
Contributor

My previously merged PR did not handle some additional spots in the code where similar encoding is done, namely success logging.

tylerjharden and others added 12 commits June 30, 2017 11:14
Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer.
Fixes to re-raise exceptions with different reasons
Removes erroneous bytes decode where bytes are desired
Tests that a surrogate escape sequence is properly escaped
with backslashes to produce valid UTF-8.
Use a Unicode Surrogate that properly escapes in both Python2 and Python3
Updating test to pass once surrogatepass is used
This replicates behavior between Python 2 and Python 3
Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency.
This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.
@karmi
Copy link

karmi commented Aug 8, 2017

Hi @tylerjharden, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

@tylerjharden
Copy link
Contributor Author

@karmi I believe your bot is mistaken. The e-mails match, but I added my work e-mail just to be safe.

@honzakral honzakral merged commit cd769b9 into elastic:master Aug 15, 2017
@honzakral
Copy link
Contributor

Thanks for the patch!

honzakral pushed a commit that referenced this pull request Aug 15, 2017
* Fixes non UTF-8 surrogateescapes 

Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer.

* Removes erroneous bytes decode and reraises

Fixes to re-raise exceptions with different reasons
Removes erroneous bytes decode where bytes are desired

* Adds test for surrogate escapes in body

Tests that a surrogate escape sequence is properly escaped
with backslashes to produce valid UTF-8.

* Use proper byte sequence for surrogate

* Use if/else versus pass

* Proper Unicode surrogate escape

Use a Unicode Surrogate that properly escapes in both Python2 and Python3

* Passing test once surrogatepass is used

Updating test to pass once surrogatepass is used

* Use surrogatepass instead of backslashreplace

This replicates behavior between Python 2 and Python 3

* Fixes whitespace

* Simplifies with no exception block

Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency.

* Fixes Unicode Surrogate Escapes in request logging

This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.
rciorba added a commit to rciorba/elasticsearch-py that referenced this pull request Mar 2, 2018
* Fixes non UTF-8 surrogateescapes 

Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer.

* Removes erroneous bytes decode and reraises

Fixes to re-raise exceptions with different reasons
Removes erroneous bytes decode where bytes are desired

* Adds test for surrogate escapes in body

Tests that a surrogate escape sequence is properly escaped
with backslashes to produce valid UTF-8.

* Use proper byte sequence for surrogate

* Use if/else versus pass

* Proper Unicode surrogate escape

Use a Unicode Surrogate that properly escapes in both Python2 and Python3

* Passing test once surrogatepass is used

Updating test to pass once surrogatepass is used

* Use surrogatepass instead of backslashreplace

This replicates behavior between Python 2 and Python 3

* Fixes whitespace

* Simplifies with no exception block

Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency.

* Fixes Unicode Surrogate Escapes in request logging

This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants