Additional fixes for UTF-8 surrogate escapes #629

tylerjharden · 2017-08-08T15:36:31Z

My previously merged PR did not handle some additional spots in the code where similar encoding is done, namely success logging.

Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer.

Fixes to re-raise exceptions with different reasons Removes erroneous bytes decode where bytes are desired

Tests that a surrogate escape sequence is properly escaped with backslashes to produce valid UTF-8.

Use a Unicode Surrogate that properly escapes in both Python2 and Python3

Updating test to pass once surrogatepass is used

This replicates behavior between Python 2 and Python 3

Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency.

This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.

karmi · 2017-08-08T15:36:35Z

Hi @tylerjharden, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

tylerjharden · 2017-08-09T19:50:23Z

@karmi I believe your bot is mistaken. The e-mails match, but I added my work e-mail just to be safe.

honzakral · 2017-08-15T14:41:39Z

Thanks for the patch!

* Fixes non UTF-8 surrogateescapes Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer. * Removes erroneous bytes decode and reraises Fixes to re-raise exceptions with different reasons Removes erroneous bytes decode where bytes are desired * Adds test for surrogate escapes in body Tests that a surrogate escape sequence is properly escaped with backslashes to produce valid UTF-8. * Use proper byte sequence for surrogate * Use if/else versus pass * Proper Unicode surrogate escape Use a Unicode Surrogate that properly escapes in both Python2 and Python3 * Passing test once surrogatepass is used Updating test to pass once surrogatepass is used * Use surrogatepass instead of backslashreplace This replicates behavior between Python 2 and Python 3 * Fixes whitespace * Simplifies with no exception block Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency. * Fixes Unicode Surrogate Escapes in request logging This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.

tylerjharden and others added 12 commits June 30, 2017 11:14

Fixes non UTF-8 surrogateescapes

61d39d8

Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer.

Removes erroneous bytes decode and reraises

0470858

Fixes to re-raise exceptions with different reasons Removes erroneous bytes decode where bytes are desired

Adds test for surrogate escapes in body

c6fa87b

Tests that a surrogate escape sequence is properly escaped with backslashes to produce valid UTF-8.

Use proper byte sequence for surrogate

1d8c0e9

Use if/else versus pass

4fa7023

Proper Unicode surrogate escape

b90231f

Use a Unicode Surrogate that properly escapes in both Python2 and Python3

Passing test once surrogatepass is used

cf0672d

Updating test to pass once surrogatepass is used

Use surrogatepass instead of backslashreplace

6259cb3

This replicates behavior between Python 2 and Python 3

Fixes whitespace

05c2b0a

Simplifies with no exception block

41fb145

Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency.

Merge branch 'master' of https://github.com/elastic/elasticsearch-py

8565da6

Fixes Unicode Surrogate Escapes in request logging

1b9c02f

This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.

hellysmile mentioned this pull request Aug 8, 2017

Check upstream additional surrogate fixes aio-libs/aioelasticsearch#34

Closed

honzakral merged commit cd769b9 into elastic:master Aug 15, 2017

vEpiphyte mentioned this pull request Apr 20, 2020

Missing surrogatepass in urllib3 handler #1212

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Additional fixes for UTF-8 surrogate escapes #629

Additional fixes for UTF-8 surrogate escapes #629

Uh oh!

tylerjharden commented Aug 8, 2017

Uh oh!

karmi commented Aug 8, 2017

Uh oh!

tylerjharden commented Aug 9, 2017

Uh oh!

honzakral commented Aug 15, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Additional fixes for UTF-8 surrogate escapes #629

Additional fixes for UTF-8 surrogate escapes #629

Uh oh!

Conversation

tylerjharden commented Aug 8, 2017

Uh oh!

karmi commented Aug 8, 2017

Uh oh!

tylerjharden commented Aug 9, 2017

Uh oh!

honzakral commented Aug 15, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants