Commit cd769b9

tylerjharden authored and honzakral committed
Additional fixes for UTF-8 surrogate escapes (#629)
* Fixes non UTF-8 surrogateescapes

  Surrogate escapes in Unicode (non UTF-8 encoding) will be properly escaped with backslashes when encountered, versus breaking the transport layer.

* Removes erroneous bytes decode and reraises

  Fixes to re-raise exceptions with different reasons
  Removes erroneous bytes decode where bytes are desired

* Adds test for surrogate escapes in body

  Tests that a surrogate escape sequence is properly escaped with backslashes to produce valid UTF-8.

* Use proper byte sequence for surrogate

* Use if/else versus pass

* Proper Unicode surrogate escape

  Use a Unicode Surrogate that properly escapes in both Python2 and Python3

* Passing test once surrogatepass is used

  Updating test to pass once surrogatepass is used

* Use surrogatepass instead of backslashreplace

  This replicates behavior between Python 2 and Python 3

* Fixes whitespace

* Simplifies with no exception block

  Since `surrogatepass` will only ever explicitly occur when there are surrogate bytes encountered, there is no need to let the error throw and catch it, also uses single-quotes for consistency.

* Fixes Unicode Surrogate Escapes in request logging

  This is the same fix as accepted for ElasticSearch requests, except applied to the request logging mechanisms.
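
For readers unfamiliar with the error handlers the message mentions, here is a minimal Python 3 sketch of how a lone surrogate behaves under strict encoding, `backslashreplace`, and `surrogatepass`. The helper name `encode_body` and the sample string are hypothetical and are not taken from the elasticsearch-py code; this only illustrates the standard codec behavior.

```python
# Illustration only: `encode_body` and `sample` are hypothetical names.
sample = 'caf\u00e9 \ud83d'  # ends with a lone high surrogate (U+D83D)


def encode_body(text, errors='strict'):
    """Encode text to UTF-8 bytes with the given error handler."""
    return text.encode('utf-8', errors)


# Strict encoding refuses lone surrogates in Python 3.
try:
    encode_body(sample)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# backslashreplace writes the surrogate as a literal escape sequence.
escaped = encode_body(sample, 'backslashreplace')

# surrogatepass emits the surrogate's three-byte form unchanged.
passed = encode_body(sample, 'surrogatepass')

print(strict_failed)
print(escaped)
print(passed)
```

Bytes produced with `surrogatepass` are not strictly valid UTF-8, so whatever decodes them later needs a matching handler or a lenient one.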
1 parent 843ce9e commit cd769b9

1 file changed: +2 -2 lines changed

elasticsearch/connection/base.py

Lines changed: 2 additions & 2 deletions
@@ -76,7 +76,7 @@ def log_request_success(self, method, full_url, path, body, status_code, respons
         # body has already been serialized to utf-8, deserialize it for logging
         # TODO: find a better way to avoid (de)encoding the body back and forth
         if body:
-            body = body.decode('utf-8')
+            body = body.decode('utf-8', 'ignore')
 
         logger.info(
             '%s %s [status:%s request:%.3fs]', method, full_url,
@@ -100,7 +100,7 @@ def log_request_fail(self, method, full_url, path, body, duration, status_code=N
         # body has already been serialized to utf-8, deserialize it for logging
         # TODO: find a better way to avoid (de)encoding the body back and forth
         if body:
-            body = body.decode('utf-8')
+            body = body.decode('utf-8', 'ignore')
 
         logger.debug('> %s', body)

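
The effect of the changed line can be reproduced outside the library. The sketch below is a hypothetical stand-in for the logging path (the function name `decode_for_logging` and the sample bytes are assumptions, not elasticsearch-py code). It shows that a body containing surrogate bytes makes a strict decode raise, while `'ignore'` silently drops the offending bytes.

```python
# Hypothetical stand-in for the decode step in the logging methods.
def decode_for_logging(body):
    """Decode a UTF-8 request body for logging, dropping undecodable bytes."""
    if body:
        return body.decode('utf-8', 'ignore')
    return body


# A body whose last three bytes encode a lone surrogate (U+D83D).
raw = b'{"q": "caf\xc3\xa9 \xed\xa0\xbd"}'

# The old call, body.decode('utf-8'), fails on these bytes in Python 3.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

logged = decode_for_logging(raw)
print(strict_failed)
print(logged)
```

One consequence of `'ignore'` is that the logged body can differ from the bytes actually sent, since undecodable bytes disappear from the log line rather than being shown as replacement characters or escapes.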