From 05551fb21435f9dac40789a70d5f0e0e7205193c Mon Sep 17 00:00:00 2001
From: Vedant Kumar
Date: Wed, 6 May 2020 16:08:19 -0700
Subject: [PATCH] [lldb/test] Fix for flakiness in TestNSDictionarySynthetic
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Summary:
TestNSDictionarySynthetic sets up an NSURL which does not initialize its
_baseURL member. When the test runs and we print out the NSURL, we print
out some garbage memory pointed-to by the _baseURL member, like:

```
_baseURL = 0x0800010020004029 @"d��qX"
```

and this can cause a python unicode decoding error like:

```
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position
10309: invalid start byte
```

There's a discrepancy here because lldb's StringPrinter facility tries
to only print out "printable" sequences (see: isprint32()), whereas python
rejects the StringPrinter output as invalid utf8. For the specific error
seen above, lldb's `isprint32(0xa0) = true`, even though 0xa0 is not
really "printable" in the usual sense.

The problem is that lldb and python disagree on what exactly is
"printable". Both have dismayingly hand-rolled utf8 validation code
(c.f. _Py_DecodeUTF8Ex), and I can't really tell which one is more
correct.

I tried replacing lldb's isprint32() with a call to libc's iswprint():
this satisfied python, but broke emoji printing :|.

Now, I believe that lldb (and python too) ought to just call into some
battle-tested utf library, and that we shouldn't aim for compatibility
with python's strict unicode decoding mode until then.

FWIW I ran this test under an ASanified lldb hundreds of times but
didn't turn up any other issues.

rdar://62941711

Reviewers: JDevlieghere, jingham, shafik

Subscribers: lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D79645

(cherry picked from commit f807d0b4acdb70c5a15919f6e9b02d8b212d1088)
---
 lldb/test/API/lldbtest.py | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lldb/test/API/lldbtest.py b/lldb/test/API/lldbtest.py
index 7af4ea7f5b394b..520549e24a7749 100644
--- a/lldb/test/API/lldbtest.py
+++ b/lldb/test/API/lldbtest.py
@@ -100,9 +100,14 @@ def execute(self, test, litConfig):
                 litConfig.maxIndividualTestTime)
 
         if sys.version_info.major == 2:
-            # In Python 2, string objects can contain Unicode characters.
-            out = out.decode('utf-8')
-            err = err.decode('utf-8')
+            # In Python 2, string objects can contain Unicode characters. Use
+            # the non-strict 'replace' decoding mode. We cannot use the strict
+            # mode right now because lldb's StringPrinter facility and the
+            # Python utf8 decoder have different interpretations of which
+            # characters are "printable". This leads to Python utf8 decoding
+            # exceptions even though lldb is behaving as expected.
+            out = out.decode('utf-8', 'replace')
+            err = err.decode('utf-8', 'replace')
 
         output = """Script:\n--\n%s\n--\nExit Code: %d\n""" % (
             ' '.join(cmd), exitCode)
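
Note for reviewers: the difference between the strict and 'replace'
decoding modes can be reproduced in isolation. The following is a minimal
sketch, not part of the patch. The byte string `raw` and the helper
`decode_output` are hypothetical stand-ins for the test output and the
decode calls in lldbtest.py; only the stray 0xa0 byte is taken from the
error quoted in the summary.

```python
# Hypothetical stand-in for lldb output that contains a stray 0xa0 byte.
# 0xa0 is a UTF-8 continuation byte, so it is invalid as a start byte.
raw = b'_baseURL = @"d\xa0qX"'


def decode_output(data, errors="strict"):
    """Decode captured bytes as UTF-8 with the given error handler."""
    return data.decode("utf-8", errors)


# Strict mode (the default) raises on the invalid byte.
try:
    decode_output(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' mode substitutes U+FFFD for the invalid byte and keeps going.
lenient = decode_output(raw, "replace")

print(strict_failed)
print(repr(lenient))
```

With 'replace', the undecodable byte is lost from the captured output,
which seems an acceptable trade-off for test logs but would not be for
data that must round-trip.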