Add build environment without UTF-8 locales to travis-ci #289

wolfgangwalther · 2020-07-16T10:12:25Z

Preventing regressions like #285

coveralls · 2020-07-16T10:19:20Z

Coverage remained the same at 97.867% when pulling 6614e1a on wolfgangwalther:travis-no-utf8-locale-env into 8b758d4 on adrienverge:master.

adrienverge · 2020-07-16T10:27:31Z

Thanks, that's a good idea.

It shows interesting problems on Python >= 3, <= 3.6:

======================================================================
ERROR: setUpClass (tests.test_cli.CommandLineTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/adrienverge/yamllint/tests/test_cli.py", line 102, in setUpClass
    'en.yaml': '---\n'
  File "/home/travis/build/adrienverge/yamllint/tests/common.py", line 61, in build_temp_workspace
    if not os.path.exists(os.path.dirname(path)):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-46: ordinal not in range(128)

wolfgangwalther · 2020-07-16T10:29:12Z

Yes, I am looking at that right now as well. Not sure why it says line 102, but I would guess this is more about 89-94 - what do you think?

adrienverge · 2020-07-16T10:33:19Z

Yes, I am looking at that right now as well. Not sure why it says line 102, but I would guess this is more about 89-94 - what do you think?

I agree, line 102 looks OK. I guess line counting is altered by the right-to-left characters (الأَبْجَدِيَّة العَرَبِيَّة) on line 94, or by other high-unicode ones (お早う御座).

wolfgangwalther · 2020-07-16T10:34:33Z

Yes, that makes sense.

I assume we have to adjust setUpClass and test_find_files_recursively accordingly, if no UTF-8 locale is present. Not sure what the smartest way of doing that would be, yet. Any ideas?

…ilability

tests/test_cli.py

wolfgangwalther · 2020-07-16T11:11:26Z

So, I constructed those lists in test_find_files_recursively conditionally, depending on utf8 locale availability. I guess that's the only way to do it. Formatting that nicely is hard, though ;)

wolfgangwalther · 2020-07-16T11:15:37Z

Still not clear, though, why this is just a problem on some python versions and not on all of them, when no UTF-8 locale is available.

adrienverge

Sorry for the delay! I would rather be more precise on what Python versions are expected to fail (and be make sure everything works with Python ⩾ 3.7, even with a missing /usr/lib/locale/C.utf8), and keep tests simple and easy to read, by simply skipping them in the rare case when C.UTF-8 is not available.

Something like:

# Check system's UTF-8 availability, because without it using UTF-8 paths
# like 'éçäγλνπ¥' will break on Python ⩽ 3.6
def utf8_paths_supported():
    if sys.version_info >= (3, 7):
        return True
    try:
        locale.setlocale(locale.LC_ALL, 'C.UTF-8')
        locale.setlocale(locale.LC_ALL, (None, None))
        return True
    except locale.Error:
        return False


@unittest.skipIf(not utf8_paths_supported(), 'UTF-8 paths not supported')
class CommandLineTestCase(unittest.TestCase):

What do you think?

.travis.yml

tests/test_cli.py

Co-authored-by: Adrien Vergé <adrienverge@gmail.com>

wolfgangwalther · 2020-07-17T12:22:32Z

Skipping tests instead would be fine with me - if it was only the tests that are using UTF-8 paths.

@unittest.skipIf(not utf8_paths_supported(), 'UTF-8 paths not supported')
class CommandLineTestCase(unittest.TestCase):

Doesn't that basically skip the whole test_cli.py file? That would kind of defeat the purpose of this PR in the first place - because we would not test cli.run() at all for a system without UTF-8 locale (at least for some supported python versions). So this could in theory again lead to a similiar regression we had the first time with the locale stuff.

So I guess I would put the @unittest.skipIf decorator just on the relevant tests.

But thats definitely a lot cleaner than the current "solution" of patching all those arrays.

wolfgangwalther · 2020-07-17T12:38:58Z

tests/test_cli.py

+
+        if utf8_paths_supported():
+            # non-ASCII chars
+            workspace_def['non-ascii/éçäγλνπ¥/utf-8'] = (


I still kept this part in, because this way we only need to disable two out of ~25 tests

tests/test_cli.py

adrienverge

That would kind of defeat the purpose of this PR in the first place

Hmm, sure, you're right...

But I'm still uncomfortable with complexifying setUpClass() in two conditional parts with workspace_def = ....

I have another idea to execute the same code, whatever Python version, whatever locales installed: using bytes directly. What about this?

--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -86,12 +86,24 @@ class CommandLineTestCase(unittest.TestCase):
             'no-yaml.json': '---\n'
                             'key: value\n',
             # non-ASCII chars
-            'non-ascii/éçäγλνπ¥/utf-8': (
-                u'---\n'
-                u'- hétérogénéité\n'
-                u'# 19.99 €\n'
-                u'- お早う御座います。\n'
-                u'# الأَبْجَدِيَّة العَرَبِيَّة\n').encode('utf-8'),
+            # The following bytes work even on systems where C-UTF-8 is not
+            # available. They are the representation of:
+            #  'non-ascii/éçäγλνπ¥/utf-8':
+            #      u'---\n'
+            #      u'- hétérogénéité\n'
+            #      u'# 19.99 €\n'
+            #      u'- お早う御座います。\n'
+            #      u'# الأَبْجَدِيَّة العَرَبِيَّة\n'
+            u'non-ascii/éçäγλνπ¥/utf-8'.encode('utf-8'): (
+                b'---\n'
+                b'- h\xc3\xa9t\xc3\xa9rog\xc3\xa9n\xc3\xa9it\xc3\xa9\n'
+                b'# 19.99 \xe2\x82\xac\n'
+                b'- \xe3\x81\x8a\xe6\x97\xa9\xe3\x81\x86\xe5\xbe\xa1\xe5\xba'
+                b'\xa7\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99\xe3\x80\x82\n'
+                b'# \xd8\xa7\xd9\x84\xd8\xa3\xd9\x8e\xd8\xa8\xd9\x92\xd8\xac'
+                b'\xd9\x8e\xd8\xaf\xd9\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9 '
+                b'\xd8\xa7\xd9\x84\xd8\xb9\xd9\x8e\xd8\xb1\xd9\x8e\xd8\xa8\xd9'
+                b'\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9\n'),
             # dos line endings yaml
             'dos.yml': '---\r\n'
                        'dos: true',
--- a/tests/common.py
+++ b/tests/common.py
@@ -57,7 +57,8 @@ def build_temp_workspace(files):
     tempdir = tempfile.mkdtemp(prefix='yamllint-tests-')
 
     for path, content in files.items():
-        path = os.path.join(tempdir, path)
+        path = path if isinstance(path, bytes) else path.encode()
+        path = os.path.join(tempdir.encode(), path)
         if not os.path.exists(os.path.dirname(path)):
             os.makedirs(os.path.dirname(path))

(Then, we could still skip problematic tests, like your PR currently does.)

.travis.yml

tests/test_cli.py

wolfgangwalther · 2020-07-17T13:21:54Z

Are you sure the "bytes" stuff is actually going to work?

The original error message didn't complain about the .encode() stuff in setUpClass - but about os.path.exists in build_temp_workspace:

Traceback (most recent call last):
  File "/home/travis/build/adrienverge/yamllint/tests/test_cli.py", line 102, in setUpClass
    'en.yaml': '---\n'
  File "/home/travis/build/adrienverge/yamllint/tests/common.py", line 61, in build_temp_workspace
    if not os.path.exists(os.path.dirname(path)):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-46: ordinal not in range(128)

If I'm not mistaken we are passing a UTF-8 encoded string (which is perfectly supported by python, I'm pretty sure) to os.stat - where it probably get's encoded as ascii (otherwise I don't know where that mentioning of ascii comes from - but this part is probably dependent on the systems locales). Why would the bytes string you are putting in right now be any different?

Now, if that patch indeed works - can't we just convert the UTF-8 string to a byte string in build_temp_workspace? I don't think we need to touch setUpClass at all with all the single bytes in there.

Because let's be honest: setUpClass is in no way easier to read or understand in your suggestion. Probably a lot harder than before - at least for me. And it doesn't look nice as well ;)

wolfgangwalther · 2020-07-17T13:27:33Z

Maybe I'm misunderstanding what .encode('utf-8') does - but is there actually any difference between the following two?

            (
                u'---\n'
                u'- hétérogénéité\n'
                u'# 19.99 €\n'
                u'- お早う御座います。\n'
                u'# الأَبْجَدِيَّة العَرَبِيَّة\n').encode('utf-8')

and

            (
                b'---\n'
                b'- h\xc3\xa9t\xc3\xa9rog\xc3\xa9n\xc3\xa9it\xc3\xa9\n'
                b'# 19.99 \xe2\x82\xac\n'
                b'- \xe3\x81\x8a\xe6\x97\xa9\xe3\x81\x86\xe5\xbe\xa1\xe5\xba'
                b'\xa7\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99\xe3\x80\x82\n'
                b'# \xd8\xa7\xd9\x84\xd8\xa3\xd9\x8e\xd8\xa8\xd9\x92\xd8\xac'
                b'\xd9\x8e\xd8\xaf\xd9\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9 '
                b'\xd8\xa7\xd9\x84\xd8\xb9\xd9\x8e\xd8\xb1\xd9\x8e\xd8\xa8\xd9'
                b'\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9\n')

I think this is exactly what .encode does, right?

wolfgangwalther · 2020-07-17T14:09:34Z

Your patch also has this: u'non-ascii/éçäγλνπ¥/utf-8'.encode('utf-8') - so a .encode() on the key/path.

I put this into build_temp_workspace and was able to get it working locally. Let's see what the CI builds say, in all the different flavours.

wolfgangwalther · 2020-07-17T14:12:02Z

And - I had to revert the python version check in test_run_non_ascii_file. Setting the locale here seems to be important for the body of the file - not the path. And that seems to be independent of python versions.

wolfgangwalther · 2020-07-17T14:26:21Z

Putting both the version and locale check into def utf8_paths_supported() didn't work out. I changed that again. One of the two tests we are skipping now is skipped only depending on locale availability (test_run_non_ascii_file - we need the locale in any case) and the other test depends on either an available UTF-8 locale OR python >= 3.7 to run right now.

The change to build_temp_workspace seems to have broken the python 2.7 tests now, though.

adrienverge · 2020-07-18T09:17:47Z

I like this new version 👍

Are you sure the "bytes" stuff is actually going to work?

I tried and it seemed to work; maybe passing an already-encoded string to os.path avoids any encoding/decoding at the Python level?

The change to build_temp_workspace seems to have broken the python 2.7 tests now, though.

It's because test_cli.py passes "éçäγλνπ¥" to build_temp_workspace() as bytes instead of an Unicode string. With the following change, it should work:

             # non-ASCII chars
-            'non-ascii/éçäγλνπ¥/utf-8': (
+            u'non-ascii/éçäγλνπ¥/utf-8': (
                 u'---\n'

We're very close! :)

wolfgangwalther · 2020-07-20T10:37:02Z

I tried and it seemed to work; maybe passing an already-encoded string to os.path avoids any encoding/decoding at the Python level?

I think it worked - but not because of all the manually encoded bytes (as they were the content of the file), but only the .encode('utf-8') on the key you had in there as well! That's essentially what I am doing now as well.

Your suggestion about adding u'...' makes a lot of sense. Let's see whether that turns all of them green :)

If it does - I think we have found very good solution!

adrienverge

I think we have it 🎉
(Travis doesn't report the build status but it's all green, it's a known bug of theirs.)

Thanks a lot @wolfgangwalther for improving tests!

wolfgangwalther added 2 commits July 16, 2020 12:10

Add build environment without UTF-8 locales to travis-ci

03d26dc

Preventing regressions like #285

add sudo

a69ea53

fix remove_locales condition

b1a7384

adjust setUpClass and test_find_files_recursively for utf8 locale ava…

403d5c4

…ilability

wolfgangwalther commented Jul 16, 2020

View reviewed changes

tests/test_cli.py Outdated Show resolved Hide resolved

fix indent

7a389fe

adrienverge reviewed Jul 17, 2020

View reviewed changes

.travis.yml Outdated Show resolved Hide resolved

tests/test_cli.py Outdated Show resolved Hide resolved

Update tests/test_cli.py

0f095d8

Co-authored-by: Adrien Vergé <adrienverge@gmail.com>

refactor skipped tests

b657a3e

wolfgangwalther commented Jul 17, 2020

View reviewed changes

tests/test_cli.py Outdated Show resolved Hide resolved

adrienverge reviewed Jul 17, 2020

View reviewed changes

.travis.yml Outdated Show resolved Hide resolved

.travis.yml Show resolved Hide resolved

tests/test_cli.py Outdated Show resolved Hide resolved

fix build_temp_workspace

51cda6d

fix skipping tests

24e134f

lint

21237b7

fix python2.7 tests

6614e1a

adrienverge approved these changes Jul 20, 2020

View reviewed changes

adrienverge merged commit b80997e into adrienverge:master Jul 20, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add build environment without UTF-8 locales to travis-ci #289

Add build environment without UTF-8 locales to travis-ci #289

wolfgangwalther commented Jul 16, 2020

coveralls commented Jul 16, 2020 •

edited

Loading

adrienverge commented Jul 16, 2020

wolfgangwalther commented Jul 16, 2020

adrienverge commented Jul 16, 2020

wolfgangwalther commented Jul 16, 2020

wolfgangwalther commented Jul 16, 2020

wolfgangwalther commented Jul 16, 2020

adrienverge left a comment

wolfgangwalther commented Jul 17, 2020 •

edited

Loading

wolfgangwalther Jul 17, 2020

adrienverge left a comment

wolfgangwalther commented Jul 17, 2020

wolfgangwalther commented Jul 17, 2020

wolfgangwalther commented Jul 17, 2020

wolfgangwalther commented Jul 17, 2020

wolfgangwalther commented Jul 17, 2020

adrienverge commented Jul 18, 2020

wolfgangwalther commented Jul 20, 2020

adrienverge left a comment

Add build environment without UTF-8 locales to travis-ci #289

Add build environment without UTF-8 locales to travis-ci #289

Conversation

wolfgangwalther commented Jul 16, 2020

coveralls commented Jul 16, 2020 • edited Loading

adrienverge commented Jul 16, 2020

wolfgangwalther commented Jul 16, 2020

adrienverge commented Jul 16, 2020

wolfgangwalther commented Jul 16, 2020

wolfgangwalther commented Jul 16, 2020

wolfgangwalther commented Jul 16, 2020

adrienverge left a comment

Choose a reason for hiding this comment

wolfgangwalther commented Jul 17, 2020 • edited Loading

wolfgangwalther Jul 17, 2020

Choose a reason for hiding this comment

adrienverge left a comment

Choose a reason for hiding this comment

wolfgangwalther commented Jul 17, 2020

wolfgangwalther commented Jul 17, 2020

wolfgangwalther commented Jul 17, 2020

wolfgangwalther commented Jul 17, 2020

wolfgangwalther commented Jul 17, 2020

adrienverge commented Jul 18, 2020

wolfgangwalther commented Jul 20, 2020

adrienverge left a comment

Choose a reason for hiding this comment

coveralls commented Jul 16, 2020 •

edited

Loading

wolfgangwalther commented Jul 17, 2020 •

edited

Loading