Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add build environment without UTF-8 locales to travis-ci #289

Merged
merged 11 commits into from
Jul 20, 2020
Merged

Add build environment without UTF-8 locales to travis-ci #289

merged 11 commits into from
Jul 20, 2020

Conversation

wolfgangwalther
Copy link
Contributor

Preventing regressions like #285

@coveralls
Copy link

coveralls commented Jul 16, 2020

Coverage Status

Coverage remained the same at 97.867% when pulling 6614e1a on wolfgangwalther:travis-no-utf8-locale-env into 8b758d4 on adrienverge:master.

@adrienverge
Copy link
Owner

Thanks, that's a good idea.

It shows interesting problems on Python >= 3, <= 3.6:

======================================================================
ERROR: setUpClass (tests.test_cli.CommandLineTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/adrienverge/yamllint/tests/test_cli.py", line 102, in setUpClass
    'en.yaml': '---\n'
  File "/home/travis/build/adrienverge/yamllint/tests/common.py", line 61, in build_temp_workspace
    if not os.path.exists(os.path.dirname(path)):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-46: ordinal not in range(128)

@wolfgangwalther
Copy link
Contributor Author

Yes, I am looking at that right now as well. Not sure why it says line 102, but I would guess this is more about 89-94 - what do you think?

@adrienverge
Copy link
Owner

Yes, I am looking at that right now as well. Not sure why it says line 102, but I would guess this is more about 89-94 - what do you think?

I agree, line 102 looks OK. I guess line counting is altered by the right-to-left characters (الأَبْجَدِيَّة العَرَبِيَّة) on line 94, or by other high-unicode ones (お早う御座).

@wolfgangwalther
Copy link
Contributor Author

Yes, that makes sense.

I assume we have to adjust setUpClass and test_find_files_recursively accordingly, if no UTF-8 locale is present. Not sure what the smartest way of doing that would be, yet. Any ideas?

tests/test_cli.py Outdated Show resolved Hide resolved
@wolfgangwalther
Copy link
Contributor Author

So, I constructed those lists in test_find_files_recursively conditionally, depending on utf8 locale availability. I guess that's the only way to do it. Formatting that nicely is hard, though ;)

@wolfgangwalther
Copy link
Contributor Author

Still not clear, though, why this is just a problem on some python versions and not on all of them, when no UTF-8 locale is available.

Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! I would rather be more precise on what Python versions are expected to fail (and be make sure everything works with Python ⩾ 3.7, even with a missing /usr/lib/locale/C.utf8), and keep tests simple and easy to read, by simply skipping them in the rare case when C.UTF-8 is not available.

Something like:

# Check system's UTF-8 availability, because without it using UTF-8 paths
# like 'éçäγλνπ¥' will break on Python ⩽ 3.6
def utf8_paths_supported():
    if sys.version_info >= (3, 7):
        return True
    try:
        locale.setlocale(locale.LC_ALL, 'C.UTF-8')
        locale.setlocale(locale.LC_ALL, (None, None))
        return True
    except locale.Error:
        return False


@unittest.skipIf(not utf8_paths_supported(), 'UTF-8 paths not supported')
class CommandLineTestCase(unittest.TestCase):

What do you think?

.travis.yml Outdated Show resolved Hide resolved
tests/test_cli.py Outdated Show resolved Hide resolved
Co-authored-by: Adrien Vergé <adrienverge@gmail.com>
@wolfgangwalther
Copy link
Contributor Author

wolfgangwalther commented Jul 17, 2020

Skipping tests instead would be fine with me - if it was only the tests that are using UTF-8 paths.

@unittest.skipIf(not utf8_paths_supported(), 'UTF-8 paths not supported')
class CommandLineTestCase(unittest.TestCase):

Doesn't that basically skip the whole test_cli.py file? That would kind of defeat the purpose of this PR in the first place - because we would not test cli.run() at all for a system without UTF-8 locale (at least for some supported python versions). So this could in theory again lead to a similiar regression we had the first time with the locale stuff.

So I guess I would put the @unittest.skipIf decorator just on the relevant tests.

But thats definitely a lot cleaner than the current "solution" of patching all those arrays.


if utf8_paths_supported():
# non-ASCII chars
workspace_def['non-ascii/éçäγλνπ¥/utf-8'] = (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still kept this part in, because this way we only need to disable two out of ~25 tests

tests/test_cli.py Outdated Show resolved Hide resolved
Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would kind of defeat the purpose of this PR in the first place

Hmm, sure, you're right...

But I'm still uncomfortable with complexifying setUpClass() in two conditional parts with workspace_def = ....

I have another idea to execute the same code, whatever Python version, whatever locales installed: using bytes directly. What about this?

--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -86,12 +86,24 @@ class CommandLineTestCase(unittest.TestCase):
             'no-yaml.json': '---\n'
                             'key: value\n',
             # non-ASCII chars
-            'non-ascii/éçäγλνπ¥/utf-8': (
-                u'---\n'
-                u'- hétérogénéité\n'
-                u'# 19.99 €\n'
-                u'- お早う御座います。\n'
-                u'# الأَبْجَدِيَّة العَرَبِيَّة\n').encode('utf-8'),
+            # The following bytes work even on systems where C-UTF-8 is not
+            # available. They are the representation of:
+            #  'non-ascii/éçäγλνπ¥/utf-8':
+            #      u'---\n'
+            #      u'- hétérogénéité\n'
+            #      u'# 19.99 €\n'
+            #      u'- お早う御座います。\n'
+            #      u'# الأَبْجَدِيَّة العَرَبِيَّة\n'
+            u'non-ascii/éçäγλνπ¥/utf-8'.encode('utf-8'): (
+                b'---\n'
+                b'- h\xc3\xa9t\xc3\xa9rog\xc3\xa9n\xc3\xa9it\xc3\xa9\n'
+                b'# 19.99 \xe2\x82\xac\n'
+                b'- \xe3\x81\x8a\xe6\x97\xa9\xe3\x81\x86\xe5\xbe\xa1\xe5\xba'
+                b'\xa7\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99\xe3\x80\x82\n'
+                b'# \xd8\xa7\xd9\x84\xd8\xa3\xd9\x8e\xd8\xa8\xd9\x92\xd8\xac'
+                b'\xd9\x8e\xd8\xaf\xd9\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9 '
+                b'\xd8\xa7\xd9\x84\xd8\xb9\xd9\x8e\xd8\xb1\xd9\x8e\xd8\xa8\xd9'
+                b'\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9\n'),
             # dos line endings yaml
             'dos.yml': '---\r\n'
                        'dos: true',
--- a/tests/common.py
+++ b/tests/common.py
@@ -57,7 +57,8 @@ def build_temp_workspace(files):
     tempdir = tempfile.mkdtemp(prefix='yamllint-tests-')
 
     for path, content in files.items():
-        path = os.path.join(tempdir, path)
+        path = path if isinstance(path, bytes) else path.encode()
+        path = os.path.join(tempdir.encode(), path)
         if not os.path.exists(os.path.dirname(path)):
             os.makedirs(os.path.dirname(path))
 

(Then, we could still skip problematic tests, like your PR currently does.)

.travis.yml Outdated Show resolved Hide resolved
.travis.yml Show resolved Hide resolved
tests/test_cli.py Outdated Show resolved Hide resolved
@wolfgangwalther
Copy link
Contributor Author

Are you sure the "bytes" stuff is actually going to work?

The original error message didn't complain about the .encode() stuff in setUpClass - but about os.path.exists in build_temp_workspace:

Traceback (most recent call last):
  File "/home/travis/build/adrienverge/yamllint/tests/test_cli.py", line 102, in setUpClass
    'en.yaml': '---\n'
  File "/home/travis/build/adrienverge/yamllint/tests/common.py", line 61, in build_temp_workspace
    if not os.path.exists(os.path.dirname(path)):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-46: ordinal not in range(128)

If I'm not mistaken we are passing a UTF-8 encoded string (which is perfectly supported by python, I'm pretty sure) to os.stat - where it probably get's encoded as ascii (otherwise I don't know where that mentioning of ascii comes from - but this part is probably dependent on the systems locales). Why would the bytes string you are putting in right now be any different?

Now, if that patch indeed works - can't we just convert the UTF-8 string to a byte string in build_temp_workspace? I don't think we need to touch setUpClass at all with all the single bytes in there.

Because let's be honest: setUpClass is in no way easier to read or understand in your suggestion. Probably a lot harder than before - at least for me. And it doesn't look nice as well ;)

@wolfgangwalther
Copy link
Contributor Author

Maybe I'm misunderstanding what .encode('utf-8') does - but is there actually any difference between the following two?

            (
                u'---\n'
                u'- hétérogénéité\n'
                u'# 19.99 €\n'
                u'- お早う御座います。\n'
                u'# الأَبْجَدِيَّة العَرَبِيَّة\n').encode('utf-8')

and

            (
                b'---\n'
                b'- h\xc3\xa9t\xc3\xa9rog\xc3\xa9n\xc3\xa9it\xc3\xa9\n'
                b'# 19.99 \xe2\x82\xac\n'
                b'- \xe3\x81\x8a\xe6\x97\xa9\xe3\x81\x86\xe5\xbe\xa1\xe5\xba'
                b'\xa7\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99\xe3\x80\x82\n'
                b'# \xd8\xa7\xd9\x84\xd8\xa3\xd9\x8e\xd8\xa8\xd9\x92\xd8\xac'
                b'\xd9\x8e\xd8\xaf\xd9\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9 '
                b'\xd8\xa7\xd9\x84\xd8\xb9\xd9\x8e\xd8\xb1\xd9\x8e\xd8\xa8\xd9'
                b'\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9\n')

I think this is exactly what .encode does, right?

@wolfgangwalther
Copy link
Contributor Author

Your patch also has this: u'non-ascii/éçäγλνπ¥/utf-8'.encode('utf-8') - so a .encode() on the key/path.

I put this into build_temp_workspace and was able to get it working locally. Let's see what the CI builds say, in all the different flavours.

@wolfgangwalther
Copy link
Contributor Author

And - I had to revert the python version check in test_run_non_ascii_file. Setting the locale here seems to be important for the body of the file - not the path. And that seems to be independent of python versions.

@wolfgangwalther
Copy link
Contributor Author

Putting both the version and locale check into def utf8_paths_supported() didn't work out. I changed that again. One of the two tests we are skipping now is skipped only depending on locale availability (test_run_non_ascii_file - we need the locale in any case) and the other test depends on either an available UTF-8 locale OR python >= 3.7 to run right now.

The change to build_temp_workspace seems to have broken the python 2.7 tests now, though.

@adrienverge
Copy link
Owner

I like this new version 👍

Are you sure the "bytes" stuff is actually going to work?

I tried and it seemed to work; maybe passing an already-encoded string to os.path avoids any encoding/decoding at the Python level?

The change to build_temp_workspace seems to have broken the python 2.7 tests now, though.

It's because test_cli.py passes "éçäγλνπ¥" to build_temp_workspace() as bytes instead of an Unicode string. With the following change, it should work:

             # non-ASCII chars
-            'non-ascii/éçäγλνπ¥/utf-8': (
+            u'non-ascii/éçäγλνπ¥/utf-8': (
                 u'---\n'

We're very close! :)

@wolfgangwalther
Copy link
Contributor Author

I tried and it seemed to work; maybe passing an already-encoded string to os.path avoids any encoding/decoding at the Python level?

I think it worked - but not because of all the manually encoded bytes (as they were the content of the file), but only the .encode('utf-8') on the key you had in there as well! That's essentially what I am doing now as well.

Your suggestion about adding u'...' makes a lot of sense. Let's see whether that turns all of them green :)

If it does - I think we have found very good solution!

Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have it 🎉
(Travis doesn't report the build status but it's all green, it's a known bug of theirs.)

Thanks a lot @wolfgangwalther for improving tests!

@adrienverge adrienverge merged commit b80997e into adrienverge:master Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants