Add build environment without UTF-8 locales to travis-ci #289
Preventing regressions like #285
Thanks, that's a good idea. It shows interesting problems on Python >= 3, <= 3.6:
Yes, I am looking at that right now as well. Not sure why it says line 102, but I would guess this is more about lines 89-94. What do you think?
I agree, line 102 looks OK. I guess line counting is altered by the right-to-left characters (
Yes, that makes sense. I assume we have to adjust
So, I constructed those lists in
Still not clear, though, why this is a problem on only some Python versions and not on all of them when no UTF-8 locale is available.
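One plausible explanation, which this thread does not confirm: Python 3.7 introduced C locale coercion (PEP 538) and UTF-8 mode (PEP 540), so under a plain C locale it usually ends up using UTF-8 for file names, while earlier 3.x versions fall back to ASCII. A minimal sketch for checking what a given interpreter actually uses (the helper name describe_text_encodings is made up for illustration):

```python
import locale
import sys


def describe_text_encodings():
    """Collect the settings that decide how str paths become bytes."""
    return {
        'python_version': sys.version_info[:3],
        # Encoding used to convert str file names to bytes for OS calls.
        'filesystem_encoding': sys.getfilesystemencoding(),
        # Encoding derived from the active locale (LANG, LC_ALL, ...).
        'preferred_encoding': locale.getpreferredencoding(False),
        # UTF-8 mode flag; the attribute only exists on Python >= 3.7.
        'utf8_mode': getattr(sys.flags, 'utf8_mode', None),
    }


if __name__ == '__main__':
    for name, value in describe_text_encodings().items():
        print('{}: {}'.format(name, value))
```

Running this with LC_ALL=C on the different Travis Python versions should show whether the file system encoding is the part that differs.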
Sorry for the delay! I would rather be more precise about which Python versions are expected to fail (and make sure everything works with Python ⩾ 3.7, even with a missing /usr/lib/locale/C.utf8), and keep the tests simple and easy to read by skipping them in the rare case when C.UTF-8 is not available.
Something like:
# Check system's UTF-8 availability, because without it using UTF-8 paths
# like 'éçäγλνπ¥' will break on Python ⩽ 3.6
def utf8_paths_supported():
    if sys.version_info >= (3, 7):
        return True
    try:
        locale.setlocale(locale.LC_ALL, 'C.UTF-8')
        locale.setlocale(locale.LC_ALL, (None, None))
        return True
    except locale.Error:
        return False


@unittest.skipIf(not utf8_paths_supported(), 'UTF-8 paths not supported')
class CommandLineTestCase(unittest.TestCase):
What do you think?
Co-authored-by: Adrien Vergé <adrienverge@gmail.com>
Skipping tests instead would be fine with me - if it was only the tests that are using UTF-8 paths.
@unittest.skipIf(not utf8_paths_supported(), 'UTF-8 paths not supported')
class CommandLineTestCase(unittest.TestCase):
Doesn't that basically skip the whole
So I guess I would put the
But that's definitely a lot cleaner than the current "solution" of patching all those arrays.
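If the concern is that a class-level decorator skips every test in the class, the same decorator can be applied to individual test methods instead. A minimal sketch, assuming only some tests touch non-ASCII paths; the test names and the hard-coded utf8_paths_supported() are hypothetical, chosen so that the skip is visible when the sketch is run:

```python
import unittest


def utf8_paths_supported():
    # Hypothetical stand-in for the check proposed above; hard-coded to
    # False here so that the skip is visible when running this sketch.
    return False


class CommandLineTestCase(unittest.TestCase):
    def test_ascii_path(self):
        # Runs everywhere: no non-ASCII file name is involved.
        self.assertEqual(u'a.yaml'.encode('ascii'), b'a.yaml')

    @unittest.skipIf(not utf8_paths_supported(), 'UTF-8 paths not supported')
    def test_non_ascii_path(self):
        # Only meaningful where UTF-8 paths work.
        self.assertEqual(len(u'éçäγλνπ¥'), 8)


def run_suite():
    """Run the test case and return the collected unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        CommandLineTestCase)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this layout only test_non_ascii_path is reported as skipped, and the remaining tests still run in the environment without UTF-8 locales.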
tests/test_cli.py
Outdated
if utf8_paths_supported():
    # non-ASCII chars
    workspace_def['non-ascii/éçäγλνπ¥/utf-8'] = (
I still kept this part in, because this way we only need to disable two out of ~25 tests
That would kind of defeat the purpose of this PR in the first place
Hmm, sure, you're right...
But I'm still uncomfortable with making setUpClass() more complex by splitting it into two conditional parts with workspace_def = ...
I have another idea to execute the same code, whatever Python version, whatever locales installed: using bytes directly. What about this?
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -86,12 +86,24 @@ class CommandLineTestCase(unittest.TestCase):
             'no-yaml.json': '---\n'
                             'key: value\n',
             # non-ASCII chars
-            'non-ascii/éçäγλνπ¥/utf-8': (
-                u'---\n'
-                u'- hétérogénéité\n'
-                u'# 19.99 €\n'
-                u'- お早う御座います。\n'
-                u'# الأَبْجَدِيَّة العَرَبِيَّة\n').encode('utf-8'),
+            # The following bytes work even on systems where C-UTF-8 is not
+            # available. They are the representation of:
+            # 'non-ascii/éçäγλνπ¥/utf-8':
+            # u'---\n'
+            # u'- hétérogénéité\n'
+            # u'# 19.99 €\n'
+            # u'- お早う御座います。\n'
+            # u'# الأَبْجَدِيَّة العَرَبِيَّة\n'
+            u'non-ascii/éçäγλνπ¥/utf-8'.encode('utf-8'): (
+                b'---\n'
+                b'- h\xc3\xa9t\xc3\xa9rog\xc3\xa9n\xc3\xa9it\xc3\xa9\n'
+                b'# 19.99 \xe2\x82\xac\n'
+                b'- \xe3\x81\x8a\xe6\x97\xa9\xe3\x81\x86\xe5\xbe\xa1\xe5\xba'
+                b'\xa7\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99\xe3\x80\x82\n'
+                b'# \xd8\xa7\xd9\x84\xd8\xa3\xd9\x8e\xd8\xa8\xd9\x92\xd8\xac'
+                b'\xd9\x8e\xd8\xaf\xd9\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9 '
+                b'\xd8\xa7\xd9\x84\xd8\xb9\xd9\x8e\xd8\xb1\xd9\x8e\xd8\xa8\xd9'
+                b'\x90\xd9\x8a\xd9\x8e\xd9\x91\xd8\xa9\n'),
             # dos line endings yaml
             'dos.yml': '---\r\n'
                        'dos: true',
--- a/tests/common.py
+++ b/tests/common.py
@@ -57,7 +57,8 @@ def build_temp_workspace(files):
     tempdir = tempfile.mkdtemp(prefix='yamllint-tests-')
     for path, content in files.items():
-        path = os.path.join(tempdir, path)
+        path = path if isinstance(path, bytes) else path.encode()
+        path = os.path.join(tempdir.encode(), path)
         if not os.path.exists(os.path.dirname(path)):
             os.makedirs(os.path.dirname(path))
(Then, we could still skip problematic tests, like your PR currently does.)
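To make the bytes-path idea concrete, here is a self-contained sketch of a workspace builder that converts every path to bytes before touching the file system. It is modelled on the patch above but is not the project's actual helper: the file-writing part and the use of os.fsencode() for the temporary directory are assumptions added for illustration.

```python
import os
import shutil
import tempfile


def build_temp_workspace(files):
    """Create a temp dir containing `files` (a path -> content mapping)."""
    tempdir = tempfile.mkdtemp(prefix='yamllint-tests-')
    for path, content in files.items():
        # Work with bytes paths only, so that Python never has to encode
        # a str file name with the (possibly ASCII) file system encoding.
        path = path if isinstance(path, bytes) else path.encode('utf-8')
        path = os.path.join(os.fsencode(tempdir), path)
        parent = os.path.dirname(path)
        if not os.path.exists(parent):
            os.makedirs(parent)
        # Assumption: content is either bytes or text.
        mode = 'wb' if isinstance(content, bytes) else 'w'
        with open(path, mode) as f:
            f.write(content)
    return tempdir


if __name__ == '__main__':
    workspace = build_temp_workspace({
        'a.yaml': '---\nkey: value\n',
        u'non-ascii/éçäγλνπ¥/utf-8': u'---\n- hétérogénéité\n'.encode('utf-8'),
    })
    print(sorted(os.listdir(os.fsencode(workspace))))
    shutil.rmtree(workspace)
```

Because os.path.join() refuses to mix str and bytes, every component has to be converted, which is why the temporary directory name is encoded as well.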
Are you sure the "bytes" stuff is actually going to work? The original error message didn't complain about the
Traceback (most recent call last):
  File "/home/travis/build/adrienverge/yamllint/tests/test_cli.py", line 102, in setUpClass
    'en.yaml': '---\n'
  File "/home/travis/build/adrienverge/yamllint/tests/common.py", line 61, in build_temp_workspace
    if not os.path.exists(os.path.dirname(path)):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-46: ordinal not in range(128)
If I'm not mistaken we are passing a UTF-8 encoded string (which is perfectly supported by Python, I'm pretty sure) to
Now, if that patch indeed works - can't we just convert the
Because let's be honest:
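For reference, the UnicodeEncodeError in the traceback above is what Python raises when a str containing non-ASCII characters is encoded with the ASCII codec, which is presumably what os.stat() did internally with the path in that environment. A small sketch reproducing the same error outside the file system; the /tmp path and the try_encode helper are hypothetical:

```python
def try_encode(path, encoding):
    """Return the encoded path, or the UnicodeEncodeError that was raised."""
    try:
        return path.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


path = u'/tmp/non-ascii/éçäγλνπ¥/utf-8'  # hypothetical location

# Fails: the eight non-ASCII characters have no ASCII representation.
ascii_result = try_encode(path, 'ascii')

# Succeeds: UTF-8 can represent every character in the path.
utf8_result = try_encode(path, 'utf-8')

if __name__ == '__main__':
    print(repr(ascii_result))
    print(repr(utf8_result))
```

The str itself is fine; the failure only appears at the point where it has to be turned into bytes with an encoding that cannot represent it.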
Maybe I'm misunderstanding what
(
    u'---\n'
    u'- hétérogénéité\n'
    u'# 19.99 €\n'
    u'- お早う御座います。\n'
    u'# الأَبْجَدِيَّة العَرَبِيَّة\n').encode('utf-8')
and
I think this is exactly what
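The equivalence being discussed can be checked directly: encoding the unicode literal as UTF-8 yields the same bytes as the hand-written bytes literal from the earlier patch. A short sketch using two of the lines, assuming the source file itself is saved as UTF-8 (the default for Python 3):

```python
# Text form, as in the original test data.
text = (
    u'---\n'
    u'- hétérogénéité\n'
    u'# 19.99 €\n')

# Hand-encoded form, as in the proposed patch.
raw = (
    b'---\n'
    b'- h\xc3\xa9t\xc3\xa9rog\xc3\xa9n\xc3\xa9it\xc3\xa9\n'
    b'# 19.99 \xe2\x82\xac\n')

# True when both spellings describe the same file content.
same = text.encode('utf-8') == raw

if __name__ == '__main__':
    print(same)
```

So for the file content, the readable .encode('utf-8') spelling and the escaped bytes are interchangeable; only the way the path is handled differs.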
Your patch also has this: I put this into
And - I had to revert the Python version check in
Putting both the version and locale check into
The change to
I like this new version 👍
I tried and it seemed to work; maybe passing an already-encoded string to
It's because
             # non-ASCII chars
-            'non-ascii/éçäγλνπ¥/utf-8': (
+            u'non-ascii/éçäγλνπ¥/utf-8': (
                 u'---\n'
We're very close! :)
I think it worked - but not because of all the manually encoded bytes (as they were the content of the file), but only the
Your suggestion about adding
If it does - I think we have found a very good solution!
I think we have it 🎉
(Travis doesn't report the build status but it's all green, it's a known bug of theirs.)
Thanks a lot @wolfgangwalther for improving tests!