-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve friendliness with Python 3 #175
Conversation
@@ -30,8 +30,28 @@ | |||
# -u Treat unset variables as an error when substituting. | |||
set -e -u | |||
|
|||
# actual python binary that we'll call | |||
PYTHON=/usr/bin/python | |||
# default python interpreter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm trying to understand what this is fixing, but i can't get my had wrapped around it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, nvm. i'm not a big fan of this tbh. i don't like the fact that i can eat arguments, making it a bad wrapper.
better solutions however...
in what kind of an environment do you run into this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not really depend on any specific environment. The problem is that vsc-install
is breaking any scripts that can only be executed in Python 3. Scripts with a shebang such as #!/usr/bin/env python3
are replaced with #!/usr/bin/python-stripped-env
which is hardcoded to /usr/bin/python
. Therefore, if the Python 3 interpreter is called python3
, as it usually is, things break.
I propose to solve it by allowing to pass an argument to python-stripped-env
defining the target interpreter. In a similar way to how /usr/bin/env
works. This PR does not change the default behaviour of python-stripped-env
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7f64734 tightens the names considered as possible python interpreters to reduce the chances of eating arguments by mistake. Now it only accepts the basic python
, python2
, python3
and those followed by any dot version.
@@ -263,12 +263,29 @@ def _fvs(msg=None): | |||
|
|||
|
|||
def _read(source, read_lines=False): | |||
"""read a file, either in full or as a list (read_lines=True)""" | |||
with open(source) as file_handle: | |||
"""read a file, either in full or as a list (read_lines=True) and decode UTF-8 files into ASCII""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you don't know that the UnidecodeError
below comes from UTF-8 encoding issues, i think you should give open(fname, encoding="ascii", errors="surrogateescape")
a try and see if it solves your issues (or encoding=latin-1
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK there is no other way that a UnicodeDecodeError
can be triggered if it is not by the presence of unicode characters. So the ascii
or latin9
encodings won't fix the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but can you properly read utf-16
text with a utf-8
encoding? both opens abvoe will handle utf8 without errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it might be safer to directly decode the file in ASCII and handle the errors. Using surrogateescape
works to read the contents of the file
open(source, encoding='ascii', errors='surrogateescape') as file_handle:
txt = file_read(file_handle, read_lines)
but it generates new errors down the line. There is a write operation in UTF-8 which does not allow surrogates
INFO: writing lib/vsc_install.egg-info/PKG-INFO
Traceback (most recent call last):
File "setup.py", line 1675, in <module>
action_target(PACKAGE)
File "setup.py", line 1642, in action_target
_fvs('action_target function')().action_target(package, *args, **kwargs)
File "setup.py", line 1631, in action_target
setupfn(**x)
File "/usr/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
return distutils.core.setup(**attrs)
[...]
File "/usr/lib/python3.6/site-packages/setuptools/dist.py", line 92, in write_pkg_file
file.write('Description: %s\n' % long_desc)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 10383-10385: surrogates not allowed
error: Bad exit status from /var/tmp/rpm-tmp.zPj48a (%install)
I could hunt all those new errors, but I think that we can just use backslashreplace
. This works across the board and the result is similar to surrogateescape
. Done in 22be9bc
@@ -662,7 +679,10 @@ def make_release_tree(self, base_dir, files): | |||
if pyshebang_reg.search(first_line): | |||
log.info("going to adapt shebang for script %s" % fn) | |||
dest, code = self._recopy(base_dir, fn) | |||
code = pyshebang_reg.sub(SHEBANG_STRIPPED_ENV_PYTHON, code) | |||
pyshebang_line = first_line.split() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of replacing the full line with SHEBANG_STRIPPED_ENV_PYTHON
, this only replaces the call in the shebang (the first element) with SHEBANG_STRIPPED_ENV_PYTHON
. Hence, any arguments in the shebang are preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't you fix the pyshebang_reg
so the .sub
only replaces that part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would require changing SHEBANG_ENV_PYTHON
to #!/usr/bin/env
, is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or alternatively, create a new global
@stdweird For clarification, below are examples of the behaviour of the updated
Any unrelated arguments will be passed over as expected
|
txt = file_read(file_handle, read_lines) | ||
except UnicodeDecodeError: | ||
# file contains unicode characters, try again backslashing them | ||
with open(source, encoding='ascii', errors="backslashreplace") as file_handle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not only using this open instead of the try/except? also, is this py2 compat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Python 2 open()
does not accept encoding
and errors
@lexming still confused the first issue (unrelated from second i presume) is what exactly? "vsc-install" fails to install itself, what does that mean? (i'm trying to understand why we have not seen this before) the second issue, what are you trying to do there? right now, one cannot install vsc-install (and vsc-base) as rpms for both py2 and py3 on the same host. that is something i'm aware of, but also something i was hoping we don't have to fix. in any case, the fix involves renaming the scripts in bin for py3 (eg by adding a |
@stdweird Second issue: |
@lexming how odd. we have build lots of py3 rpms by now, never once saw any issue. the utf-8 issue for the Readme is also interesting. i just checked, and it is parsed/escaped as is on el8. what python version do you have the issue with, and what locale have you set? wrt the second issue, the real problem is however that you are packaging py3 code using vsc-install, and vsc-install should generate proper py3 wrapper scripts instead of the curent code (cfr #169, which i never gotten around to, and still to not have to fix 😉 ). the workarounds you propose are not needed at all, vsc-install can, in its current form, only be installed with one python version anyway. your patches do not address that issue (and quite frankly, i don't think they are needed at all). |
@stdweird the decoding issue happens with Python 3.6.8 in CentOS 7. I think that the text in the README is fine, removing the unicode characters in it would just work around the issue. I could also play with Regarding the second issue, why do you consider my changes to
I'm actually installing |
Closing given that Python 2 is dead and we can easily workaround this limitation with a venv. |
vsc-install
fails to install itself with Python 3 due to the unicode characters in README.md. Files parsed by_read()
are decoded as ASCII with strict errors for non-ascii characters. A solution in Python 3 keeping the current general encoding is to explicitly open the file as UTF-8 and then decode to ASCII removing those characters. The fix in 3036884 ensures that everything will work as usual in Python 2.On the other hand,
python-stripped-env
is hardcoded to/usr/bin/python
, which breaks any script exclusively targeting Python 3 with#!/usr/bin/env python3
. The changes in c3d6639 makepython-stripped-env
work in similar way to#!/usr/bin/env
by accepting an argument indicating the python interpreter.