Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve friendliness with Python 3 #175

Closed
wants to merge 6 commits into from
Closed

Conversation

lexming
Copy link
Contributor

@lexming lexming commented Oct 21, 2020

vsc-install fails to install itself with Python 3 due to the unicode characters in README.md. Files parsed by _read() are decoded as ASCII with strict errors for non-ascii characters. A solution in Python 3 keeping the current general encoding is to explicitly open the file as UTF-8 and then decode to ASCII removing those characters. The fix in 3036884 ensures that everything will work as usual in Python 2.

On the other hand, python-stripped-env is hardcoded to /usr/bin/python, which breaks any script exclusively targeting Python 3 with #!/usr/bin/env python3. The changes in c3d6639 make python-stripped-env work in similar way to #!/usr/bin/env by accepting an argument indicating the python interpreter.

bin/python-stripped-env Outdated Show resolved Hide resolved
bin/python-stripped-env Outdated Show resolved Hide resolved
bin/python-stripped-env Outdated Show resolved Hide resolved
@@ -30,8 +30,28 @@
# -u Treat unset variables as an error when substituting.
set -e -u

# actual python binary that we'll call
PYTHON=/usr/bin/python
# default python interpreter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm trying to understand what this is fixing, but i can't get my had wrapped around it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, nvm. i'm not a big fan of this tbh. i don't like the fact that i can eat arguments, making it a bad wrapper.
better solutions however...
in what kind of an environment do you run into this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not really depend on any specific environment. The problem is that vsc-install is breaking any scripts that can only be executed in Python 3. Scripts with a shebang such as #!/usr/bin/env python3 are replaced with #!/usr/bin/python-stripped-env which is hardcoded to /usr/bin/python. Therefore, if the Python 3 interpreter is called python3, as it usually is, things break.

I propose to solve it by allowing to pass an argument to python-stripped-env defining the target interpreter. In a similar way to how /usr/bin/env works. This PR does not change the default behaviour of python-stripped-env.

Copy link
Contributor Author

@lexming lexming Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7f64734 tightens the names considered as possible python interpreters to reduce the chances of eating arguments by mistake. Now it only accepts the basic python, python2, python3 and those followed by any dot version.

@@ -263,12 +263,29 @@ def _fvs(msg=None):


def _read(source, read_lines=False):
"""read a file, either in full or as a list (read_lines=True)"""
with open(source) as file_handle:
"""read a file, either in full or as a list (read_lines=True) and decode UTF-8 files into ASCII"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you don't know that the UnidecodeError below comes from UTF-8 encoding issues, i think you should give open(fname, encoding="ascii", errors="surrogateescape") a try and see if it solves your issues (or encoding=latin-1).

Copy link
Contributor Author

@lexming lexming Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK there is no other way that a UnicodeDecodeError can be triggered if it is not by the presence of unicode characters. So the ascii or latin9 encodings won't fix the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but can you properly read utf-16 text with a utf-8 encoding? both opens abvoe will handle utf8 without errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it might be safer to directly decode the file in ASCII and handle the errors. Using surrogateescape works to read the contents of the file

open(source, encoding='ascii', errors='surrogateescape') as file_handle:
    txt = file_read(file_handle, read_lines)

but it generates new errors down the line. There is a write operation in UTF-8 which does not allow surrogates

INFO: writing lib/vsc_install.egg-info/PKG-INFO
Traceback (most recent call last):
  File "setup.py", line 1675, in <module>
    action_target(PACKAGE)
  File "setup.py", line 1642, in action_target
    _fvs('action_target function')().action_target(package, *args, **kwargs)
  File "setup.py", line 1631, in action_target
    setupfn(**x)
  File "/usr/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
[...]
  File "/usr/lib/python3.6/site-packages/setuptools/dist.py", line 92, in write_pkg_file
    file.write('Description: %s\n' % long_desc)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 10383-10385: surrogates not allowed
error: Bad exit status from /var/tmp/rpm-tmp.zPj48a (%install)

I could hunt all those new errors, but I think that we can just use backslashreplace. This works across the board and the result is similar to surrogateescape. Done in 22be9bc

@@ -662,7 +679,10 @@ def make_release_tree(self, base_dir, files):
if pyshebang_reg.search(first_line):
log.info("going to adapt shebang for script %s" % fn)
dest, code = self._recopy(base_dir, fn)
code = pyshebang_reg.sub(SHEBANG_STRIPPED_ENV_PYTHON, code)
pyshebang_line = first_line.split()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of replacing the full line with SHEBANG_STRIPPED_ENV_PYTHON, this only replaces the call in the shebang (the first element) with SHEBANG_STRIPPED_ENV_PYTHON. Hence, any arguments in the shebang are preserved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't you fix the pyshebang_reg so the .sub only replaces that part?

Copy link
Contributor Author

@lexming lexming Oct 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require changing SHEBANG_ENV_PYTHON to #!/usr/bin/env, is that ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively, create a new global

@lexming
Copy link
Contributor Author

lexming commented Oct 21, 2020

@stdweird For clarification, below are examples of the behaviour of the updated python-stripped-env in a system where the available python interpreters are python, python2 and python3:

$ ./bin/python-stripped-env 
debug: Python interpreter: /usr/bin/python
Python 2.7.5 (default, Apr  2 2020, 13:16:51) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
$ ./bin/python-stripped-env python
debug: Python interpreter: /usr/bin/python
Python 2.7.5 (default, Apr  2 2020, 13:16:51) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
$ ./bin/python-stripped-env python2
debug: Python interpreter:  /usr/bin/python2
Python 2.7.5 (default, Apr  2 2020, 13:16:51) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
$ ./bin/python-stripped-env python3
debug: Python interpreter:  /usr/bin/python3
Python 3.6.8 (default, Apr  2 2020, 13:34:55) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Any unrelated arguments will be passed over as expected

$ ./bin/python-stripped-env potato
/usr/bin/python: can't open file 'potato': [Errno 2] No such file or directory
$ ./bin/python-stripped-env python potato
/usr/bin/python: can't open file 'potato': [Errno 2] No such file or directory
$ ./bin/python-stripped-env python2 potato
/usr/bin/python2: can't open file 'potato': [Errno 2] No such file or directory
$ ./bin/python-stripped-env python3 potato
/usr/bin/python3: can't open file 'potato': [Errno 2] No such file or directory

txt = file_read(file_handle, read_lines)
except UnicodeDecodeError:
# file contains unicode characters, try again backslashing them
with open(source, encoding='ascii', errors="backslashreplace") as file_handle:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not only using this open instead of the try/except? also, is this py2 compat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python 2 open() does not accept encoding and errors

@stdweird
Copy link
Member

@lexming still confused

the first issue (unrelated from second i presume) is what exactly? "vsc-install" fails to install itself, what does that mean? (i'm trying to understand why we have not seen this before)

the second issue, what are you trying to do there? right now, one cannot install vsc-install (and vsc-base) as rpms for both py2 and py3 on the same host. that is something i'm aware of, but also something i was hoping we don't have to fix. in any case, the fix involves renaming the scripts in bin for py3 (eg by adding a 3 suffix), and replacing the /usr/bin/python with /usr/bin/python3.

@lexming
Copy link
Contributor Author

lexming commented Oct 22, 2020

@stdweird
First issue: building the RPM of vsc-install with Python 3 fails, specifically python3 setup.py bdist_rpm. That is caused by the encoding of README.md and fixed by the changes to _read().

Second issue: vsc-install breaks the installation of any scripts requiring python 3. Not vsc-install itself, but other packages installed with it. This is caused by the replacement of shebangs done by vsc-install and fixed by the changes to python-stripped-env and make_release_tree().

@stdweird
Copy link
Member

@lexming how odd. we have build lots of py3 rpms by now, never once saw any issue.

the utf-8 issue for the Readme is also interesting. i just checked, and it is parsed/escaped as is on el8. what python version do you have the issue with, and what locale have you set?
feel free to also fix the text, as it seems totally pointless anyway.

wrt the second issue, the real problem is however that you are packaging py3 code using vsc-install, and vsc-install should generate proper py3 wrapper scripts instead of the curent code (cfr #169, which i never gotten around to, and still to not have to fix 😉 ). the workarounds you propose are not needed at all, vsc-install can, in its current form, only be installed with one python version anyway. your patches do not address that issue (and quite frankly, i don't think they are needed at all).

@lexming
Copy link
Contributor Author

lexming commented Oct 23, 2020

@stdweird the decoding issue happens with Python 3.6.8 in CentOS 7. I think that the text in the README is fine, removing the unicode characters in it would just work around the issue. I could also play with PYTHONIOENCODING but again that is beyond the point. IMO vsc-install should not hope for the host system to have a configuration of its liking and handle such errors instead.

Regarding the second issue, why do you consider my changes to python-stripped-env a workaround? This PR makes python-stripped-env be a better wrapper swappable with #!/usr/bin/env and, additionally, without breaking current behaviour. So I would say (after a very quick look) that this PR pretty much fixes issue #169.

vsc-install can, in its current form, only be installed with one python version anyway

I'm actually installing vsc-install with a single Python version (3.6). Then this is supposed to work. However, the real issue is that vsc-install assumes that /usr/bin/python always exists and that it is the one and only python, which is not always the case as you can see.

@lexming
Copy link
Contributor Author

lexming commented May 17, 2022

Closing given that Python 2 is dead and we can easily workaround this limitation with a venv.

@lexming lexming closed this May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants