Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve friendliness with Python 3 #175

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 14 additions & 2 deletions bin/python-stripped-env
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,21 @@
# -u Treat unset variables as an error when substituting.
set -e -u

# actual python binary that we'll call
# default python interpreter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm trying to understand what this is fixing, but i can't get my had wrapped around it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, nvm. i'm not a big fan of this tbh. i don't like the fact that i can eat arguments, making it a bad wrapper.
better solutions however...
in what kind of an environment do you run into this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not really depend on any specific environment. The problem is that vsc-install is breaking any scripts that can only be executed in Python 3. Scripts with a shebang such as #!/usr/bin/env python3 are replaced with #!/usr/bin/python-stripped-env which is hardcoded to /usr/bin/python. Therefore, if the Python 3 interpreter is called python3, as it usually is, things break.

I propose to solve it by allowing to pass an argument to python-stripped-env defining the target interpreter. In a similar way to how /usr/bin/env works. This PR does not change the default behaviour of python-stripped-env.

Copy link
Contributor Author

@lexming lexming Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7f64734 tightens the names considered as possible python interpreters to reduce the chances of eating arguments by mistake. Now it only accepts the basic python, python2, python3 and those followed by any dot version.

PYTHON=/usr/bin/python

# check first argument for any known python interpreter
first_arg=${1:-false}
case "$first_arg" in
python|python[23]|python[23].[0-9]*)
# replace $PYTHON with first argument if it's an available interpreter
if command -v $1 &> /dev/null; then
PYTHON=$(which $1)
fi
shift
;;
esac

# avoid weird issue with grep (for example, colored output)
unset GREP_OPTIONS

Expand All @@ -43,7 +55,7 @@ function debug {
fi
}

debug "which python: $(command -v python)"
debug "Python interpreter: $(command -v $PYTHON)"

# unset all $LD_* environment variables
for env_var in $(env | grep -E '^LD_' | cut -f1 -d'='); do
Expand Down
23 changes: 19 additions & 4 deletions lib/vsc/install/shared_setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -160,7 +160,7 @@ def _log(self, level, msg, args):

RELOAD_VSC_MODS = False

VERSION = '0.17.1'
VERSION = '0.17.2'

log.info('This is (based on) vsc.install.shared_setup %s' % VERSION)
log.info('(using setuptools version %s located at %s)' % (setuptools.__version__, setuptools.__file__))
Expand Down Expand Up @@ -263,12 +263,24 @@ def _fvs(msg=None):


def _read(source, read_lines=False):
"""read a file, either in full or as a list (read_lines=True)"""
with open(source) as file_handle:
"""read a file, either in full or as a list (read_lines=True) and decode UTF-8 files into ASCII"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you don't know that the UnidecodeError below comes from UTF-8 encoding issues, i think you should give open(fname, encoding="ascii", errors="surrogateescape") a try and see if it solves your issues (or encoding=latin-1).

Copy link
Contributor Author

@lexming lexming Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK there is no other way that a UnicodeDecodeError can be triggered if it is not by the presence of unicode characters. So the ascii or latin9 encodings won't fix the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but can you properly read utf-16 text with a utf-8 encoding? both opens abvoe will handle utf8 without errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it might be safer to directly decode the file in ASCII and handle the errors. Using surrogateescape works to read the contents of the file

open(source, encoding='ascii', errors='surrogateescape') as file_handle:
    txt = file_read(file_handle, read_lines)

but it generates new errors down the line. There is a write operation in UTF-8 which does not allow surrogates

INFO: writing lib/vsc_install.egg-info/PKG-INFO
Traceback (most recent call last):
  File "setup.py", line 1675, in <module>
    action_target(PACKAGE)
  File "setup.py", line 1642, in action_target
    _fvs('action_target function')().action_target(package, *args, **kwargs)
  File "setup.py", line 1631, in action_target
    setupfn(**x)
  File "/usr/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
[...]
  File "/usr/lib/python3.6/site-packages/setuptools/dist.py", line 92, in write_pkg_file
    file.write('Description: %s\n' % long_desc)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 10383-10385: surrogates not allowed
error: Bad exit status from /var/tmp/rpm-tmp.zPj48a (%install)

I could hunt all those new errors, but I think that we can just use backslashreplace. This works across the board and the result is similar to surrogateescape. Done in 22be9bc


def file_read(file_handle, read_lines):
"""raw read action on file handle, either in full or as a list (read_lines=True)"""
if read_lines:
txt = file_handle.readlines()
else:
txt = file_handle.read()
return txt

try:
with open(source) as file_handle:
txt = file_read(file_handle, read_lines)
except UnicodeDecodeError:
# file contains unicode characters, try again backslashing them
with open(source, encoding='ascii', errors="backslashreplace") as file_handle:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not only using this open instead of the try/except? also, is this py2 compat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python 2 open() does not accept encoding and errors

txt = file_read(file_handle, read_lines)

return txt


Expand Down Expand Up @@ -662,7 +674,10 @@ def make_release_tree(self, base_dir, files):
if pyshebang_reg.search(first_line):
log.info("going to adapt shebang for script %s" % fn)
dest, code = self._recopy(base_dir, fn)
code = pyshebang_reg.sub(SHEBANG_STRIPPED_ENV_PYTHON, code)
pyshebang_line = first_line.split()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of replacing the full line with SHEBANG_STRIPPED_ENV_PYTHON, this only replaces the call in the shebang (the first element) with SHEBANG_STRIPPED_ENV_PYTHON. Hence, any arguments in the shebang are preserved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't you fix the pyshebang_reg so the .sub only replaces that part?

Copy link
Contributor Author

@lexming lexming Oct 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require changing SHEBANG_ENV_PYTHON to #!/usr/bin/env, is that ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively, create a new global

pyshebang_line[0] = SHEBANG_STRIPPED_ENV_PYTHON
pyshebang_line = ' '.join(pyshebang_line)
code = pyshebang_reg.sub(pyshebang_line, code)
self._write(dest, code)
else:
log.info("no scripts to check for shebang")
Expand Down