
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1722 #1003

Closed
andyDoucette opened this issue Nov 1, 2020 · 9 comments · Fixed by #1004


@andyDoucette

Description

Installing openml in an ubuntu:18.04 Docker container fails with this error:

Step 4/4 : RUN pip3 install openml
 ---> Running in d9c37780f2d7
Collecting openml
  Downloading https://files.pythonhosted.org/packages/a4/5d/30ce4d1af609ba389d55654e6a7271619253dbbe7006a33bb20c703f0234/openml-0.11.0.tar.gz (110kB)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-cqdtdqzh/openml/setup.py", line 20, in <module>
        README = fid.read()
      File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1722: ordinal not in range(128)
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-cqdtdqzh/openml/
ERROR: Service 'ba' failed to build: The command '/bin/sh -c pip3 install openml' returned a non-zero code: 1

Steps/Code to Reproduce

Here is the Dockerfile:

FROM ubuntu:18.04

#Make sure that we don't get interactive prompts during installs:
ENV DEBIAN_FRONTEND=noninteractive

#Install system dependencies:
RUN apt-get update && apt-get install -y \
                                            bc \
                                            build-essential \
                                            gawk \
                                            git \
                                            gosu \
                                            htop \
                                            imagemagick \
                                            nano \
                                            python3-opencv \
                                            python3-pip \
                                            software-properties-common \
                                            sudo \
    && rm -rf /var/lib/apt/lists/*

RUN pip3 install openml

Expected Results

A successful install of openml.

Note

Changing the FROM line to ubuntu:20.04 resolves the issue, but GPU driver compatibility constraints require me to use 18.04. Since 18.04 is supposed to be maintained for several more years, it would be great if openml installed on it too.

@andyDoucette
Author

andyDoucette commented Nov 1, 2020

Two things:

  1. This problem never used to happen. The same Dockerfile built just fine 3 months ago. Something changed in the meantime.

  2. The problem goes away if I install Python 3.8 on Ubuntu 18.04 from a PPA. I'd rather not do this on a production system because it breaks a lot of other things downstream, but it does say something about what might be causing the issue. Looking at what changed between these versions, I suspect the relevant change is the implementation of PEP 538 (coercing the legacy C locale to a UTF-8 based locale) in Python 3.7: https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-pep538. Is there a way openml can be changed to always use the right decoding?

FROM ubuntu:18.04

#Make sure that we don't get interactive prompts during installs:
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa

#Install system dependencies:
RUN apt-get update && apt-get install -y \
                                            bc \
                                            build-essential \
                                            gawk \
                                            git \
                                            gosu \
                                            htop \
                                            imagemagick \
                                            nano \
                                            python3.8 \
                                            python3-opencv \
                                            python3-pip \
                                            software-properties-common \
                                            sudo \
    && rm -rf /var/lib/apt/lists/*

RUN python3.8 -m pip install openml
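The PEP 538 hypothesis can be checked directly. The sketch below reports the codec that open() falls back to when no encoding= argument is given; it assumes it is saved as a hypothetical check_encoding.py and run with each interpreter inside the container. Under the C/POSIX locale of a bare ubuntu:18.04 image, Python 3.6 would be expected to report ascii, while Python 3.7+ coerces that locale to a UTF-8 based one (PEP 538) and would be expected to report utf-8. The exact result depends on the image's environment, so treat this as a diagnostic rather than a guarantee.

```python
import codecs
import locale
import sys


def default_open_encoding() -> str:
    """Canonical codec name open() uses for text files when encoding= is omitted."""
    # open()'s default text encoding is the locale's preferred encoding, or
    # UTF-8 when UTF-8 mode is on; getpreferredencoding(False) reports that.
    preferred = locale.getpreferredencoding(False)
    # Normalise aliases such as 'ANSI_X3.4-1968' (the C locale's name for ASCII).
    return codecs.lookup(preferred).name


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # sys.flags.utf8_mode only exists on Python 3.7+.
    print("UTF-8 mode:", getattr(sys.flags, "utf8_mode", "n/a (< 3.7)"))
    print("open() default encoding:", default_open_encoding())
```

Running it with python3 in the first image and python3.8 in the second should make the difference between the two interpreters visible.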

@andyDoucette
Author

andyDoucette commented Nov 1, 2020

It looks like there was a regression sometime between openml==0.10.2 and openml==0.11.0. Switching back to openml==0.10.2 fixes my problem, even on Ubuntu 18.04, but others are still likely to run into this, so it's probably worth fixing.

@joaquinvanschoren
Contributor

Hi Andy,
Sorry for the slow reply. I'm moving this issue to the openml-python tracker. They'll be able to check what's going on with the install.

@joaquinvanschoren joaquinvanschoren transferred this issue from openml/OpenML Nov 17, 2020
@andyDoucette
Author

Thanks.

@PGijsbers
Copy link
Collaborator

Since the stack trace points at README = fid.read() in setup.py, one would expect the cause to be a change in the README file. The changes are:

diff --git "a/openml-python-0.10.2\\README.md" "b/openml-python-0.11.0\\README.md"
index 63e3315..7320856 100644
--- "a/openml-python-0.10.2\\README.md"
+++ "b/openml-python-0.11.0\\README.md"
@@ -1,9 +1,14 @@
-[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
+# OpenML-Python

 A python interface for [OpenML](http://openml.org), an online platform for open science collaboration in machine learning.
 It can be used to download or upload OpenML data such as datasets and machine learning experiment results.
-You can find the documentation on the [openml-python website](https://openml.github.io/openml-python).
-If you wish to contribute to the package, please see our [contribution guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md).
+
+## General
+
+* [Documentation](https://openml.github.io/openml-python).
+* [Contribution guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md).
+
+[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

 Master branch:

@@ -16,3 +21,22 @@ Development branch:
 [![Build Status](https://travis-ci.org/openml/openml-python.svg?branch=develop)](https://travis-ci.org/openml/openml-python)
 [![Build status](https://ci.appveyor.com/api/projects/status/blna1eip00kdyr25/branch/develop?svg=true)](https://ci.appveyor.com/project/OpenML/openml-python/branch/develop)
 [![Coverage Status](https://coveralls.io/repos/github/openml/openml-python/badge.svg?branch=develop)](https://coveralls.io/github/openml/openml-python?branch=develop)
+
+## Citing OpenML-Python
+
+If you use OpenML-Python in a scientific publication, we would appreciate a reference to the
+following paper:
+
+[Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas M<C3><BC>ller, Joaquin Vanschoren, Frank Hutter<br/>
+**OpenML-Python: an extensible Python API for OpenML**<br/>
+*arXiv:1911.02490 [cs.LG]*](https://arxiv.org/abs/1911.02490)
+
+Bibtex entry:
+```bibtex
+@article{feurer-arxiv19a,
+  author    = {Matthias Feurer and Jan N. van Rijn and Arlind Kadra and Pieter Gijsbers and Neeratyoy Mallik and Sahithya Ravi and Andreas M<C3><BC>ller and Joaquin Vanschoren and Frank Hutter},
+  title     = {OpenML-Python: an extensible Python API for OpenML},
+  journal   = {arXiv:1911.02490},
+  year      = {2019},
+}
+```

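I suspect the ü in Müller (bytes <C3><BC> in UTF-8) is the culprit: it is not included in ASCII but is in UTF-8, and 0xc3, the first byte of that sequence, is exactly the byte named in the error. I don't know yet whether just passing encoding='utf-8' as an argument to open would be sufficient. I'll look into it.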
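The suspicion can be confirmed without Docker: encoding the author name from the new README as UTF-8 and decoding it as ASCII (which is effectively what open() did under Python 3.6 here) fails on the same byte as the traceback. The helper name ascii_decode_failure is hypothetical, for illustration only. The offset it reports is relative to the short sample string, whereas the 1722 in the traceback is the offset into the full README.

```python
from typing import Optional, Tuple


def ascii_decode_failure(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte) where ASCII decoding fails, or None if it succeeds."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the ASCII codec rejected.
        return err.start, data[err.start]
    return None


# The author name from the 0.11.0 README, as stored on disk (UTF-8).
name = "Andreas M\u00fcller".encode("utf-8")
print(name)                                        # b'Andreas M\xc3\xbcller'
print(ascii_decode_failure(name))                  # (9, 195), i.e. byte 0xc3
print(name.decode("utf-8") == "Andreas M\u00fcller")  # True
```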

@PGijsbers
Copy link
Collaborator

I patched it, and with the new Dockerfile below (replace the branch name with develop after the PR is merged):

FROM ubuntu:18.04

#Make sure that we don't get interactive prompts during installs:
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa

#Install system dependencies:
RUN apt-get update && apt-get install -y \
                                            bc \
                                            build-essential \
                                            gawk \
                                            git \
                                            gosu \
                                            htop \
                                            imagemagick \
                                            nano \
                                            python3.8 \
                                            python3-opencv \
                                            python3-pip \
                                            software-properties-common \
                                            sudo \
    && rm -rf /var/lib/apt/lists/*

RUN python3.8 -m pip install -e git+https://github.com/openml/openml-python.git@PGijsbers-patch-1#egg=openml
RUN python3.8 -m pip install -U numpy

it installs successfully. Note that I also had to upgrade numpy from the regular repository (pip install -U numpy), because otherwise the install raised an error.
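For reference, here is a minimal sketch of the change discussed above: passing encoding='utf-8' explicitly when setup.py reads the README, so the result no longer depends on the container's locale. The helper read_readme is hypothetical. The diff actually merged in #1004 is not shown in this thread and may look different.

```python
import os
import tempfile


def read_readme(path: str) -> str:
    """Read a README as UTF-8, independent of the locale's default encoding."""
    # Without encoding=, Python 3.6 under the C/POSIX locale decodes as ASCII
    # and fails on the first non-ASCII byte (here 0xc3 from the 'ü').
    with open(path, encoding="utf-8") as fid:
        return fid.read()


# Demo: write a README containing non-ASCII text, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    readme_path = os.path.join(tmp, "README.md")
    with open(readme_path, "w", encoding="utf-8") as out:
        out.write("Andreas M\u00fcller\n")
    text = read_readme(readme_path)

# ascii() keeps the printout safe even on an ASCII-only terminal.
print(ascii(text))  # 'Andreas M\xfcller\n'
```

In setup.py this would replace the bare open() call that feeds README = fid.read().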

@andyDoucette
Author

andyDoucette commented Nov 18, 2020 via email

@PGijsbers
Collaborator

And thank you for bringing it to our attention with the documentation needed to reproduce the issue 👍. Especially for platform-specific errors, that makes a big difference!

@andyDoucette
Author

andyDoucette commented Nov 19, 2020 via email
