tmpdir.join("foo").write(...) doesn't work as expected. #605

Closed
pytestbot opened this issue Oct 1, 2014 · 19 comments
Labels
type: bug problem that needs to be addressed

Comments

@pytestbot
Contributor

Originally reported by: Bjorn Pettersen (BitBucket: thebjorn, GitHub: thebjorn)


All of the following test cases

#!python

# -*- coding: utf-8 -*-

def test_1(tmpdir):
    tmpdir.join('foo').write(u'æ')


def test_2(tmpdir):
    tmpdir.join('foo').write(u'æ'.encode('u8'))


def test_3(tmpdir):
    tmpdir.join('foo').write(u'æ'.encode('l1'))

fail with the following errors:

#!python

(dev) w:\>py.test test_tmpdir.py
============================= test session starts =============================
platform win32 -- Python 2.7.8 -- py-1.4.25 -- pytest-2.6.3
plugins: cov, xdist
collected 3 items

test_tmpdir.py FFF

================================== FAILURES ===================================
___________________________________ test_1 ____________________________________

tmpdir = local('c:\\tmp\\pytest-20\\test_10')

    def test_1(tmpdir):
>       tmpdir.join('foo').write(u'æ')

test_tmpdir.py:4:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = local('c:\\tmp\\pytest-20\\test_10\\foo'), data = 'æ', mode = 'w'
ensure = False

    def write(self, data, mode='w', ensure=False):
        """ write data into path.   If ensure is True create
            missing parent directories.
            """
        if ensure:
            self.dirpath().ensure(dir=1)
        if 'b' in mode:
            if not py.builtin._isbytes(data):
                raise ValueError("can only process bytes")
        else:
            if not py.builtin._istext(data):
                if not py.builtin._isbytes(data):
                    data = str(data)
                else:
                    data = py.builtin._totext(data, sys.getdefaultencoding())
        f = self.open(mode)
        try:
>           f.write(data)
E           UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 0: ordinal not in range(128)

dev\lib\site-packages\py\_path\local.py:476: UnicodeEncodeError
___________________________________ test_2 ____________________________________

tmpdir = local('c:\\tmp\\pytest-20\\test_20')

    def test_2(tmpdir):
>       tmpdir.join('foo').write(u'æ'.encode('u8'))

test_tmpdir.py:8:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = local('c:\\tmp\\pytest-20\\test_20\\foo'), data = '\xc3\xa6', mode = 'w'
ensure = False

    def write(self, data, mode='w', ensure=False):
        """ write data into path.   If ensure is True create
            missing parent directories.
            """
        if ensure:
            self.dirpath().ensure(dir=1)
        if 'b' in mode:
            if not py.builtin._isbytes(data):
                raise ValueError("can only process bytes")
        else:
            if not py.builtin._istext(data):
                if not py.builtin._isbytes(data):
                    data = str(data)
                else:
>                   data = py.builtin._totext(data, sys.getdefaultencoding())
E                   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

dev\lib\site-packages\py\_path\local.py:473: UnicodeDecodeError
___________________________________ test_3 ____________________________________

tmpdir = local('c:\\tmp\\pytest-20\\test_30')

    def test_3(tmpdir):
>       tmpdir.join('foo').write(u'æ'.encode('l1'))

test_tmpdir.py:12:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = local('c:\\tmp\\pytest-20\\test_30\\foo'), data = '\xe6', mode = 'w'
ensure = False

    def write(self, data, mode='w', ensure=False):
        """ write data into path.   If ensure is True create
            missing parent directories.
            """
        if ensure:
            self.dirpath().ensure(dir=1)
        if 'b' in mode:
            if not py.builtin._isbytes(data):
                raise ValueError("can only process bytes")
        else:
            if not py.builtin._istext(data):
                if not py.builtin._isbytes(data):
                    data = str(data)
                else:
>                   data = py.builtin._totext(data, sys.getdefaultencoding())
E                   UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

dev\lib\site-packages\py\_path\local.py:473: UnicodeDecodeError
========================== 3 failed in 0.19 seconds ===========================

(dev) w:\>

@pytestbot
Contributor Author

Original comment by Alexander Dudko (BitBucket: oktopuz, GitHub: oktopuz):


If you call open() in Python 2.7 and write a unicode string to the file, the default 'ascii' codec is used to encode it.

According to https://docs.python.org/2/howto/unicode.html:

  1. If the code point is < 128, each byte is the same as the value of the code point.
  2. If the code point is 128 or greater, the Unicode string can’t be represented in this encoding (Python raises a UnicodeEncodeError exception in this case)

The code point of u'æ' is above 127.

In other words, this is an expected error that has nothing to do with pytest.

You might want to check https://docs.python.org/2.7/library/codecs.html?highlight=open#codecs.open to work with UTF-8 encoded files.
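As a rough illustration of that suggestion, here is a minimal sketch of writing a unicode string with an explicit encoding. It uses `io.open` in place of `codecs.open` (both take an `encoding` argument; `io.open` is the builtin `open` on Python 3). The helper names `write_utf8` and `read_raw` are made up for this example.

```python
import io
import os
import tempfile


def write_utf8(path, text):
    """Write a unicode string to path, encoding it explicitly as UTF-8."""
    # The encoding is stated explicitly, so the 'ascii' default never applies.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_raw(path):
    """Return the raw bytes stored at path."""
    with io.open(path, "rb") as f:
        return f.read()


demo_path = os.path.join(tempfile.mkdtemp(), "foo")
write_utf8(demo_path, u"\u00e6")  # u"\u00e6" is the character 'æ'
raw = read_raw(demo_path)  # the two UTF-8 bytes of 'æ'
```

Note that this covers plain file objects only; it does not change what tmpdir's write() does with its argument.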

@pytestbot
Contributor Author

Original comment by Bjorn Pettersen (BitBucket: thebjorn, GitHub: thebjorn):


Yes, the code point of æ is above 127, but neither of

#!python


    u'æ'.encode('u8')
    u'æ'.encode('l1')

is Unicode. They've been encoded, so they're byte strings:

#!python

>>> u'æ'.encode('u8') == b'\xc3\xa6'
True
>>> u'æ'.encode('l1') == b'\xe6'
True

Byte strings containing any of the 256 possible byte values are perfectly writable in Python:

#!python


c:\srv>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> open('foo', 'w').write(u'æ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 0: ordinal not in range(128)
>>> open('foo', 'w').write(u'æ'.encode('u8'))
>>> open('foo', 'w').write(u'æ'.encode('l1'))

so the question remains: how do you write a Unicode string to a file in py.test, since the normal method of encoding it to utf-8 or latin-1 does not work?

@pytestbot
Contributor Author

Original comment by Anatoly Bubenkov (BitBucket: bubenkoff, GitHub: bubenkoff):


I believe this problem is already solved in the latest py (that is where the bug was).
Also, please upgrade pytest to the latest version:
pip install -U pytest py

@pytestbot
Contributor Author

Original comment by Bjorn Pettersen (BitBucket: thebjorn, GitHub: thebjorn):


It looks like this code:

#!python

                    data = str(data)
                else:
                    data = py.builtin._totext(data, sys.getdefaultencoding())
        f = self.open(mode)
        try:
>           f.write(data)
E           UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 0: ordinal not in range(128)

tries to decode the encoded string back to Unicode using the default encoding, and then writes the newly created Unicode string to f, letting the system try to convert (i.e. encode) it back to a byte string before writing it to disk. In general, you shouldn't mess with already-encoded data...
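To make the round trip concrete, here is a simplified sketch of the two steps described above. `text_mode_roundtrip` and `outcome` are hypothetical helpers written for this comment, not part of py, and 'ascii' is assumed as the default encoding (as in the tracebacks).

```python
def text_mode_roundtrip(data, default_encoding="ascii"):
    """Mimic, in simplified form, what happens to bytes passed in text mode.

    Hypothetical helper for illustration only; not py's actual code.
    """
    # Step 1: the already-encoded bytes are decoded with the default encoding.
    text = data.decode(default_encoding)
    # Step 2: writing that text to a Python 2 file implicitly encodes it again.
    return text.encode(default_encoding)


def outcome(data):
    """Return the round-tripped bytes, or a short description of the failure."""
    try:
        return text_mode_roundtrip(data)
    except UnicodeDecodeError as exc:
        bad = ord(data[exc.start:exc.start + 1])
        return "UnicodeDecodeError at byte 0x%02x" % bad


ascii_result = outcome(b"abc")                      # pure ASCII survives
utf8_result = outcome(u"\u00e6".encode("utf-8"))    # fails on byte 0xc3
latin1_result = outcome(u"\u00e6".encode("latin-1"))  # fails on byte 0xe6
```

Only pure-ASCII byte strings survive step 1, which matches the failures reported for test_2 and test_3.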

@pytestbot
Contributor Author

Original comment by Anatoly Bubenkov (BitBucket: bubenkoff, GitHub: bubenkoff):


But did you try the latest py first?
Your console output shows that you don't have the latest version.

@pytestbot
Contributor Author

Original comment by Bjorn Pettersen (BitBucket: thebjorn, GitHub: thebjorn):


I'm now running py 1.4.26 and pytest 2.6.4 (I believe those are the latest) and getting the exact same errors.

#!python

(dev) go|c:\srv\tmp\ptb> py.test
================================================== test session starts ==================================================
platform win32 -- Python 2.7.3 -- py-1.4.26 -- pytest-2.6.4
plugins: cov, xdist
collected 3 items

test_foo.py FFF

@pytestbot
Contributor Author

Original comment by Anatoly Bubenkov (BitBucket: bubenkoff, GitHub: bubenkoff):


So the line "platform win32 -- Python 2.7.8 -- py-1.4.25 -- pytest-2.6.3" in your report is outdated?

@pytestbot
Contributor Author

Original comment by Bjorn Pettersen (BitBucket: thebjorn, GitHub: thebjorn):


@anatoly yes, that was from my work computer, this is from home.

@pytestbot
Contributor Author

Original comment by Anatoly Bubenkov (BitBucket: bubenkoff, GitHub: bubenkoff):


The code looks wrong, I agree.
An easy workaround, it seems, is to pass a 'b' mode.

@pytestbot
Contributor Author

Original comment by Anatoly Bubenkov (BitBucket: bubenkoff, GitHub: bubenkoff):


tmpdir.join('foo').write(u'æ'.encode('u8'), mode='wb')

@pytestbot
Contributor Author

Original comment by Bjorn Pettersen (BitBucket: thebjorn, GitHub: thebjorn):


I've also verified the problem on linux:

#!python

(dev)go|~/work/ptb$ py.test
========================================================================================= test session starts ==========================================================================================
platform linux2 -- Python 2.7.3 -- py-1.4.26 -- pytest-2.6.4
plugins: cov, xdist
collected 3 items

test_tdir.py FFF

@pytestbot
Contributor Author

Original comment by Bjorn Pettersen (BitBucket: thebjorn, GitHub: thebjorn):


Ah, cool, tmpdir.join('foo').write(u'æ'.encode('u8'), mode='wb') works :-)

@pytestbot
Contributor Author

Original comment by Anatoly Bubenkov (BitBucket: bubenkoff, GitHub: bubenkoff):


The problem is that it detects your string as not being a byte string, which is strange. We need to fix that detection.

@pytestbot
Contributor Author

Original comment by Anatoly Bubenkov (BitBucket: bubenkoff, GitHub: bubenkoff):


@oktopuz, the default encoding basically depends on site.py, so your statement is not fully true.

@pytestbot pytestbot added the type: bug problem that needs to be addressed label Jun 15, 2015
@RonnyPfannschmidt RonnyPfannschmidt modified the milestones: 2.8, 2.8.dev Sep 13, 2015
@nicoddemus
Member

I think the question here is whether LocalPath.write should receive an encoding parameter which is mandatory when writing unicode strings.
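A minimal sketch of what such a rule could look like, written as a standalone function rather than as a change to LocalPath. The `write` function and its signature are assumptions made for this illustration; nothing here reflects py's actual API.

```python
import os
import tempfile


def write(path, data, encoding=None):
    """Hypothetical write() that refuses text unless an encoding is given."""
    if isinstance(data, bytes):
        payload = data  # byte strings are written untouched
    elif encoding is None:
        raise TypeError("an encoding is required when writing text")
    else:
        payload = data.encode(encoding)
    with open(path, "wb") as f:
        f.write(payload)


demo_path = os.path.join(tempfile.mkdtemp(), "foo")

# Text with an explicit encoding is accepted.
write(demo_path, u"\u00e6", encoding="utf-8")
with open(demo_path, "rb") as f:
    stored = f.read()

# Text without an encoding is rejected instead of guessing a default.
try:
    write(demo_path, u"\u00e6")
    rejected = False
except TypeError:
    rejected = True
```

The point of the sketch is that no default encoding is ever consulted: the caller either supplies bytes or names the encoding.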

@nicoddemus
Member

Seems more like a py issue rather than pytest's... perhaps we should open an issue in there instead and close this one?

(cc'ing people that were originally part of the conversation)
@bubenkoff @thebjorn

@nicoddemus nicoddemus removed this from the 2.8.1 milestone Sep 26, 2015
@thebjorn

I think the most useful solution is to make .write() take a byte string (and leave it up to the user to encode data before passing it). With such an implementation you could always write byte strings (without knowing their encoding). A wrapper that takes a unicode string and an encoding is easy to write on top of a byte-string implementation. If you only have a unicode/encoding implementation, it is difficult to write a byte-string implementation without knowing the encoding. (I don't know enough about the pytest vs. py structure to say where a solution should be implemented.)
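Roughly what I have in mind, as a sketch on plain file paths rather than on py's path objects. The names `write_bytes` and `write_text` are placeholders I picked for this example, not existing py functions.

```python
import os
import tempfile


def write_bytes(path, data):
    """Core primitive: write a byte string exactly as given."""
    if not isinstance(data, bytes):
        raise TypeError("write_bytes() expects a byte string")
    with open(path, "wb") as f:
        f.write(data)


def write_text(path, text, encoding):
    """Thin wrapper: encode the text, then delegate to the byte primitive."""
    write_bytes(path, text.encode(encoding))


def read_bytes(path):
    """Return the raw bytes stored at path."""
    with open(path, "rb") as f:
        return f.read()


demo_dir = tempfile.mkdtemp()
utf8_path = os.path.join(demo_dir, "utf8")
latin1_path = os.path.join(demo_dir, "latin1")

write_text(utf8_path, u"\u00e6", "utf-8")  # wrapper knows the encoding
write_bytes(latin1_path, b"\xe6")          # caller already has encoded bytes

utf8_bytes = read_bytes(utf8_path)
latin1_bytes = read_bytes(latin1_path)
```

The wrapper is a one-liner on top of the byte-string primitive, whereas going the other way would require guessing an encoding.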

@RonnyPfannschmidt
Member

In the long term we'd like to replace the py.path internals with pathlib.

@RonnyPfannschmidt
Member

Closing this one, as the upstream issue is pytest-dev/py#107.

tony added a commit to cihai/unihan-etl that referenced this issue Apr 21, 2017