Skip to content

str(zipfile.Path()) not working when root attribute is io.BytesIO #100609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
buhtz opened this issue Dec 29, 2022 · 12 comments
Open

str(zipfile.Path()) not working when root attribute is io.BytesIO #100609

buhtz opened this issue Dec 29, 2022 · 12 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@buhtz
Copy link

buhtz commented Dec 29, 2022

Using zipfile.Path with a ZIP-file "in memory" as io.BytesIO instance. In that case attributes like .filename() or .__str__() do not work because they try to instanciate a pathlib.Path() with a io.BytesIO object.

I can confirm this with Python 3.11 (run in Docker), Python 3.10.2 (Windows) and Python 3.9.2 (Debian stable). See output below for detailed version infos.

Code to reproduce

#!/usr/bin/env python3
import sys
import io
import pathlib
import zipfile

print('{} {}\n'.format(sys.platform, sys.version))

zip_bytes = io.BytesIO()

# create a zipfile in memory
with zipfile.ZipFile(zip_bytes, 'w') as zf:
    zf.writestr('entry.file', 'content')

# ZIP-Path object from in-memory-ZIP
my_zip_path = zipfile.Path(zip_bytes, 'entry.file')
print('Exists {}'.format(my_zip_path.exists()))

# here comes the Exception
print(my_zip_path)

Output: Python 3.11 on Linux/Docker

Digest: sha256:250990a809a15bb6a3e307fec72dead200c882c940523fb1694baa78eb0e47c6
Status: Downloaded newer image for python:3.11
Python 3.11.1 (main, Dec 21 2022, 18:32:57) [GCC 10.2.1 20210110] on linux
......
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/zipfile.py", line 2421, in filename
    return pathlib.Path(self.root.filename).joinpath(self.at)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/pathlib.py", line 871, in __new__
    self = cls._from_parts(args)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/pathlib.py", line 509, in _from_parts
    drv, root, parts = self._parse_args(args)
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/pathlib.py", line 493, in _parse_args
    a = os.fspath(a)
        ^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType

Output: Python 3.10 on Windows

win32 3.10.2 (tags/v3.10.2:a58ebcc, Jan 17 2022, 14:12:15) [MSC v.1929 64 bit (AMD64)]

Exists True
Traceback (most recent call last):
  File "C:\Users\buhtzch\ownCloud\my.work\buhtzology\zippathbug.py", line 28, in <module>
    print(my_zip_path)
  File "C:\Program Files\Python310\lib\zipfile.py", line 2380, in __str__
    return posixpath.join(self.root.filename, self.at)
  File "C:\Program Files\Python310\lib\posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Output: Python 3.9.2 on Debian stable (11)

linux 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110]

Exists True
Traceback (most recent call last):
  File "/home/user/ownCloud/my.work/buhtzology/zippathbug.py", line 28, in <module>
    print(my_zip_path)
  File "/usr/lib/python3.9/zipfile.py", line 2347, in __str__
    return posixpath.join(self.root.filename, self.at)
  File "/usr/lib/python3.9/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
@buhtz buhtz added the type-bug An unexpected behavior, bug, or error label Dec 29, 2022
@barneygale
Copy link
Contributor

This is mentioned in the zipfile.Path docstring (though the exception type is wrong):

At the root, ``name``, ``filename``, and ``parent``
resolve to the zipfile. Note these attributes are not
valid and will raise a ``ValueError`` if the zipfile
has no filename.
>>> root.name
'abcde.zip'
>>> str(root.filename).replace(os.sep, posixpath.sep)
'mem/abcde.zip'
>>> str(root.parent)
'mem'

This feature feels fine within the third-party zipp package -- where zipfile.Path originates -- but is perhaps overwrought within a CPython context. IMO it would suffice to make the original ZipFile object available from path objects via an attribute like my_zip_path.zipfile and leave it there.

We may have an opportunity to revise this when we introduce a fully pathlib-compatible implementation of ZipPath. More here: https://discuss.python.org/t/make-pathlib-extensible/3428

@jaraco
Copy link
Member

jaraco commented Jul 14, 2023

but is perhaps overwrought within a CPython context

I'm not sure I understand the distinction. zipp means to provide the same behavior (just earlier, to older Pythons), so the two implementations should be kept in sync.

IMO it would suffice to make the original ZipFile available...

And it does currently through .root, which has its own issues (the name conflicts with the meaning of pathlib.Path.root).

I don't like the implementation of Path.filename. I'm tempted to deprecate it altogether. __str__ provides something similar but different. .name, .parent and newly .suffix, .suffixes, and .stem all are ill-defined when the zipfile has no filename, so I believe an exception is appropriate in those cases.

But you're right, it's not great to not have a good __str__ representation if the zipfile has no name. I'll work on it.

@barneygale
Copy link
Contributor

Sorry for not being clear. It would suffice to make the original ZipFile available via an attribute. The behaviour of filename, name, and parents at the root is slightly too surprising for a standard library module IMO.

@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 24, 2023
@jaraco
Copy link
Member

jaraco commented Mar 21, 2024

In jaraco/zipp#107, I determined that it's not going to be straightforward to support these attributes, and the best recommendation is for users to provide a filename on the ZipFile object.

At this point, I'm thinking the best option might be to document these limitations and recommendations.

@jaraco jaraco marked this as a duplicate of #130120 Feb 27, 2025
@jaraco
Copy link
Member

jaraco commented Feb 27, 2025

In #130120 and the linked PR, @yngvem is exploring this issue. As you can see above, I'd previously concluded:

Users affected by this issue should supply a filename for their zipfile, even if it's just "".

And

the best option might be to document these limitations and recommendations.

@yngvem - what do you think of the recommendation for the caller to set the zipfile .filename? Would that work for your use-case?

What do you think about the following approach:

  1. Document that unnamed zip files will raise TypeErrors when rendered using str and that callers should pass zipfiles with string filenames if they expect for str to work on the Path.
  2. Update the tests to capture this expectation (that str raises a TypeError when the zipfile has no filename and uses the filename even on a memory buffered zipfile).
  3. Maybe also capture the expected repr while we're at it.

@yngvem
Copy link
Contributor

yngvem commented Mar 28, 2025

@jaraco, I think that approach seems very reasonable! Though, maybe it would be nice if the TypeError is raised explicitly in Path.__str__, Path.filename and Path._base so it can contain short text explaining why filename is None?

I would be happy to provide all those changes :)

@barneygale
Copy link
Contributor

barneygale commented Mar 28, 2025

Are there other parts of the stdlib where __str__() raises? That in particular seems weird to me and probably worth fixing somehow. I'm less bothered by filename, name and parent - I think documenting the workaround would be fine there.

@yngvem
Copy link
Contributor

yngvem commented Mar 28, 2025

@barneygale, I also think it seems weird for __str__ to fail, but there are no obvious string representation for unnamed zipfiles (see #130381).

I updated #130381 which makes __str__ work (but parent and name still raises errors) based the review comments. I'll make a PR where it has a better exception message as well so you can chose whichever implementation you prefer :)

@barneygale
Copy link
Contributor

barneygale commented Mar 28, 2025

I like the idea of using :memory: personally!

:fileobj: might be another option, given the file object might not be backed by memory :)

@barneygale
Copy link
Contributor

A similar-ish case in pathlib comes to mind - pathlib.Path.is_mount() used to unconditionally raise NotImplementedError on Windows. In theory someone could have been relying on that behaviour, but we added Windows support in 3.12 nonetheless.

@barneygale barneygale changed the title zipfile.Path.filename not working when root attribute is io.BytesIO str(zipfile.Path()) not working when root attribute is io.BytesIO Mar 28, 2025
@barneygale
Copy link
Contributor

I've logged this upstream issue: jaraco/zipp#134

@barneygale
Copy link
Contributor

And an upstream PR: jaraco/zipp#135

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: No status
Development

No branches or pull requests

5 participants