Skip to content

Possible regression in MultiplexedPath.joinpath in 3.12 #106614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
frenzymadness opened this issue Jul 11, 2023 · 6 comments
Closed

Possible regression in MultiplexedPath.joinpath in 3.12 #106614

frenzymadness opened this issue Jul 11, 2023 · 6 comments
Assignees
Labels
3.12 only security fixes 3.13 bugs and security fixes topic-importlib type-bug An unexpected behavior, bug, or error

Comments

@frenzymadness
Copy link
Contributor

Bug report

In Python 3.11, it's possible to use MultiplexedPath.joinpath("") (with an empty argument) and it returns the first path stored in MultiplexedPath._paths, for example:

>>> from importlib.resources import files
>>> files("jupyterlab_server.test_data")
MultiplexedPath('/usr/lib/python3.11/site-packages/jupyterlab_server/test_data')
>>> files("jupyterlab_server.test_data").joinpath("")
PosixPath('/usr/lib/python3.11/site-packages/jupyterlab_server/test_data')

But in Python 3.12, this doesn't work anymore:

>>> from importlib.resources import files
>>> files("jupyterlab_server.test_data")
MultiplexedPath('/usr/lib/python3.12/site-packages/jupyterlab_server/test_data')
>>> files("jupyterlab_server.test_data").joinpath(".")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.12/importlib/resources/readers.py", line 92, in joinpath
    return super().joinpath(*descendants)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/importlib/resources/abc.py", line 117, in joinpath
    target = next(names)
             ^^^^^^^^^^^
StopIteration

The implementation has been changed in: cea910e#diff-2e741d925220d74a9cc04cda1314d3649d9d189c0efc7db18e5387a51225b61c

Your environment

Python 3.12 beta 3 installed from Fedora RPM, Fedora 38, x86_64

@frenzymadness frenzymadness added the type-bug An unexpected behavior, bug, or error label Jul 11, 2023
@AlexWaygood AlexWaygood added 3.12 only security fixes 3.13 bugs and security fixes topic-importlib labels Jul 11, 2023
@sunmy2019
Copy link
Member

One question. The demo you gave is joinpath(".") not joinpath(""). Is this intentional?

@jaraco
Copy link
Member

jaraco commented Jul 11, 2023

This use-case wasn't considered. It wasn't intentional to break existing supported use-cases, but this case also wasn't tested and so thus isn't explicitly supported. This change was first introduced in python/importlib_resources#250. The intent was to provide a generic, re-usable implementation of joinpath.

What is meant by joinpath("") or joinpath(".")? Does the pathlib implementation of joinpath also implement the expected behavior?

Looking at your example, I'm guessing you're using joinpath() as a means to extract a pathlib.Path from a Traversable. That definitely is not supported and is an anti-pattern, because it violates the assumption that a Traversable is only a subset of a pathlib.Path. If you want a materialized path, you should use as_file, e.g.:

>>> from importlib.resources import files, as_file
>>> as_file(files("jupyterlab_server.test_data"))

Of course, if what you need is a tree of resources, you'll need importlib.resources from Python 3.12 or importlib_resources>=5.9.

@frenzymadness
Copy link
Contributor Author

One question. The demo you gave is joinpath(".") not joinpath(""). Is this intentional?

No, it's not, sorry. I was testing multiple scenarios and copied a different one than intended. The one with "" is the same:

>>> from importlib.resources import files
>>> files("jupyterlab_server.test_data").joinpath("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.12/importlib/resources/readers.py", line 92, in joinpath
    return super().joinpath(*descendants)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/importlib/resources/abc.py", line 117, in joinpath
    target = next(names)
             ^^^^^^^^^^^
StopIteration

This use-case wasn't considered. It wasn't intentional to break existing supported use-cases, but this case also wasn't tested and so thus isn't explicitly supported. This change was first introduced in python/importlib_resources#250. The intent was to provide a generic, re-usable implementation of joinpath.

Thank you for the detailed explanation. The code using Traversable this way isn't mine. I've just needed to fix it to make that compatible with Python 3.12 during mass package rebuild in Fedora.

What is meant by joinpath("") or joinpath(".")? Does the pathlib implementation of joinpath also implement the expected behavior?

I can only guess here because I'm not the original author of the code I needed to fix. But yes, calling .joinpath("") on a Path works fine:

>>> from pathlib import Path
>>> Path("/usr/lib/python3.12/site-packages/jupyterlab_server/test_data")
PosixPath('/usr/lib/python3.12/site-packages/jupyterlab_server/test_data')
>>> Path("/usr/lib/python3.12/site-packages/jupyterlab_server/test_data").joinpath("")
PosixPath('/usr/lib/python3.12/site-packages/jupyterlab_server/test_data')

And the code I'm referring to uses importlib.resources to locate some packages, get their full paths and copy some files from there.

Looking at your example, I'm guessing you're using joinpath() as a means to extract a pathlib.Path from a Traversable. That definitely is not supported and is an anti-pattern, because it violates the assumption that a Traversable is only a subset of a pathlib.Path. If you want a materialized path, you should use as_file, e.g.:

>>> from importlib.resources import files, as_file
>>> as_file(files("jupyterlab_server.test_data"))

Of course, if what you need is a tree of resources, you'll need importlib.resources from Python 3.12 or importlib_resources>=5.9.

That sounds reasonable but as_file returns a context manager:

>>> as_file(files("jupyterlab_server.test_data"))
<contextlib._GeneratorContextManager object at 0x7f22b9b6a480>

Is there any nicer and more readable solution than files("jupyterlab_server.test_data")._paths[0]?

@frenzymadness
Copy link
Contributor Author

FTR: It seems that the same problem was reported in python/importlib_resources#263

From the issue, it seems that another possible workaround without accessing the internal attribute of MultiplexedPath is to use str((files("jupyterlab_server.test_data") / "dummy").parent). But both workarounds don't work well with typing. I'm gonna open a new issue about it in the project.

@jaraco
Copy link
Member

jaraco commented Aug 7, 2023

Thanks for the ping - I'd lost track of the issue.

That sounds reasonable but as_file returns a context manager:

>>> as_file(files("jupyterlab_server.test_data"))
<contextlib._GeneratorContextManager object at 0x7f22b9b6a480>

Aah. yes. My mistake. 🤦

I'd forgotten that as_file returns a context manager. The reason it returns a context manager is because Traversable objects may not exist on the file system, and making them exist on the file system (its purpose) implicitly creates state that needs to be cleaned up.

If you're willing to assume that your resource providers are always providing resources that are present in one package on the file system and that cleanup will never be necessary, you could enter the context and never exit:

 draft @ touch foo/__init__.py
 draft @ pip-run importlib_resources -- -q
>>> import importlib_resources as res
>>> dir = res.as_file(res.files('foo')).__enter__()
>>> dir
PosixPath('/Users/jaraco/draft/foo')

Here's a real-world example (not recommended) of a project abusing the context manager to ensure that the data is present on the file system for the duration of the process.

However, in python/importlib_resources#286, I learned that you have a namespace package and thus you get a MultiplexedPath object from files():

 draft @ mkdir -p ns1/foo
 draft @ mkdir -p ns2/foo
 draft @ env PYTHONPATH=ns1:ns2 pip-run importlib_resources -q
Python 3.11.4 (main, Jun 20 2023, 17:23:00) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ^D
 draft @ env PYTHONPATH=ns1:ns2 pip-run importlib_resources -- -q
>>> import importlib_resources as res
>>> mp = res.files('foo')
>>> mp
MultiplexedPath('/Users/jaraco/draft/ns1/foo', '/Users/jaraco/draft/ns2/foo')

As you can see illustrated above, the fact that foo is a namespace package means its contents can exist in multiple locations, so traversing with .joinpath("dummy") or explicitly accessing ._paths[0] is simply collapsing the multiplexed path into the first possible path.

This technique wouldn't work if the resource you desire is present in a later package:

 draft @ touch ns2/foo/data.txt
 draft @ env PYTHONPATH=ns1:ns2 pip-run importlib_resources -- -q
>>> import importlib_resources as res
>>> mp = res.files('foo')
>>> mp.joinpath('data.txt').exists()
True
>>> mp.joinpath('dummy').parent.joinpath('data.txt').exists()
False
>>> with res.as_file(mp) as foo_res:
...   print(foo_res)
...   print(list(foo_res.iterdir()))
... 
/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/tmp6hd8ding/foo
[PosixPath('/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/tmp6hd8ding/foo/data.txt')]
>>> foo_res.exists()
False

That's why MultiplexedPath exists - to honor resources that may exist across one or more paths.

But I suspect what's happening is you're not intending or expecting to get a namespace package or multiplexed path, but instead, you're happening to get one because you're accessing test_data not as a directory but as a package.

When you specify files('jupyterlab_server.test_data'), you're saying, give me the resources for the Python package jupyterlab_server.test_data, same as if you'd said import jupyterlab_server.test_data as td; files(td).

And because Python will magically import any directory as a namespace package, that's why you're getting the namespace package and the MultiplexedPath.

 draft @ mkdir -p pkg/data
 draft @ touch pkg/__init__.py
 draft @ pip-run importlib_resources -- -q
>>> res.files('pkg.data')
MultiplexedPath('/Users/jaraco/draft/pkg/data')

My guess is if you instead were to use files('jupyterlab_server').joinpath('test_data'), you would get a non-multiplexed traversable that you could assume (at your risk) is a pathlib.Path:

 draft @ mkdir -p pkg/data
 draft @ touch pkg/__init__.py
 draft @ pip-run importlib_resources -- -q
>>> import importlib_resources as res
>>> res.files('pkg').joinpath('data')
PosixPath('/Users/jaraco/draft/pkg/data')

(and you could cast(Traversable, data) to satisfy mypy).

I think the real question here is - what are you trying to accomplish with jupyterlab_server's test data? Are you wanting to do anything with it other than list the contents and read files? If so, perhaps your best bet would be to just use the Traversable interface.

The only safe way to access a Traversable is to use its limited interface or rely on as_file to ensure it's materialized in the file system (for the duration of a context manager).

@frenzymadness
Copy link
Contributor Author

Thank you very much for the detailed explanation. I think this issue can be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes 3.13 bugs and security fixes topic-importlib type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants