Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importlib.resources regression for custom ResourceReader #127337

Open
zooba opened this issue Nov 27, 2024 · 3 comments
Open

importlib.resources regression for custom ResourceReader #127337

zooba opened this issue Nov 27, 2024 · 3 comments
Assignees
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes stdlib Python modules in the Lib dir topic-importlib

Comments

@zooba
Copy link
Member

zooba commented Nov 27, 2024

Since Python 3.13 (including 3.14), my custom ResourceReader is now raising errors when used with importlib.resources.contents. This code is otherwise functioning fine from 3.8 onwards.

Traceback (most recent call last):
  File "D:\cpython\t.py", line 41, in <module>
    print(contents(FakeModule))
          ~~~~~~~~^^^^^^^^^^^^
  File "D:\cpython\Lib\importlib\resources\_functional.py", line 60, in contents
    return (resource.name for resource in _get_resource(anchor, path_names).iterdir())
                                          ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "D:\cpython\Lib\importlib\resources\_functional.py", line 81, in _get_resource
    return files(anchor).joinpath(*path_names)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
TypeError: CompatibilityFiles.SpecPath.joinpath() missing 1 required positional argument: 'other'

Self-contained repro below (the real one gets compiled into DLLs, so you don't want to try to debug it ;) ):

from importlib.abc import Loader
from importlib.machinery import ModuleSpec
from importlib.resources.abc import ResourceReader

_DATA = {"FakeModule.NAME": b"hello"}
_DATA_NAMES = set(_DATA)

class MyReader(ResourceReader):
    def __init__(self, prefix):
        self.prefix = prefix

    def open_resource(self, resource):
        import io
        return io.BytesIO(_DATA[self.prefix + resource])

    def resource_path(self, resource):
        raise FileNotFoundError()

    def is_resource(self, resource):
        return self.prefix + resource in _DATA_NAMES

    def contents(self):
        p = self.prefix
        lp = len(p)
        return (n[lp:] for n in _DATA_NAMES if n.startswith(p))

class MyLoader(Loader):
    create_module = ...
    exec_module = ...

    def get_resource_reader(self, fullname):
        return MyReader(fullname + ".")

class FakeModule:
    __loader__ = MyLoader()
    __name__ = "FakeModule"
    __spec__ = ModuleSpec(__name__, __loader__)

from importlib.resources import contents
print(contents(FakeModule))

(ping @jaraco)

@zooba zooba added topic-importlib 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 27, 2024
@FFY00
Copy link
Member

FFY00 commented Dec 1, 2024

Looks like when the legacy API was re-added in python/importlib_resources#303, it added a new path_names (you can see the original code in https://github.com/python/importlib_resources/pull/282/files), and relied on the joinpath implementation of most Traversable objects being able to handle no arguments. When re-adding the new API, the tests for the legacy API that we removed in python/importlib_resources#239 weren't also brought back, otherwise they would have exercised the relevant code and triggered the bug.

But looking at the CompatibilityFiles implementation, it doesn't match the current Traversable definition. I remembered Traversable.joinpath to only take one argument, so I went to see when that changed, and it appears to have been in GH-91623, which I think was actually a breaking change, but it's done now... Anyway, when that was changed the CompatibilityFiles implementation wasn't updated.

Considering the new Traversable.joinpath definition, I think it not receiving any arguments might actually be in-spec. Either way, the CompatibilityFiles implementation is bad, it can only ever take one argument, and now it needs to be able to take any number of arguments.

That said, we need to be more careful when making changes to protocols, as I think this issue was the result of GH-91623 making a breaking change to the Traversable protocol (protocols go both ways, so it's very difficult to make non-breaking changes).

@picnixz picnixz added the stdlib Python modules in the Lib dir label Dec 1, 2024
@jaraco
Copy link
Member

jaraco commented Dec 1, 2024

Thanks Filipe for the analysis. Indeed, by allowing joinpath to accept 0 or more arguments, any caller relying on being able to pass zero arguments will fail, and CompatibilityFiles does just that.

Going one step deeper, I traced the change of the protocol to python/importlib_resources#248, which references GH-91298.

Copying @encukou for visibility.

I think the proper fix at this stage is to expand the CompatibilityFiles interface to match the new protocol, probably something like this:

diff --git a/importlib_resources/_adapters.py b/importlib_resources/_adapters.py
index 50688fbb66..9f78bca0ff 100644
--- a/importlib_resources/_adapters.py
+++ b/importlib_resources/_adapters.py
@@ -66,10 +66,12 @@ class CompatibilityFiles:
 
         is_dir = is_file
 
-        def joinpath(self, other):
+        def joinpath(self, *descendants):
+            if not descendants:
+                return self
             if not self._reader:
-                return CompatibilityFiles.OrphanPath(other)
-            return CompatibilityFiles.ChildPath(self._reader, other)
+                return CompatibilityFiles.OrphanPath(*descendants)
+            return CompatibilityFiles.ChildPath(self._reader, '/'.join(descendants))
 
         @property
         def name(self):

(maybe also ChildPath needs an update)

@FFY00 Are you willing to tackle this one?


That said, we need to be more careful when making changes to protocols, as I think this issue was the result of GH-91623 making a breaking change to the Traversable protocol (protocols go both ways, so it's very difficult to make non-breaking changes).

I've run into this concern a few times, and it's not obvious to me how to navigate these changes. I don't have a mental model for how safely to adapt a protocol (or if it's even possible). I'm aware of the challenges; I just don't have any ready-made solutions.


Looking at the issue as reported, it appears the resource reader is reliant on the CompatibilityFiles, which I'd hope becomes obsolete soon. In fact, MyReader can be adapted to implement TraversableResources (provide a .files() method) to avoid falling back to the compatibility layer, something like:

from importlib.abc import Loader
from importlib.machinery import ModuleSpec
from importlib.resources.abc import ResourceReader, TraversableResources

_DATA = {"FakeModule.NAME": b"hello"}
_DATA_NAMES = set(_DATA)


class TraversableData:
    def __init__(self, prefix):
        self.prefix = prefix

    def is_dir(self):
        return self.prefix.endswith('.')

    def is_file(self):
        return not self.prefix.endswith('.')

    def iterdir(self):
        return (
            TraversableData(name)
            for name in _DATA if name.startswith(self.prefix)
        )

    def open(self):
        import io
        return io.BytesIO(_DATA[self.prefix])

    def joinpath(self, *descendants):
        suffix = '.'.join(descendants)
        return TraversableData(self.prefix + suffix)


class MyReader(TraversableResources):
    def __init__(self, prefix):
        self.prefix = prefix

    def files(self):
        return TraversableData(self.prefix)


class MyLoader(Loader):
    create_module = ...
    exec_module = ...

    def get_resource_reader(self, fullname):
        return MyReader(fullname + ".")

class FakeModule:
    __loader__ = MyLoader()
    __name__ = "FakeModule"
    __spec__ = ModuleSpec(__name__, __loader__)

from importlib.resources import contents
print(contents(FakeModule))

That change runs on Python 3.14.

I tried it on Python 3.9, and even after adapting the imports, it doesn't work as I'd hoped. I'd have to spend some more time to trace the cause, but I do think it would be beneficial to explore getting off of CompatibilityFiles and providing a native TraversableResources interface (introduced in Python 3.9). For this specific issue, what is the minimum version of Python that the resource provider needs to support?

@zooba
Copy link
Member Author

zooba commented Dec 2, 2024

In fact, MyReader can be adapted to implement TraversableResources (provide a .files() method) to avoid falling back to the compatibility layer

Right, but this requires the maintainer to do work. If the automatic adaptation gets it right, then users don't have to wait. We're in the halfway position where maintainers can sometimes not have to do anything, but will only discover it for sure after all their users have updated and broken. A better position is (a) for it all to work ( 😉 ), or (b) for it to be totally deprecated and removed so that maintainers can tell their users to avoid updating Python entirely until they've gotten an update ready. (Obviously (a) is preferred, but if we start down the path of (a) then (b) is totally cut out.)

For this specific issue, what is the minimum version of Python that the resource provider needs to support?

It doesn't matter. 3.13 has to be supported, and so there's now a hack in there to handle as far back as is needed (I believe 3.7 onwards still works). But we shouldn't have let it stop working in 3.13.

FWIW, my new traversable implementation looks pretty much like that, though I keep the old implementation around so that older versions also work. I'd prefer to have duplicated code that works than a single implementation that doesn't, especially when we're talking about multiple single-line functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes stdlib Python modules in the Lib dir topic-importlib
Projects
None yet
Development

No branches or pull requests

4 participants