Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why was Traversable implemented? #96870

Closed
jaraco opened this issue Sep 16, 2022 · 4 comments
Closed

Why was Traversable implemented? #96870

jaraco opened this issue Sep 16, 2022 · 4 comments

Comments

@jaraco
Copy link
Member

jaraco commented Sep 16, 2022

I'm a bit confused as to why Traversable was ever implemented in the first place.
Seems like Traversable is supposed to represent file system paths, however I was sure that's why we have Paths.

Whatever Traversable's added value is, I still think the class should be coupled to Path type-wise. Maybe by marking Traversable as PathLike.

At the moment, functions like importlib.resources.files specify a return type of Iterator[Traversable] but in reality return Iterator[Path]. This results in confusing type hint warnings since IDEs have no way to know that Traversable and Path are related.

I'm not very experienced with the newer versions of importlib so my assumptions might be a bit off, hoping one of could shed some light.

Thanks

Originally posted by @moomoohk in #88366 (comment)

@jaraco
Copy link
Member Author

jaraco commented Sep 16, 2022

Traversable was modeled after pathlib.Path objects, implementing a minimal subset of the features needed to support traversal of directories and files and accessing their contents. It was implemented because there are package providers that supply packages that are not on the file system. The second most common TraversableResources implementation is the ZipReader, which illustrates at least one other popular implementation a Traversable that's not a pathlib.Path (it's a zipfile.Path).

Moreover, importlib resources aims to be extensible, so a custom loader that loads its modules (and resources) from some other source (database, brokerage service, space imagery) is able to supply Traversable resources.

Traversable declares/implements the protocol that a provider must supply and that a consumer can rely upon. Although the user might get a pathlib.Path, they shouldn't rely upon it because next time they could get a zipfile.Path or some other object that only implements Traversable.

At the moment, functions like importlib.resources.files specify a return type of Iterator[Traversable] but in reality return Iterator[Path]. This results in confusing type hint warnings since IDEs have no way to know that Traversable and Path are related.

Maybe Traversable should register known implementations of its protocol or known implementations should register as Traversable:

>>> import importlib.resources.abc as abc
>>> import pathlib
>>> abc.Traversable.register(pathlib.Path)
<class 'pathlib.Path'>
>>> isinstance(pathlib.Path(), abc.Traversable)
True

That may address the typing issue. Can you possibly put together an example that illustrates the typing issue (using mypy or similar)?

Whatever Traversable's added value is, I still think the class should be coupled to Path type-wise. Maybe by marking Traversable as PathLike.

Declaring Traversable as PathLike would violate a number of assumptions when the Traversable object is not present on the filesystem. PathLike is for representing objects that have a string representation pointing to a path on the local file system. For example, what would you expect os.fspath of a zipfile.Path to return?

@moomoohk
Copy link

moomoohk commented Sep 17, 2022

Thank you for the detailed answer. I understand now.

Declaring Traversable as PathLike would violate a number of assumptions when the Traversable object is not present on the filesystem.

Would it be more correct for Path to be marked as Traversable then? This seems to me more straightforward than any registration mechanism.

I use PyCharm's builtin type analyzer (might be mypy behind the scenes but I have no idea...) which is static. I think dynamic registration would still leave the issue unsolved.

Can you possibly put together an example that illustrates the typing issue?

Yes indeed. As I wrote, I'm not too familiar with mypy so I can only attest to strange behavior in PyCharm.

Apologies for the specific use-case but it's a simple one. I'm collecting paths to resource files and casting them to AsyncPath objects from the aiopath library:

from importlib import resources
from aiopath.path import AsyncPath

for resource in resources.files("resources_dir").iterdir():
    AsyncPath(resource)  # Expected type 'str | PathLike[str]', got 'Traversable' instead

Of course since resource in a Path object it should be recognized as PathLike however that's only true during runtime. PyCharm's static analyzer sees resource as Traversable and has no way to know that Path implements Traversable...

My temporary solution was to explicitly mark resource as Path so PyCharm ignores the "irrelevant" return type of iterdir():

from pathlib import Path
from importlib import resources
from aiopath.path import AsyncPath

resource: Path
for resource in resources.files("resources_dir").iterdir():
    AsyncPath(resource)  # No type warning

Initially I suspected a bug in the aiopath library. However, upon further inspection and thought, I arrived at the conclusion that this is a deeper issue. It doesn't make sense to me to expect all Path-compatible functions to explicitly accept Traversable as well.

@nitzmahone
Copy link

nitzmahone commented Oct 10, 2022

Would it be more correct for Path to be marked as Traversable then?

It wouldn't - Traversable is a PEP544 Protocol, which pathlib.Path happens to conform to (as could infinitely many other types). The only thing you're guaranteed to get back from resources.files() is something conforming to Traversable that lets you navigate a hierarchy and/or fetch the associated data for leaf nodes. The type checker is doing its job properly in telling you that AsyncPath could receive a value that's not going to be convertible to a simple filesystem path (eg if you were to dig around in a package hosted in a zipfile, you'd get a value that conforms to Traversable but that's not directly convertible to a navigable path).

There are lots of ways to just shut the type checker up, but none of them solve the underlying (valid) issue that it's reporting. If you're quite sure that the package you're enumerating will never be hosted in a zip or be used with any other loader mechanism that might not just give you native filesystem paths (or if you just want to fail in a case where it does), I'd personally just guard the AsyncPath construction with something like isinstance(resource, os.PathLike). Negating that and raising a ValueError would also work, but dunno Pycharm's type checker is smart enough to narrow on the negative like that.

@jaraco
Copy link
Member Author

jaraco commented Nov 26, 2022

In some sense, Path is already marked as Traversable:

$ python3.12 -q
>>> import importlib.resources.abc, pathlib
>>> isinstance(pathlib.Path(), importlib.resources.abc.Traversable)
True

I hope it's clear now that the issue isn't that pathlib.Path doesn't implement Traversable but that AsyncPath doesn't accept arbitrary Traversables.

If what your application needs is the guarantee of a file on the file system, importlib resources does provide the as_file context to ensure that a Traversable is available as a pathlib.Path on the file system. You could use that to construct an AsyncPath, except you probably don't want to because:

  • The context is meant to be temporary, or at least define the lifetime of the potentially temporary file, which may not match with your intended usage.
  • as_file is itself synchronous, so it may not have the async behaviors you need from aiopath.

Perhaps more importantly, the call to files() isn't asynchronous either, so it will also block any async loop.

I hope that adds some clarity. I'm closing this issue as answered, but feel free to continue the conversation and we can re-open if there's more to be discussed or investigated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants