Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-24132: Add pathlib._AbstractPath #31085

Closed

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Feb 2, 2022

After a couple of years of mostly deleting things in pathlib, I'm glad to finally offer an addition!

This PR adds a new _AbstractPath class that sits between PurePath and Path in the class hierarchy. _AbstractPath objects have all the same methods as Path objects, but their filesystem-accessing methods consistently raise NotImplementedError.

(updated Feb 15th: now, only stat(), open() and iterdir() are abstract; other methods that directly access the filesystem are not part of the _AbstractPath interface).

This class will form the basis of a future public-facing abstract class that can be used by the likes of s3path and cloudpathlib. It could also be used to make zipfile.Path objects fully pathlib-compatible (no missing methods!)

Why is this underscore-prefixed? I think this needs some time to gestate in CPython before we write full docs/tests and remove the prefix. I'd make an appeal to the authors of pathlib-y packages on PyPI to try it in an experimental branch and let us know where the pain points are. Three or so roadblocks remain before we can document and recommend it. See this comment

https://bugs.python.org/issue24132

The `Path` class is now an *implementation* of `_AbstractPath`; its methods
call functions in `os.path`, `io`, `pwd`, etc, whereas the corresponding
methods in `_AbstractPath` instead raise `NotImplementedError`.
Check the presence of the necessary `os`, `pwd` and `grp` functions at
import time.
Lib/pathlib.py Outdated Show resolved Hide resolved
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the overall idea of this PR, but I'm afraid I think trying to cram everything into one ABC is the wrong approach -- the interface still feels a little confused to me.

For example -- I'm a static-typing guy, and I have no idea how we'd type AbstractPath.absolute() in typeshed. For a class that override AbstractPath.cwd(), AbstractPath.absolute() returns an instance of that class. But subclasses of AbstractPath can't be guaranteed to override AbstractPath.cwd(), because it's not an abstractmethod, because it's not part of the core interface. Which means that we couldn't include this method in the typeshed stub for AbstractPath at all, because it would be unsafe for an instance of an AbstractPath subclass to ever call absolute(). So, why is absolute() in AbstractPath at all?

I'm using cwd() and absolute() as an example, but I think this is a broader problem for all of the methods in AbstractPath that raise NotImplementedError but are not abstractmethods (and, by extension, all of the mixin methods that call methods that might-or-might-not be implemented).

I would counsel splitting AbstractPath up into several smaller ABCs. In typeshed we could pretend that these are PEP 544 protocols, in much the same way we do with os.PathLike, which is an ABC at runtime but is considered a Protocol by static type checkers.

Instead of having a structure like this:

class AbstractPath(PurePath, ABC):
    # core interface
    @abstractmethod
    def iterdir(self): ...
    @abstractmethod
    def stat(self, *, follow_symlinks=True): ...
    @abstractmethod
    def open(self, mode='r', buffering=-1, encoding=None,
             errors=None, newline=None): ...
    @classmethod
    def cwd(cls): ...  # Not abstract, but raises NotImplementedError
    def absolute(self): ...  # Depends on cwd() being implemented
    def resolve(self): ...  # Not abstract, but raises NotImplementedError

class Path(AbstractPath): ...

You could instead have something like this:

class AbstractBasePath(PurePath, metaclass=ABCMeta):
    """I represent the minimum requirements for a class to implement a pathlib-esque interface"""
    @abstractmethod
    def iterdir(self): ...
    @abstractmethod
    def stat(self, *, follow_symlinks=True): ...
    @abstractmethod
    def open(self, mode='r', buffering=-1, encoding=None,
             errors=None, newline=None): ...

class ResolveablePathMixin(metaclass=ABCMeta):
    """Mixin ABC for paths that can be resolved in some sense"""
    @classmethod
    @abstractmethod
    def cwd(cls): ...
    def absolute(self): ...    # Concrete method that calls cwd()
    @abstractmethod
    def resolve(self): ...

class AbstractPath(AbstractBasePath, ResolveablePathMixin):
    """I represent the full interface for a pathlib Path"""
    # This class body would be completely empty
    # The only purpose of this class would be to accumulate
    # all of the abstractmethods from AbstractBasePath
    # and all of the mixin classes

class Path(AbstractPath):
    # This class body would be filled with concrete implementations
    # of all the abstract methods accumulated in `AbstractPath`

In this way, users could decide whether they want to implement just the core interface, by subclassing AbstractBaseClass; the full pathlib interface, by subclass AbstractPath; or something custom, by using multiple inheritance to subclass AbstractBaseClass and one of the smaller mixin classes at the same time.

(You're far more knowledgeable about pathlib internals than I am, so: apologies if I've got any of the pathlib-specific details wrong here!)

@AlexWaygood
Copy link
Member

I would look to the collections.abc module for inspiration -- observe how the MutableSequence interface is built out of an accumulation of smaller interfaces:

@barneygale
Copy link
Contributor Author

Thank you so much Alex, that's great advice.

I need to give this idea some serious consideration and work out exactly which ABCs would be involved.

My kneejerk concerns about this are as follows:

  1. That there may be too many ABCs and combinations thereof. For example, tar archives support symlinks (read+write) but not hardlinks; other APIs may support reading symlink targets but not creating symlinks. I worry there's a difficult multiple taxonomies problem here, but I'm speculating. I need to actually try it.
  2. That pathlib can already raise NotImplementedError for readlink(), symlink_to(), hardlink_to(), owner() and group(), and perhaps we need to do something about that first, e.g. make them raise OSError with EPERM or ENOSYS?

@barneygale
Copy link
Contributor Author

I've logged bpo-46733 to address the existing misuse of NotImplementedError. I suspect the outcome of that ticket will narrow our options here :-)

Lib/pathlib.py Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
@barneygale
Copy link
Contributor Author

Update on this PR: I've reduced the number of abstract methods substantially! Now only stat(), open() and iterdir() are abstract. I hope that addresses some concerns. Any more feedback? :)

On the python-dev mailing list, Brett Cannon suggested this could be a protocol rather than an ABC. My only concern there is that the protocol would be pretty extensive, because the PurePath + _AbstractPath APIs are pretty extensive, even with methods like cwd() and home() culled. I'd appreciate any guidance anyone might have!

@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 21, 2022

Update on this PR: I've reduced the number of abstract methods substantially! Now only stat(), open() and iterdir() are abstract. I hope that addresses some concerns. Any more feedback? :)

I think the interface is much clearer now — thank you! I won't formally "approve" the PR, because I just don't know enough about pathlib, but the overall design of the class looks really strong to me now.

W.r.t. ABCs versus protocols — I agree that it seems too big to be a protocol. Additionally, a protocol cannot inherit from a concrete class, so you'd have to change the inheritance structure if you wanted to make it a protocol (PurePath is concrete).

The only other piece of feedback I have is that I agree with Ethan Furman's comments on python-dev — I'd make the class public (name it pathlib.AbstractPath rather than pathlib._AbstractPath). I think either we "go for it", or it's not really worth doing — it doesn't make sense to simultaneously say "We'd like users to experiment with this, please send us your feedback", and "Watch out, this is a private-to-this-module implementation detail, users beware" 🙂

@barneygale
Copy link
Contributor Author

Thanks very much Alex.

On making this AbstractPath rather than _AbstractPath: I think there are still too many rough edges for me to write docs for this without wincing throughout. Particularly:

  1. Subclassing only from AbstractPath doesn't work, because _flavour is unset. You currently need to additionally subclass PurePosixPath or PureWindowsPath, or set _flavour yourself. I'd like to rectify this by simultaneously removing flavours and giving PurePath posix path behaviour by default. (If I can be cheeky and appeal for reviewers, I have two simple PRs available before the main event: bpo-44136: pathlib: merge _Flavour.make_uri() into PurePath.as_uri() #30320 and bpo-44136: pathlib: merge _Flavour.is_reserved() into PurePath.is_reserved() #30321)
  2. How do we persist state (sockets, file objects, etc) to child objects, e.g. those generated by path / 'foo'? It can be done already with a further subclass, but it feels odd.
  3. Should we recommend that users instantiate os.stat_result objects themselves when implementing stat()? If so think we need to add a nicer constructor. Or perhaps we add a new stat.Status class (protocol?) with a little more object-orientation?

What I'll do instead is withdraw my recommendation for limited experimentation by third parties - I'll adjust the PR description and the _AbstractPath docstring. Hopefully that clears things up!

@brettcannon
Copy link
Member

I think either we "go for it", or it's not really worth doing

The real question is whether the API will be changing at all, or does @barneygale have some other changes planned which would influence this? Once this is made public there's no going back without a lot of pain, so we should be thoughtful about how the API will look and what is going to be asked of users.

@brettcannon brettcannon self-requested a review February 22, 2022 23:35
@barneygale
Copy link
Contributor Author

I don't want to commit to freezing the interface just yet. But it's not too far away - I think we'll be able to do it in 3.12 or 3.13.

Even without making AbstractPath public (yet!), this PR is still worthwhile IMHO. The implementation of pathlib is simpler to understand and work. It allows us to incrementally add tests as we freeze parts of the interface. The outstanding issues I mentioned in my previous comment are unlikely to affect the interface much, if at all.

@AlexWaygood
Copy link
Member

Even without making AbstractPath public (yet!), this PR is still worthwhile IMHO. The implementation of pathlib is simpler to understand and work. It allows us to incrementally add tests as we freeze parts of the interface. The outstanding issues I mentioned in my previous comment are unlikely to affect the interface much, if at all.

That makes sense to me -- and it's good to have that clarified! :)

@brettcannon
Copy link
Member

My current thinking (while I work through my PR review backlog to reach this PR), is this should be used for zipfile.Path. I also wouldn't be apposed to that happening this PR if @barneygale wanted to give that a shot while he waits for a review from me.

@barneygale
Copy link
Contributor Author

Yep, will do! That helps clarify the order of things a lot actually. I think we need to do these things first (in this order):

  1. Make PurePath (and hence Path) directly subclassable, without needing to worry about _flavour. PR here: gh-68320, gh-88302 - Allow for pathlib.Path subclassing #31691
  2. Unify the creation of child paths in a single method, like zipfile.Path._next() does.

I'm going to mark this PR as a draft for now, will circle back when I've solved those. We'll be able to drop the underscore prefix then, too!

@barneygale barneygale marked this pull request as draft March 5, 2022 02:45
@brettcannon
Copy link
Member

@barneygale do you still want to wait to have this PR reviewed until the API can go public? Or can it be reviewed as-is and the API kept private for now and consider this PR as cleaning up code? I'm trying to clean up my PR review queue and so if this is going to stay in draft because you only want this merged if the API can go public I will unassign myself until it's ready.

@barneygale
Copy link
Contributor Author

@barneygale do you still want to wait to have this PR reviewed until the API can go public? Or can it be reviewed as-is and the API kept private for now and consider this PR as cleaning up code? I'm trying to clean up my PR review queue and so if this is going to stay in draft because you only want this merged if the API can go public I will unassign myself until it's ready.

Please go ahead and unassign yourself! I'd like to make zipfile.Path use this new API, but there's some work to do elsewhere first.

@brettcannon brettcannon removed their request for review April 14, 2022 18:45
@zmievsa
Copy link
Contributor

zmievsa commented May 11, 2022

@barneygale Do I understand correctly that the aim of this change is to make something similar to pathlab somewhat easier to do? I.e. Allowing pathlib capabilities to be used outside of regular system paths and extend it to any kinds of paths that share similar semantics. If so, I'd love to help with it, if you need any help.

@barneygale
Copy link
Contributor Author

barneygale commented May 11, 2022

@barneygale Do I understand correctly that the aim of this change is to make something similar to pathlab somewhat easier to do? I.e. Allowing pathlib capabilities to be used outside of regular system paths and extend it to any kinds of paths that share similar semantics. If so, I'd love to help with it, if you need any help.

Yep that's exactly right! Could be archived files (e.g. in .zip and .tar files) or remote files (e.g. on S3, FTP) - if we give them a universal interface, users can learn one API and apply it everywhere. I'll have a think and let you know where I could do with a hand :D thanks for the offer.

@barneygale
Copy link
Contributor Author

Once #100479 is resolved, I'll resume work on this PR.

@barneygale
Copy link
Contributor Author

I'll resume work on this once #100481 lands.

@barneygale
Copy link
Contributor Author

Contrary to my earlier comments, I'm going to abandon this PR and continue the work elsewhere. Specifically, I'm planning to add a pathlib._AbstractPath class in order to implement tarfile.TarPath.

I'm going to leave zipfile.Path alone for now. It's somewhat incompatible with pathlib at the moment, and all the solutions I've considered involve a deprecation cycle. That will be better motivated once we have TarPath in place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants