Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change UPath.__new__ behavior #125

Merged
merged 37 commits into from
Aug 2, 2023
Merged

Change UPath.__new__ behavior #125

merged 37 commits into from
Aug 2, 2023

Conversation

ap--
Copy link
Collaborator

@ap-- ap-- commented Aug 2, 2023

This PR changes the way UPath.__new__ works for local paths.

Closes #90.

Currently we return pathlib.PosixPath or pathlib.WindowsPath instances dependent on the OS. This caused issues for users, because the UPath class is not guaranteed to return an instance or subclass instance of itself:

>>> obj = UPath("/local/path/file.txt")
>>> isinstance(obj, UPath)
False

For more details, see: #90

To ensure universal_pathlib's UPath can be used as a drop in for pathlib.Path instances, this PR now implements the following:

Three new UPath subclasses are introduced:

  • upath.implementations.local.PosixUPath (subclass of pathlib.PosixPath)
  • upath.implementations.local.WindowsUPath (subclass of pathlib.WindowsPath)
  • upath.implementations.local.LocalPath

The first two are returned by UPath.__new__ whenever a pathlib compatible local path is provided to UPath(). They are extensively tested against the official CPython test suite for pathlib, to ensure compatibility with their respective python version pathlib equivalent.

The third class is returned whenever a file URI path is provided to UPath(). Local file access is then handled through fsspec.

This PR should also resolve pydantic typing issues #112.

Cheers,
Andreas 😃

Copy link
Collaborator

@normanrz normanrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. Thanks for putting in the time to make all the tests work!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code that you wrote or is this coming from the original pathlib implementation?

Copy link
Collaborator Author

@ap-- ap-- Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all code from CPython's test.support and test.support.os_helper module that is required to run the pathlib tests. There are some small changes in there to handle minor refactoring that happened in those modules between 3.8 and 3.12, so that we only need to vendor one version.

@normanrz
Copy link
Collaborator

normanrz commented Aug 2, 2023

We should think about how we want to roll this out since this is a major breaking change.

@ap--
Copy link
Collaborator Author

ap-- commented Aug 2, 2023

I agree. The actual impact in this case is very minor though, since code that expects old behavior (i.e. UPath("/local/file.txt") returns a pathlib.PosixPath instance) will now receive a fully tested and 100% compatible PosixUPath instance.

But I think it would be nice to move universal_pathlib now to semantic versioning.
We could start slowly and release 0.1.0 and make a decision what feature set we want to be completed for 1.0.0 or we could just say, let's go for 1.0.0 right away. Given the thorough test-suite of upath I think it might be okay to do so.

In any case: poetry projects with "^0.0.24" caret version restrictions are prevented from upgrading.

@normanrz
Copy link
Collaborator

normanrz commented Aug 2, 2023

I agree. The actual impact in this case is very minor though, since code that expects old behavior (i.e. UPath("/local/file.txt") returns a pathlib.PosixPath instance) will now receive a fully tested and 100% compatible PosixUPath instance.

This is true. However, for example, we have code that distiguishes between local paths and remote paths by checking isinstance(p, UPath) after passing the path through the UPath constructor. We would need to change that and make sure that our users get the correct upath version.

I agree that this should get a bump in the minor version.

But I think it would be nice to move universal_pathlib now to semantic versioning.
We could start slowly and release 0.1.0 and make a decision what feature set we want to be completed for 1.0.0 or we could just say, let's go for 1.0.0 right away. Given the thorough test-suite of upath I think it might be okay to do so.

I'm leaning towards a minor bump. Do you have other features or refactorings lined up?

@ap--
Copy link
Collaborator Author

ap-- commented Aug 2, 2023

we have code that distiguishes between local paths and remote paths by checking isinstance(p, UPath) after passing the path through the UPath constructor

I see, that makes sense.

Having read more and more of the discussions on the cpython issue tracker, discuss.python.org and the recent PRs regarding pathlib I now believe that the correct way to determine local paths would be to check if .__fspath__() is implemented and returns a path. We are currently doing this wrong in UPath. But it is because loads of other libraries out there including fsspec, pandas, dask etc. request a string representation of the urlpath via calling os.fspath(...) or .__fspath__() where they should instead use __str__ or as_uri(). But that's something we can try to fix in the future (probably when pathlib.PathBase gets introduced in python3.13)

I'm leaning towards a minor bump. Do you have other features or refactorings lined up?

Minor version increase sounds good to me 😃
Regarding upcoming refactorings, (nothing specific that would be a blocker for the release) I want to get 3 things fixed before 3.12 is released:

And ideally we would add support for 3.12 before the actual release. But it dropped _PosixFlavour (and the other flavour classes), so this might need a bit more thought. Hopefully we'll have a backported pathlib.PathBase implementation soon. That would resolve most of my concerns regarding 3.12. python/cpython#106337 (comment)

Anyways, I'll prepare the 0.1.0 release either later tonight, or tomorrow.

@ap-- ap-- merged commit 662c294 into fsspec:main Aug 2, 2023
@danielgafni
Copy link

Wow great stuff! Thank you guys for your hard work!

]


class LocalPath(UPath):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is LocalPath for? Should WindowsUPath and PosixUPath subclass LocalPath?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocalPath handles the "file://" uri style LocalPaths.

The minor difference between the two is:

  • LocalPath provides local file access through fsspec.
  • WindowsUPath and PosixUPath via pathlib.

For your usecase do you just need to determine if a path is local, or if the user provided a non-uri local path?

And yes, WindowsUPath and PosixUPath could derive from LocalPath.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For your usecase do you just need to determine if a path is local, or if the user provided a non-uri local path?

For local paths, I'd rather not use fsspec. So, I want a way of determining whether I have local paths (essentially pathlib Paths). It doesn't matter to me whether the user omitted the protocol or put in file://.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't cover that use case right now. And I think it didn't work this way previously either.

in <=0.0.24:

  • with UPath("file:///local/file.txt") you received a UPath instance that managed file access through fsspec
  • with UPath("/local/file.txt") you received a PosixPath instance.

in >=0.1.0:

  • with UPath("file:///local/file.txt") you receive a LocalPath instance that manages file access through fsspec
  • with UPath("/local/file.txt") you received a PosixUPath instance.

workaround

A current workaround to force pathlib handling might be:

pth = UPath(...)

if isinstance(obj, LocalPath):
    obj = UPath(obj.path)  # workaround to force pathlib handling for "file://" uris
    assert isinstance(obj, (PosixUPath, WindowsUPath)) 
if isinstance(obj, (PosixUPath, WindowsUPath)):
    ...  # local.
else:
    ...  # cloud, etc.

but this isn't elegant at all.


options

we'd need a way to determine if file uris should be handled via fsspec or not. Maybe this should actually be an option that should go both ways?

class LocalFileBackend(str, Enum):
    PATHLIB = "pathlib"
    FSSPEC = "fsspec"
    AUTO = "auto"

# --- via a new constructor kwarg ---

obj_a = UPath("/local/file.txt")
obj_b = UPath("file:///local/file.txt")
assert type(obj_a) is PosixUPath and type(obj_b) is LocalPath

obj_a = UPath("/local/file.txt", localfile_backend="pathlib")
obj_b = UPath("file:///local/file.txt", localfile_backend="pathlib")
assert type(obj_a) is PosixUPath and type(obj_b) is PosixUPath

obj_a = UPath("/local/file.txt", localfile_backend="fsspec")
obj_b = UPath("file:///local/file.txt", localfile_backend="fsspec")
assert type(obj_a) is LocalPath and type(obj_b) is LocalPath

# --- via a utility function ---

def ensure_local(pth: UPath, backend: LocalFileBackend) -> PosixUPath | WindowsUPath | LocalPath:
    """raise ValueError if not local, ensure backend if local"""
    ...

After writing, I would almost prefer an ensure_local helper function. But we can go either route.

What do you think?

Copy link
Collaborator

@normanrz normanrz Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Probably, I didn't run into this issue because I stripped the file:// prefix.
Tbh for my purposes, isinstance(obj, (PosixUPath, WindowsUPath)) is fine. If it were isinstance(obj, LocalUPath) it would be easier. But no big deal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should UPath.__new__ return pathlib.Path instances for local paths
3 participants