-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: VersionedS3Path
#139
Conversation
* update existing code to prepare for VersionedS3Path * add tests for VersionedS3Path * refactor imports * add tests * refactor tests * add documentation * add exists() logic for versioned s3 objects * update tests for versioned s3 buckets * remove unused import * versioneds3path P1 * from_uri from_bucket_key * versioneds3path II * versioneds3path III * formatting * debug * debug * debug * doc-string * uncomment tests --------- Co-authored-by: Nick Langellier <nick@videa.ai> Co-authored-by: Nick Langellier <78754494+nlangellier@users.noreply.github.com>
@liormizr this PR is ready. Can we get a review please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @chnpenny
Sorry for the delay with this review
First thing I'll like to thank you for your contribution here
I think that this is a great start for a versioned s3path object.
New about the review
I think that we need to start something better decoupled from s3path
For now I don't want to touch S3Path and all his objects
I suggest to create a new:
- _VersionedS3Accessor (that is based on _S3Accessor)
- PureVersionedS3Path (that is based on PureS3Path)
- VersionedS3Path (that is based on PureVersionedS3Path & S3Path)
Maybe in the future we will join them to one with S3Path or maybe not but for the first version
I think that this is the right direction.
I added more comments in the code so you will understand the direction that I want us to go on
Thank you again for the work
s3path.py
Outdated
class StatResult(namedtuple('BaseStatResult', 'size, last_modified')): | ||
class VersionedS3Path(S3Path): | ||
|
||
def __new__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I wouldn't recommend adding the logic if to return a S3Path or an VersionedS3Path object
I want first to see how heavily there will be the need for this from the community.
Maybe in the future we will do it from S3Path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another point is type hints
Type hints are very nice but we need to be careful not to abuse them too much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me know if you want me to remove some type hints
s3path.py
Outdated
else: | ||
return super().__new__(cls, *args) | ||
|
||
def __init__(self, *args: Union[str, Path], version_id: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will recommend for compatibility with pathlib to not implement the init method
You can do this instead
class VersionedS3Path(S3Path):
"""
S3Path subclass for AWS S3 service Keys with Versions.
>> from s3path import VersionedS3Path
>> VersionedS3Path('/<bucket>/<key>', version_id='<version_id>')
<< VersionedS3Path('/<bucket>/<key>', version_id='<version_id>')
"""
def __new__(cls, *args, version_id):
self = cls._from_parts(args, version_id=version_id)
if not self._flavour.is_supported:
raise NotImplementedError("cannot instantiate %r on your system"
% (cls.__name__,))
return self
@classmethod
def _from_parts(cls, args, *, version_id):
self = super()._from_parts(cls, args)
self._version_id = version_id
return self
def __rtruediv__(self, key):
try:
return self._from_parts([key] + self._parts, version_id=self._version_id)
except TypeError:
return NotImplemented
This way we are steeking with the pathlib methodologies and will stay compatible better for future versions.
The _from_parts class method is like the "init" method here.
Not about the VersionedS3Path version_id attribute, it's your choice if do you want it privet/public attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion. I used most of you suggestion but I did not include the __rtruediv__
because I think it makes more sense that relative paths are not allowed for versioned paths. let me know if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nlangellier
Why do you don't want to support relative paths?
what do you think about this use case for example:
bucket = VersionedS3Path('/bucket/')
for file_name in ['f0', 'f1', ..., 'fn']:
path = bucket / VersionedS3Path(file_name, version_id=...)
# OR
path = bucket.joinpath(VersionedS3Path(file_name, version_id=...))
...
according to pathlib this is a valid api
In S3Path we check if the path is absolute only in the level of methods that do api calls for AWS
So all methods in Pure classes doesn't really need this check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are right. that should be allowed. i added both __truediv__
and __rtruediv__
methods to properly handle all possible combinations:
VersionedS3Path
/VersionedS3Path
VersionedS3Path
/S3Path
VersionedS3Path
/str
S3Path
/VersionedS3Path
str
/VersionedS3Path
s3path.py
Outdated
""" | ||
|
||
self = S3Path.from_uri(uri) | ||
return cls._from_s3_path(s3_path=self, version_id=version_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_from_s3_path is a little to complicated for this situation.
I think we need to think on a simpler solution
First think this is a "Pure" path object class method.
Why not add a PureVersionedS3Path as well?
I recommend something like this:
class PureVersionedS3Path(PureS3Path):
def __new__(...):
...
@classmethod
def from_uri(cls, uri: str, *, version_id: str) -> PureVersionedS3Path:
"""
...
"""
if not uri.startswith('s3://'):
raise ValueError('Provided uri seems to be no S3 URI!')
return cls(uri[4:], version_id=version_id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. good idea 👍
@chnpenny how do you want to continue? |
* address PR comments * update versioned s3 bucket test * remove unused type hint * remove unused line * add VersionedS3Path._init method
@liormizr thanks for the review. i implemented your suggestions and this is ready for another round of review. |
Thanks for the comment about relative paths @liormizr. This is again ready for review. |
thanks @chnpenny & @nlangellier
|
Thank you @liormizr Regarding the first question: Yes forgot to update that. Will do that today. Regarding the second point: We felt it was unnecessary to do so, but perhaps we missed an important detail as to why you suggested to override the def from_uri(cls, uri: str, *, version_id: str) -> PureVersionedS3Path:
if not uri.startswith('s3://'):
raise ValueError('Provided uri seems to be no S3 URI!')
return cls(uri[4:], version_id=version_id) but we made the decision to recycle the code from def from_uri(cls, uri: str, *, version_id: str) -> PureVersionedS3Path:
self = PureS3Path.from_uri(uri)
return cls(self, version_id=version_id) The idea there was that if a future change is made to def __rtruediv__(self, key):
if not isinstance(key, (PureS3Path, str)):
return NotImplemented
new_path = super().__rtruediv__(key)
new_path.version_id = self.version_id
return new_path then we let |
For the first version we can go with your approach Well done and thank you for the hard work |
@liormizr we hadn't finished updating the documentation yet. We'll open a separate PR for that later today |
This PR addresses issue #126 by
VersionedS3Path
class to the public APIst_version_id
property toStatResult
tests/test_path_operations.py
README.rst
anddocs/interface.rst
Additional notes
VersionedS3Path
inherits fromS3Path
but adds aversion_id
parameter to the various constructor methods. In order to follow the Liskov substitution principle, this parameter is made optional with a default value ofNone
. We thought that if a user does not supply a value for this parameter, then the returned instance should be of typeS3Path
instead ofVersionedS3Path
. Hence the necessity for overriding the__new__
constructor.S3Path
objects, they do not make sense forVersionedS3Path
objects as relative and bucket-only S3 paths have no concept of a version ID. Thus aValueError
is raised if a used attempts to instantiate such an object.__repr__
method as the default__repr__
provided bypathlib
would omit the value of theversion_id
argument, leading to a misrepresentation of the construction of the object.st_version_id
parameter ofStatResult
has a default value ofNone
and thus is fully backwards compatible with previous versions ofs3path
. In fact, we were careful to make all changes in this PR backwards compatible.README.rst
, we added the line\*New in S3Path version 0.5.0
. Please feel free to ask us to remove this line or to change it to whatever version this will be added to.