-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Fix ObjectStoragePath compatible with universal-pathlib 0.3.0 to avoid maximum recursion depth exceeded #56270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…d maximum recursion depth exceeded
jscheffl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @bolkedebruin
| shutil.copyfileobj(f1, f2, **kwargs) | ||
|
|
||
| def copy(self, dst: str | ObjectStoragePath, recursive: bool = False, **kwargs) -> None: | ||
| def copy(self, dst: str | ObjectStoragePath, recursive: bool = False, **kwargs) -> None: # type: ignore[override] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we normally don't use ignores unless absolutely necessary as ignores don't really fix stuff but just hide the issue.
isn't there a way to fix without ignore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh i dont have any idea the super class signature is like this:
task-sdk/src/airflow/sdk/io/path.py:297: note: Superclass:
task-sdk/src/airflow/sdk/io/path.py:297: note: def [T: WritablePath] copy(self, target: T, **kwargs: Any) -> T
task-sdk/src/airflow/sdk/io/path.py:297: note: Subclass:
task-sdk/src/airflow/sdk/io/path.py:297: note: def copy(self, dst: str | ObjectStoragePath, recursive: bool = ..., **kwargs: Any) -> None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling for the expert @uranusjr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay i removed that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didnt worked mypy still not happy
8f78f58 to
a0415dc
Compare
|
I think it looks okay. assuming |
yeah it looks like it has updates https://github.com/fsspec/universal_pathlib/pull/405/files |
a0415dc to
d9953f4
Compare
|
will be away for whole day please feel free to merge this, mypy still not happy and have added ignore for now. |
I couldn't find it in that pull request. https://github.com/fsspec/universal_pathlib/blob/main/upath/core.py#L1074 still has |
bolkedebruin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove the unrelated change from the tests or explain why it is required.
sorry what i meant is , the relative_to function implementation changed. in 0.3.0 its returning
in 0.2.6
earlier versions the implementation of relative_to function is in the cloudpath: https://github.com/fsspec/universal_pathlib/pull/405/files#diff-311a53e84d3cff9dd608444fabb9f22eace31a4f35cdd481083d1761296e41cdL63 but its removed from the cloud path instead it is now implemented in core with slight modification that returns So the test failing with the current implementation of
i could change the assertion from |
potiuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the explanation is clear and reasonable.
amoghrajesh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me
| conn_id = self.storage_options.get("conn_id") | ||
| if self._protocol and conn_id: | ||
| return f"{self._protocol}://{conn_id}@{self.path}" | ||
| return f"{self._protocol}://{conn_id}@{self.parser.join(*self._raw_urlpaths)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really the only options? Relying on a private method doesn't feel great.
Could we do this instead:
super(type(self), self).__str__(self.path)
It's much more verbose, and opaque, yes, but it doesn't depend on accessing a "private" method?
Or use self.parts somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per your comment below, we should likely restrict 0.3.0 and use this library properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this matters, but us overloading str like this also likely breaks other aspects of this library (either by causing a recursion error elsewhere, or by including other info that the upath library isn't expecting.
I think we should probably report this as a bug upstream and block 0.3.0 for the moment instead.
For example, self.parts is broken now (and I think this is just 0.3.0:
On 0.2.6:
(Pdb++) self.parts
('/', 'bucket', 'path')
vs 0.3.0
(Pdb++) self.parts
('/', 'bucket', 'ath')
(Pdb++) self
ObjectStoragePath('fake@bucket/path', protocol='fake')
Okay yeah, this is very much an "we are using this library really really wrong":
(Pdb++) CloudPath("s3://bucket/path")
S3Path('bucket/path', protocol='s3')
(Pdb++) CloudPath("s3://bucket/path").parts
('bucket/', 'path')
(Pdb++) upath.__version__
'0.3.0'
|
closing it for now: #56370 |
|
created issue here closing it for now: #56370 |



universal-pathlib version 0.3.0 introduced several changes. One notable update is that the path property now uses str(self) internally https://github.com/fsspec/universal_pathlib/blob/v0.3.0/upath/core.py#L218-L234. This leads to a recursion error in our case because ObjectStoragePath implements a custom str method, and accessing self.path triggers an infinite loop. To resolve this, we should switch to using _raw_urlpaths instead.
https://pypi.org/project/universal-pathlib/
https://github.com/apache/airflow/actions/runs/18141256490/job/51633607047
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.