@gopidesupavan
Member

universal-pathlib 0.3.0 introduced several changes. One notable change is that the path property now calls str(self) internally (see https://github.com/fsspec/universal_pathlib/blob/v0.3.0/upath/core.py#L218-L234). This causes a recursion error for us because ObjectStoragePath implements a custom __str__ method that accesses self.path, which in turn calls str(self) again and loops forever. To resolve this, we should build the string from _raw_urlpaths instead.
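A minimal sketch of the call cycle described above, using toy classes rather than the real universal-pathlib or Airflow code. The names BasePath, BrokenPath, FixedPath and the _raw attribute are hypothetical stand-ins; only the shape of the cycle (path calls str(self), and the custom __str__ reads self.path) is taken from the description.

```python
class BasePath:
    """Toy stand-in for a base class whose `path` is derived from str(self)."""

    def __init__(self, protocol: str, raw: str) -> None:
        self._protocol = protocol
        self._raw = raw  # hypothetical raw storage, not the real upath attribute

    def __str__(self) -> str:
        return self._raw

    @property
    def path(self) -> str:
        # Mirrors the behaviour described above: path is computed from str(self).
        return str(self)


class BrokenPath(BasePath):
    def __str__(self) -> str:
        # self.path calls str(self), which calls this __str__ again, and so on.
        return f"{self._protocol}://conn@{self.path}"


class FixedPath(BasePath):
    def __str__(self) -> str:
        # Build the string from the stored raw data instead of going through self.path.
        return f"{self._protocol}://conn@{self._raw}"


def triggers_recursion(p: BasePath) -> bool:
    """Return True if str(p) blows the recursion limit."""
    try:
        str(p)
    except RecursionError:
        return True
    return False
```

In this toy model, triggers_recursion(BrokenPath("s3", "bucket/key")) is true, while str(FixedPath("s3", "bucket/key")) returns normally.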

https://pypi.org/project/universal-pathlib/

https://github.com/apache/airflow/actions/runs/18141256490/job/51633607047



Contributor

@jscheffl jscheffl left a comment

shutil.copyfileobj(f1, f2, **kwargs)

-    def copy(self, dst: str | ObjectStoragePath, recursive: bool = False, **kwargs) -> None:
+    def copy(self, dst: str | ObjectStoragePath, recursive: bool = False, **kwargs) -> None:  # type: ignore[override]
Contributor

We normally don't use ignores unless absolutely necessary, as ignores don't fix anything and just hide the issue.
Isn't there a way to fix this without an ignore?

Member Author

To be honest, I don't have any idea. mypy reports the superclass signature like this:

  task-sdk/src/airflow/sdk/io/path.py:297: note:      Superclass:
  task-sdk/src/airflow/sdk/io/path.py:297: note:          def [T: WritablePath] copy(self, target: T, **kwargs: Any) -> T
  task-sdk/src/airflow/sdk/io/path.py:297: note:      Subclass:
  task-sdk/src/airflow/sdk/io/path.py:297: note:          def copy(self, dst: str | ObjectStoragePath, recursive: bool = ..., **kwargs: Any) -> None
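A rough sketch of why mypy objects, using toy classes in place of WritablePath and ObjectStoragePath (Base, Incompatible and Compatible are hypothetical names). It assumes the mismatch comes from the renamed first parameter and the narrowed return type shown in the notes above. The Compatible variant is only one possible shape of a matching override, not a proposal for the real class, since it no longer accepts a plain str.

```python
from typing import Any, TypeVar, Union

T = TypeVar("T", bound="Base")


class Base:
    def __init__(self, name: str) -> None:
        self.name = name

    def copy(self, target: T, **kwargs: Any) -> T:
        # Superclass contract from the mypy note: accept a target and return it.
        return target


class Incompatible(Base):
    # Different first parameter name and a None return type: callers written
    # against Base.copy (e.g. copy(target=...)) would break, so mypy flags it.
    def copy(  # type: ignore[override]
        self, dst: Union[str, Base], recursive: bool = False, **kwargs: Any
    ) -> None:
        return None


class Compatible(Base):
    # Keeps the parameter name and return type; the extra option is
    # keyword-only with a default, so every Base.copy call still works.
    def copy(self, target: T, *, recursive: bool = False, **kwargs: Any) -> T:
        return target
```

With Compatible, code that only knows about Base.copy keeps working unchanged, which is the property the override check is protecting.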

Contributor

calling for the expert @uranusjr

Member Author

@gopidesupavan gopidesupavan Sep 30, 2025

Okay, I removed that now.

Member Author

That didn't work; mypy is still not happy.

@bolkedebruin
Contributor

bolkedebruin commented Oct 1, 2025

I think it looks okay, assuming is_relative_to is an API update. If that is the case, it could warrant a release notes update even though it is a downstream update?

@gopidesupavan
Member Author

> I think it looks okay, assuming is_relative_to is an API update. If that is the case, it could warrant a release notes update even though it is a downstream update?

Yeah, it looks like it has updates: https://github.com/fsspec/universal_pathlib/pull/405/files

@gopidesupavan gopidesupavan force-pushed the fix-upath-recursion-isse branch from a0415dc to d9953f4 Compare October 1, 2025 07:22
@gopidesupavan
Member Author

I will be away for the whole day, so please feel free to merge this. mypy is still not happy, so I have added the ignore for now.

@bolkedebruin
Contributor

> > I think it looks okay, assuming is_relative_to is an API update. If that is the case, it could warrant a release notes update even though it is a downstream update?
>
> Yeah, it looks like it has updates: https://github.com/fsspec/universal_pathlib/pull/405/files

I couldn't find it in that pull request. https://github.com/fsspec/universal_pathlib/blob/main/upath/core.py#L1074 still has relative_to, which behaves differently from is_relative_to. So to me this seems to be an unrelated change?
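For reference, the difference between the two methods can be sketched with the standard library's pathlib. This assumes UPath follows the same general semantics, which is not verified here, and the example paths are made up.

```python
from pathlib import PurePosixPath

parent = PurePosixPath("/bucket")
child = PurePosixPath("/bucket/dir/file.txt")
other = PurePosixPath("/elsewhere/file.txt")

# relative_to returns a new path with the parent prefix stripped.
relative = child.relative_to(parent)

# is_relative_to (Python 3.9+) only answers yes or no.
inside = child.is_relative_to(parent)
outside = other.is_relative_to(parent)


def relative_or_none(path: PurePosixPath, base: PurePosixPath):
    """relative_to raises ValueError when path is not under base."""
    try:
        return path.relative_to(base)
    except ValueError:
        return None
```

So a test asserting on relative_to checks the returned path, while is_relative_to only checks containment.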

@bolkedebruin bolkedebruin self-requested a review October 1, 2025 08:52
Contributor

@bolkedebruin bolkedebruin left a comment

Let's remove the unrelated change from the tests or explain why it is required.

@gopidesupavan
Member Author

> Let's remove the unrelated change from the tests or explain why it is required.

Sorry, what I meant is that the implementation of relative_to changed.

In 0.3.0 it returns:

image

In 0.2.6:

image

In earlier versions, relative_to was implemented in the cloud path module: https://github.com/fsspec/universal_pathlib/pull/405/files#diff-311a53e84d3cff9dd608444fabb9f22eace31a4f35cdd481083d1761296e41cdL63 but it has since been removed from there and is now implemented in core, with a slight modification that returns _relative_base: https://github.com/fsspec/universal_pathlib/blob/main/upath/core.py#L1074

So the test fails with the current implementation of relative_to:

image

I could change the assertion from assert o1.relative_to(o2) == o1 to assert o1.relative_to(o2). I hope that's fine?
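One caveat with the relaxed assertion, sketched here with the standard library's pathlib as a stand-in (whether UPath defines its own truthiness is not confirmed here): a pure path object is always truthy, so assert o1.relative_to(o2) mostly checks that the call does not raise. The values of o1 and o2 below are made up.

```python
from pathlib import PurePosixPath

o1 = PurePosixPath("/bucket/key")
o2 = PurePosixPath("/bucket/key")

# A path relative to itself is the empty relative path, rendered as ".".
result = o1.relative_to(o2)

# PurePath defines neither __bool__ nor __len__, so any instance is truthy,
# including the empty relative path.
always_truthy = bool(result)

# The stricter equality from the old assertion does not hold for this result.
equals_original = result == o1
```

If the intent is to pin the new behaviour, comparing against an explicit expected value would be stricter than a bare truthiness check.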

Member

@potiuk potiuk left a comment

For me the explanation is clear and reasonable.

Contributor

@amoghrajesh amoghrajesh left a comment

Makes sense to me

     conn_id = self.storage_options.get("conn_id")
     if self._protocol and conn_id:
-        return f"{self._protocol}://{conn_id}@{self.path}"
+        return f"{self._protocol}://{conn_id}@{self.parser.join(*self._raw_urlpaths)}"
Member

Is this really the only option? Relying on a private method doesn't feel great.

Could we do this instead:

super(type(self), self).__str__(self.path)

It's much more verbose, and opaque, yes, but it doesn't depend on accessing a "private" method?
Or use self.parts somehow?
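A small sketch of the super() idea on toy classes (Base, ZeroArgSuper, TypeSelfSuper and the _raw attribute are hypothetical, not the real upath or Airflow classes). It also illustrates a general Python pitfall: super(type(self), self) resolves against the runtime class, so it recurses as soon as the class is subclassed, whereas zero-argument super() does not.

```python
class Base:
    def __init__(self, raw: str) -> None:
        self._raw = raw  # hypothetical storage for the plain path string

    def __str__(self) -> str:
        return self._raw


class ZeroArgSuper(Base):
    def __str__(self) -> str:
        # Zero-argument super() is bound to the class where the method is
        # defined, so it always reaches Base.__str__.
        return "conn@" + super().__str__()


class TypeSelfSuper(Base):
    def __str__(self) -> str:
        # type(self) is the runtime class. For a subclass instance, the next
        # class in the MRO is TypeSelfSuper itself, so this calls itself.
        return "conn@" + super(type(self), self).__str__()


class SubZero(ZeroArgSuper):
    pass


class SubTypeSelf(TypeSelfSuper):
    pass


def recurses(obj: Base) -> bool:
    """Return True if str(obj) exceeds the recursion limit."""
    try:
        str(obj)
    except RecursionError:
        return True
    return False
```

In this sketch str(SubZero("bucket/key")) works, while recurses(SubTypeSelf("bucket/key")) is true, so a super()-based approach would want the zero-argument form.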

Contributor

As per your comment below, we should likely restrict 0.3.0 and use this library properly

Member

@ashb ashb left a comment

I don't know if this matters, but us overriding __str__ like this also likely breaks other aspects of this library (either by causing a recursion error elsewhere, or by including other info that the upath library isn't expecting).

I think we should probably report this as a bug upstream and block 0.3.0 for the moment instead.

For example, self.parts is broken now (and I think this is just 0.3.0):

On 0.2.6:

(Pdb++) self.parts
('/', 'bucket', 'path')

vs 0.3.0

(Pdb++) self.parts
('/', 'bucket', 'ath')
(Pdb++) self
ObjectStoragePath('fake@bucket/path', protocol='fake')

Okay yeah, this is very much a case of "we are using this library really, really wrong":

(Pdb++) CloudPath("s3://bucket/path")
S3Path('bucket/path', protocol='s3')
(Pdb++) CloudPath("s3://bucket/path").parts
('bucket/', 'path')
(Pdb++) upath.__version__
'0.3.0'

@gopidesupavan
Member Author

Closing it for now: #56370

@gopidesupavan
Member Author

Created an issue here; closing this for now: #56370
