index: skip collection for fs with invalid fsid #510

Merged
merged 1 commit into from
Mar 23, 2024
8 changes: 7 additions & 1 deletion src/dvc_data/index/collect.py
@@ -71,7 +71,7 @@ def _collect_from_index(
cache[(*cache_prefix, *key)] = entry


-def collect(  # noqa: C901, PLR0912
+def collect(  # noqa: C901, PLR0912, PLR0915
     idxs,
     storage,
     callback: "Callback" = DEFAULT_CALLBACK,
@@ -101,6 +101,12 @@ def collect(  # noqa: C901, PLR0912
         fsid = data.fs.fsid
     except (NotImplementedError, AttributeError):
         fsid = data.fs.protocol
+    except BaseException as exc:  # noqa: BLE001
Contributor Author

Even if we update dvcfs to re-raise the SCMError as an AttributeError instead when loading fs.fsid, this just pushes the problem further down the line. In this case we will still hit the same SCMError when trying to clone the repo again upon the next fs function call.

I'm not sure that catching the broad exception here is right, though. We could re-raise ValueError in dvcfs (and then catch that here instead of BaseException).

cc @efiop
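For illustration, the ValueError alternative floated above might look roughly like the sketch below. Everything in it is hypothetical and only mimics the shape of the code under review: `SCMError` is a stand-in class, `BrokenRepoFS`, `LocalFS`, and `NoFsidFS` are made-up filesystems, and `collect_fsids` is a stripped-down loop, not the actual `collect()` in dvc-data.

```python
import logging

logger = logging.getLogger(__name__)


class SCMError(Exception):
    """Stand-in for the SCM failure discussed in the thread (hypothetical)."""


class BrokenRepoFS:
    """Hypothetical filesystem whose fsid cannot be computed."""

    protocol = "dvc"

    @property
    def fsid(self):
        try:
            # Simulate the repo clone failing while the fsid is being built.
            raise SCMError("failed to clone repo")
        except SCMError as exc:
            # Re-raise as ValueError so callers can catch something
            # narrower than BaseException.
            raise ValueError("invalid fsid") from exc


class LocalFS:
    """Hypothetical filesystem with a working fsid."""

    protocol = "local"

    @property
    def fsid(self):
        return "local"


class NoFsidFS:
    """Hypothetical filesystem that does not define fsid at all."""

    protocol = "memory"


def collect_fsids(filesystems):
    """Return the fsid of each filesystem, skipping those with an invalid one."""
    fsids = []
    for fs in filesystems:
        try:
            fsid = fs.fsid
        except (NotImplementedError, AttributeError):
            # Existing behavior: fall back to the protocol name.
            fsid = fs.protocol
        except ValueError as exc:
            logger.debug(
                "skipping index collection for data with invalid fsid",
                exc_info=exc,
            )
            continue
        fsids.append(fsid)
    return fsids
```

With this shape, `collect_fsids([LocalFS(), NoFsidFS(), BrokenRepoFS()])` keeps the first two entries and skips the third, while interrupts and unrelated errors still propagate.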

Contributor

@efiop efiop Feb 21, 2024

Yeah, BaseException indeed feels a bit too broad.

> In this case we will still hit the same SCMError when trying to clone the repo again upon the next fs function call.

This might be fine, as we try to connect to check for auth errors anyway. Not sure if AttributeError specifically is ideal though, maybe it is. Or BaseException might actually be alright to handle here like you did already, since it is about whatever fsid might throw at us... So sounds like I reach your conclusion as well here 😄 Though again error reporting is kinda odd...
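On the breadth concern: in Python, `BaseException` also covers `KeyboardInterrupt` and `SystemExit`, which `Exception` does not. A small self-contained check of that difference, where `is_caught` is a helper made up for this illustration:

```python
def is_caught(raised, handler):
    """Return True if an `except handler` clause catches an instance of `raised`."""
    try:
        try:
            raise raised()
        except handler:
            return True
    except BaseException:
        # The inner clause did not match; stop the exception from escaping.
        return False


# Ordinary errors are caught by both clauses.
print(is_caught(RuntimeError, Exception))
print(is_caught(RuntimeError, BaseException))

# Interrupts and interpreter exit are only caught by BaseException.
print(is_caught(KeyboardInterrupt, Exception))
print(is_caught(KeyboardInterrupt, BaseException))
print(is_caught(SystemExit, BaseException))
```

So an `except BaseException` around `fs.fsid` would also swallow a Ctrl-C that arrives while the fsid is being computed, which is one concrete reason it can feel too broad.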

Member

Should we still collect by falling back to data.fs.protocol?

Contributor Author

In the general case it would probably be better to have fallback behavior, but for dvcfs/gitfs, falling back here will just delay the failure to the next gitfs call (when it will retry cloning the repo and fail).
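A toy model of that trade-off, assuming a filesystem where every operation first needs a successful clone. `SCMError`, `UncloneableFS`, and both `collect_*` helpers are hypothetical and not part of dvc, dvc-data, or scmrepo:

```python
class SCMError(Exception):
    """Stand-in for the clone failure discussed above (hypothetical)."""


class UncloneableFS:
    """Hypothetical filesystem whose every operation needs a successful clone."""

    protocol = "dvc"

    def _clone(self):
        raise SCMError("could not clone repo")

    @property
    def fsid(self):
        self._clone()
        return "dvc_example"

    def info(self, path):
        self._clone()
        return {"name": path}


def collect_with_fallback(fs, path):
    """Fall back to the protocol name, then keep using the filesystem."""
    try:
        fsid = fs.fsid
    except SCMError:
        fsid = fs.protocol
    # The next filesystem call retries the clone and fails the same way.
    return fsid, fs.info(path)


def collect_with_skip(fs, path):
    """Skip the filesystem entirely when its fsid cannot be computed."""
    try:
        fsid = fs.fsid
    except SCMError:
        return None
    return fsid, fs.info(path)
```

In this model `collect_with_skip(UncloneableFS(), "data")` returns `None`, whereas `collect_with_fallback` raises `SCMError` one call later, so the fallback only moves the failure.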

+        logger.debug(
+            "skipping index collection for data with invalid fsid",
+            exc_info=exc,
+        )
+        continue

     key = (fsid, tokenize(data.path))
