Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ls: handle broken symlink #7226

Merged
merged 1 commit into from
Jan 6, 2022
Merged

Conversation

pared
Copy link
Contributor

@pared pared commented Jan 5, 2022

Fixes: #7120

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

@pared pared requested a review from efiop January 5, 2022 12:52
@pared pared requested a review from a team as a code owner January 5, 2022 12:52
dvc/repo/ls.py Outdated
Comment on lines 69 to 96
path = (
fs.path.name(fs_path)
if fs_path == info
else fs.path.relpath(info, fs_path)
)

try:
metadata = fs.metadata(info)
if metadata.output_exists or not dvc_only:
ret[path] = {
"isout": metadata.is_output,
"isdir": metadata.isdir,
"isexec": metadata.is_exec,
}

except FileNotFoundError:
# if walk found it but there is no metadata, its broken symlink
if not dvc_only:
ret[path] = {
"isout": False,
"isdir": False,
"isexec": False,
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you are ignoring FileNotFoundError if dvc_only, which could be a dangerous bug. Also duplicating code, so how about smth like:

        try:
            metadata = fs.metadata(info)
        except FileNotFoundError:
            if dvc_only:
                raise
                
            metadata = Metadata()
            
        if metadata.output_exists or not dvc_only:
            path = (
                fs.path.name(fs_path)
                if fs_path == info
                else fs.path.relpath(info, fs_path)
            )
            ret[path] = {
                "isout": metadata.is_output,
                "isdir": metadata.isdir,
                "isexec": metadata.is_exec,
            }

@efiop
Copy link
Contributor

efiop commented Jan 5, 2022

Btw, how does CLI ls handle broken symlinks? Should we colorize it too? If so, how hard would that be?

@pared
Copy link
Contributor Author

pared commented Jan 5, 2022

Btw, how does CLI ls handle broken symlinks? Should we colorize it too? If so, how hard would that be?

CLI ls handles it the same way as symlinks - we get a color for symlink.
I think that we should not consider that in that in this change: so far we haven't been differentiating between files/dirs and links.

We could consider implementing it. The problem with that is that currently, we are differentiating the colors by determining whether the target is output/dir/executable. If none of the above, we use an extension to determine it. So far there has been no talking about links. In order to provide support, we would need to pass islink returned by fsspec.info up to the Metadata and leverage it to assign a color.

The question is whether we should do that. CLI ls is about inspecting the filesystem so it makes sense to mark the links. However the DVC is about our ML project - the most important is to differentiate between tracked and non tracked assets. I think w should not expand that. Even now, the behavior might be a bit counterintuitive. For example: if our pipeline will be producing executable, it will still be colored as output, because it is the first check we do. I think in this case we should not follow ls style.

As an alternative, we could consider how to "mix" colors depending on the multiple properties of a given asset.

@pared pared force-pushed the 7120_broken_symlink branch 2 times, most recently from 8727a23 to 0f206c0 Compare January 5, 2022 17:17
dvc/repo/ls.py Outdated Show resolved Hide resolved
dvc/repo/ls.py Outdated Show resolved Hide resolved
@pared pared force-pushed the 7120_broken_symlink branch 2 times, most recently from 2f0a008 to 14bfd91 Compare January 6, 2022 07:58
dvc/repo/ls.py Outdated
Comment on lines 75 to 76
if dvc_only:
raise
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current approach (creating empty Metadata) I think it makes sense to remove this check.
it shouldn't matter if we use dvc_only or not. In my initial version lack of this was a bug, silently skipping the error, but now it seems ok to remove it. Users should not be punished for a specific check. If there is a broken symlink, other commands will make sure of erroring out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in the case of dvc_only files, we should never get here: repo_fs will be able to get the metadata even for the removed file

@pared pared force-pushed the 7120_broken_symlink branch from 14bfd91 to 5f6755b Compare January 6, 2022 15:31
@pared pared added the bugfix fixes bug label Jan 6, 2022
@efiop efiop merged commit fe4fefd into iterative:main Jan 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix fixes bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DVC list FileNotFoundError
2 participants