Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fill stat info when returning cached data for readdir #306

Merged
merged 2 commits into from
Nov 4, 2024

Conversation

jpalus
Copy link
Contributor

@jpalus jpalus commented Jun 17, 2024

Uncached and cached results for readdir were inconsistent -- the former returned correct stat info for directory entries while the latter didn't. That's because only names of entries were saved in cache without stat info. In turn this leads to issues like
junegunn/fzf#3832 since directory traversal library (https://github.com/charlievieth/fastwalk in this case) relies on proper stat info returned by readdir. Hence when unchached result was returned it gave proper outcome, while with cached result it was wrong.

Cache stat info next to entry name to fix the issue. While file attributes are saved in cache already, they use full path as key. To avoid potentially plenty of allocations, string copying and cache lookups to get each attr, let's keep a copy of stat struct independently to be on the fast path.

Verified

This commit was signed with the committer’s verified signature.
jpalus Jan Palus
Uncached and cached results for readdir were inconsistent -- the former
returned correct stat info for directory entries while the latter
didn't. That's because only names of entries were saved in cache without
stat info. In turn this leads to issues like
junegunn/fzf#3832 since directory traversal
library (https://github.com/charlievieth/fastwalk in this case) relies
on proper stat info returned by readdir. Hence when unchached result was
returned it gave proper outcome, while with cached result it was wrong.

Cache stat info next to entry name to fix the issue. While file
attributes are saved in cache already, they use full path as key. To
avoid potentially plenty of allocations, string copying and cache
lookups to get each attr, let's keep a copy of stat struct independently
to be on the fast path.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@h4sh5
Copy link
Collaborator

h4sh5 commented Nov 4, 2024

Thank you @jpalus!

I think having some test cases for cached stat would be great too, if you don't mind adding some into test/test_sshfs.py in another PR.

@h4sh5 h4sh5 merged commit ddf1e42 into libfuse:master Nov 4, 2024
1 check passed
@jpalus
Copy link
Contributor Author

jpalus commented Nov 22, 2024

@h4sh5 I made an attempt at adding test today but Python's standard library isn't great for raw readdir results. There's os.scandir which seems to fit perfectly except there's no way to detect missing stat info in readdir results: os.DirEntry is already extremely limited in what it provides and the only two methods which could be used are worthless too:

  • stat() always invokes system call so results from readdir are ignored anyway (see docs)
  • is_file() if it notices d_type == DT_UNKNOWN (which was the case for cached response prior to this PR) also results in separate stat() (see source)

So while I've modified tst_readdir to use scandir, it doesn't really test anything new:

    scandir_is = os.scandir(mnt_newdir)
    scandir_is = sorted(scandir_is, key=operator.attrgetter('name'))
    scandir_is2 = os.scandir(mnt_newdir)
    scandir_is2 = sorted(scandir_is2, key=operator.attrgetter('name'))
    scandir_is = list(map(lambda e: (e.name, e.is_file()), scandir_is))
    scandir_is2 = list(map(lambda e: (e.name, e.is_file()), scandir_is2))
    scandir_should = [(os.path.basename(file_), True), (os.path.basename(subdir), False)]
    scandir_should = sorted(scandir_should, key=operator.itemgetter(0))
    assert scandir_is == scandir_should
    assert scandir_is2 == scandir_should

Test passes on revision prior to this change even though it should not. Confirmed with strace in case of tests running with cache there are additional stat() calls.

@h4sh5
Copy link
Collaborator

h4sh5 commented Nov 29, 2024

@jpalus You could try to add tests that run the stat command or similar to get the details you need out of command output - just for safetly tho, use the absolate path to the commands in Ubuntu since the github actions tests run on that (e.g. /usr/bin/stat), and make sure you call subprocess with array arguments and shell=False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants