Fill stat info when returning cached data for readdir #306

jpalus · 2024-06-17T10:16:33Z

Uncached and cached results for readdir were inconsistent -- the former returned correct stat info for directory entries while the latter didn't. That's because only names of entries were saved in cache without stat info. In turn this leads to issues like
junegunn/fzf#3832 since directory traversal library (https://github.com/charlievieth/fastwalk in this case) relies on proper stat info returned by readdir. Hence when unchached result was returned it gave proper outcome, while with cached result it was wrong.

Cache stat info next to entry name to fix the issue. While file attributes are saved in cache already, they use full path as key. To avoid potentially plenty of allocations, string copying and cache lookups to get each attr, let's keep a copy of stat struct independently to be on the fast path.

Uncached and cached results for readdir were inconsistent -- the former returned correct stat info for directory entries while the latter didn't. That's because only names of entries were saved in cache without stat info. In turn this leads to issues like junegunn/fzf#3832 since directory traversal library (https://github.com/charlievieth/fastwalk in this case) relies on proper stat info returned by readdir. Hence when unchached result was returned it gave proper outcome, while with cached result it was wrong. Cache stat info next to entry name to fix the issue. While file attributes are saved in cache already, they use full path as key. To avoid potentially plenty of allocations, string copying and cache lookups to get each attr, let's keep a copy of stat struct independently to be on the fast path.

from: - libfuse/sshfs#305 - libfuse/sshfs#306

h4sh5 · 2024-11-04T22:07:09Z

Thank you @jpalus!

I think having some test cases for cached stat would be great too, if you don't mind adding some into test/test_sshfs.py in another PR.

jpalus · 2024-11-22T23:02:55Z

@h4sh5 I made an attempt at adding test today but Python's standard library isn't great for raw readdir results. There's os.scandir which seems to fit perfectly except there's no way to detect missing stat info in readdir results: os.DirEntry is already extremely limited in what it provides and the only two methods which could be used are worthless too:

stat() always invokes system call so results from readdir are ignored anyway (see docs)
is_file() if it notices d_type == DT_UNKNOWN (which was the case for cached response prior to this PR) also results in separate stat() (see source)

So while I've modified tst_readdir to use scandir, it doesn't really test anything new:

    scandir_is = os.scandir(mnt_newdir)
    scandir_is = sorted(scandir_is, key=operator.attrgetter('name'))
    scandir_is2 = os.scandir(mnt_newdir)
    scandir_is2 = sorted(scandir_is2, key=operator.attrgetter('name'))
    scandir_is = list(map(lambda e: (e.name, e.is_file()), scandir_is))
    scandir_is2 = list(map(lambda e: (e.name, e.is_file()), scandir_is2))
    scandir_should = [(os.path.basename(file_), True), (os.path.basename(subdir), False)]
    scandir_should = sorted(scandir_should, key=operator.itemgetter(0))
    assert scandir_is == scandir_should
    assert scandir_is2 == scandir_should

Test passes on revision prior to this change even though it should not. Confirmed with strace in case of tests running with cache there are additional stat() calls.

h4sh5 · 2024-11-29T22:28:44Z

@jpalus You could try to add tests that run the stat command or similar to get the details you need out of command output - just for safetly tho, use the absolate path to the commands in Ubuntu since the github actions tests run on that (e.g. /usr/bin/stat), and make sure you call subprocess with array arguments and shell=False

jpalus mentioned this pull request Jun 17, 2024

Directories not walked recursively in FUSE mounted directories junegunn/fzf#3832

Closed

10 tasks

pld-gitsync pushed a commit to pld-linux/sshfs-fuse that referenced this pull request Jun 17, 2024

add patches for memleak and missing stat in cached readdir; rel 2

b2e7b15

from: - libfuse/sshfs#305 - libfuse/sshfs#306

Merge branch 'master' into fill-stat-info-from-cache

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Learn about vigilant mode

Loading
Loading status checks…

1a52814

h4sh5 merged commit ddf1e42 into libfuse:master Nov 4, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fill stat info when returning cached data for readdir #306

Fill stat info when returning cached data for readdir #306

jpalus commented Jun 17, 2024

h4sh5 commented Nov 4, 2024

jpalus commented Nov 22, 2024

h4sh5 commented Nov 29, 2024 •

edited

Loading

Fill stat info when returning cached data for readdir #306

Fill stat info when returning cached data for readdir #306

Conversation

jpalus commented Jun 17, 2024

h4sh5 commented Nov 4, 2024

jpalus commented Nov 22, 2024

h4sh5 commented Nov 29, 2024 • edited Loading

h4sh5 commented Nov 29, 2024 •

edited

Loading