Infinite loop with filecache and check_files=True #738

@edwardlonghurst

What happened:
Using filecache with check_files=True causes infinite recursion once the target file has changed.
Discovered with the abfs filesystem and reproduced with the local filesystem.

  1. First call: file downloads correctly to cache.
  2. File is updated in Azure
  3. Second call: CachingFileSystem._check_file correctly identifies there is a new version with detail["uid"] != self.fs.ukey(path)
  4. File is downloaded to cache folder
  5. The cache file is rewritten by CachingFileSystem.save_cache; however, the uid is not updated
  6. WholeFileCacheFileSystem._open calls return self._open(path, mode) kicking off the loop again and repeatedly failing the check in step 3
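
The steps above can be modelled with a small toy sketch. It is not fsspec code: `open_cached`, `metadata`, `remote_uid` and `update_uid_on_save` are hypothetical stand-ins, and the retry is bounded so the example terminates instead of recursing.

```python
# Toy model of the loop described in steps 3-6; not fsspec code.
# All names are hypothetical and chosen only for illustration.

def open_cached(metadata, remote_uid, update_uid_on_save, max_attempts=5):
    """Return how many attempts are needed before the cache check passes.

    metadata: dict with a "uid" key, standing in for the saved cache record.
    remote_uid: the uid currently reported for the remote file.
    update_uid_on_save: whether saving the record also stores the new uid.
    """
    for attempt in range(1, max_attempts + 1):
        if metadata["uid"] == remote_uid:
            return attempt  # check passes, the cached copy can be served
        # Check failed: the file is "downloaded" again and the record saved.
        if update_uid_on_save:
            metadata["uid"] = remote_uid
        # Without the uid update the record stays stale, so the next
        # attempt fails the same check again.
    raise RuntimeError("cache check never passed")


# With the uid stored on save, the second attempt succeeds.
print(open_cached({"uid": "v1"}, "v2", update_uid_on_save=True))
```

With `update_uid_on_save=False` the model never passes the check and raises after `max_attempts`, which corresponds to the unbounded recursion reported here.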

What you expected to happen:
The cache file update should include saving the new uid.

Minimal Complete Verifiable Example:
The following prints only Hello, then blows the stack

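import fsspec

# Create the target file and keep the handle open so it can be modified later.
tf = open("testfile.txt", "w")
tf.write('Hello\n')
tf.flush()

# First read: the file is copied into the cache and its contents are printed.
fs = fsspec.filesystem("filecache", target_protocol="file", check_files=True)
with fs.open("file://testfile.txt") as fsfile:
    print(fsfile.read())

# Change the file so that it no longer matches the cached record.
tf.write('World\n')
tf.flush()

# Second read: this call never returns and blows the stack.
with fs.open("file://testfile.txt") as fsfile:
    print(fsfile.read())

tf.close()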

Anything else we need to know?:
Adding
c["uid"] = cache[k]["uid"]
below
c["time"] = max(c["time"], cache[k]["time"])
in CachingFileSystem.save_cache appears to solve the problem; however, I do not know enough about fsspec to know whether it introduces other problems.
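
To make the effect of that one-line change concrete, here is a minimal sketch using plain dicts. It only assumes that a cache record carries "time" and "uid" keys, as in the two lines quoted above; `merge_entry` and `copy_uid` are hypothetical names, and the real save_cache may handle more fields than shown.

```python
# Simplified model of merging an on-disk cache record (c) with the
# in-memory one (cache[k]); not the actual fsspec implementation.

def merge_entry(on_disk, in_memory, copy_uid):
    """Merge two cache records, optionally carrying over the newer uid."""
    merged = dict(on_disk)
    # Mirrors: c["time"] = max(c["time"], cache[k]["time"])
    merged["time"] = max(on_disk["time"], in_memory["time"])
    if copy_uid:
        # Mirrors the suggested addition: c["uid"] = cache[k]["uid"]
        merged["uid"] = in_memory["uid"]
    return merged


on_disk = {"time": 100.0, "uid": "old"}
in_memory = {"time": 200.0, "uid": "new"}

# Without the extra line, the saved record keeps the stale uid.
print(merge_entry(on_disk, in_memory, copy_uid=False))
# With it, the saved record matches the file that was just downloaded.
print(merge_entry(on_disk, in_memory, copy_uid=True))
```

In this model the timestamp is refreshed either way, but only the `copy_uid=True` variant produces a record whose uid would pass a later comparison against the updated file.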

Environment:

  • fsspec version: 2021.7.0
  • Python version: 3.8.8
  • Operating System: Windows 10
  • Install method (conda, pip, source): conda
