Skip to content
/ fscacher Public

Caching results of operations on heavy file trees

License

Notifications You must be signed in to change notification settings

con/fscacher

Repository files navigation

CI Status MIT License

GitHub | PyPI | Issues | Changelog

fscacher provides a cache & decorator for memoizing functions whose outputs depend upon the contents of a file argument.

If you have a function foo() that takes a file path as its first argument, and if the behavior of foo() is pure in the contents of the path and the values of its other arguments, fscacher can help cache that function, like so:

from fscacher import PersistentCache

cache = PersistentCache("insert_name_for_cache_here")

@cache.memoize_path
def foo(path, ...):
    ...

Now the outputs of foo() will be cached for each set of input arguments and for a "fingerprint" (timestamps & size) of each path. If foo() is called twice with the same set of arguments, the result from the first call will be reused for the second, unless the file pointed to by path changes, in which case the function will be run again. If foo() is called with a non-path-like object as the value of path, the cache is ignored.

memoize_path() optionally takes an exclude_kwargs argument, which must be a sequence of names of arguments of the decorated function that will be ignored for caching purposes.

Caches are stored on-disk and thus persist between Python runs. To clear a given PersistentCache and erase its data store, call the clear() method.

By default, caches are stored in the user-wide cache directory, under an fscacher-specific folder, with each one identified by the name passed to the constructor (which defaults to "cache" if not specified). To specify a different location, use the path argument to the constructor instead of passing a name:

cache = PersistentCache(path="/my/custom/location")

If your code runs in an environment where different sets of libraries or the like could be used in different runs, and these make a difference to the output of your function, you can make the caching take them into account by passing a list of library version strings or other identifiers for the current run as the token argument to the PersistentCache constructor.

Finally, PersistentCache's constructor also optionally takes an envvar argument giving the name of an environment variable. If that environment variable is set to "clear" when the cache is constructed, the cache's clear() method will be called at the end of initialization. If the environment variable is set to "ignore" instead, then caching will be disabled, and the cache's memoize_path method will be a no-op. If the given environment variable is not set, or if envvar is not specified, then PersistentCache will query the FSCACHER_CACHE environment variable instead.

Installation

fscacher requires Python 3.7 or higher. Just use pip for Python 3 (You have pip, right?) to install it and its dependencies:

python3 -m pip install fscacher