toolcache makes it simple to create and configure caches in Python
- save caches to memory or to disk
- memoize functions, instance methods, @classmethods, and @staticmethods
- control cache size with ttl and eviction policies like lru / fifo / lfu
- use thread safety, process safety, or no safety (default = thread safety)
- use custom hash functions
- track cache usage statistics
pip install toolcache
import toolcache

# memoize function with memory cache
@toolcache.cache('memory')
def f(a, b, c):
    return a * b * c

# memoize function with disk cache, stored in a tempdir
@toolcache.cache('disk')
def f(a, b, c):
    return a * b * c

# memoize function with disk cache, stored in a persistent dir
@toolcache.cache('disk', cache_dir='/path/to/cache/dir')
def f(a, b, c):
    return a * b * c

# remove cache entries once they reach a specific age
@toolcache.cache('disk', ttl='24 hours')
def f(a, b, c):
    return a * b * c

# remove cache entries once cache reaches a specific size
@toolcache.cache('disk', max_size=3, max_size_policy='fifo')
def f(a, b, c):
    return a * b * c

# specify which args are used to create unique hash of inputs
@toolcache.cache('disk', hash_include_args=['a', 'b'])
def f(a, b, c):
    return a * b * c
# create standalone cache
standalone_cache = toolcache.MemoryCache()
# get cache size
print(f.cache.get_cache_size())
> 4
# track cache usage statistics
print(f.cache.stats)
> {'n_checks': 6,
> 'n_deletes': 2,
> 'n_hashes': 8,
> 'n_hits': 2,
> 'n_loads': 1,
> 'n_misses': 4,
> 'n_saves': 3,
> 'n_size_evictions': 0,
> 'n_ttl_evictions': 0}
# clear cache
f.cache.delete_all_entries()
- Configure Hashing
- Configure Disk Caches
- Use Cache Eviction Policies
- Monitor Cache Usage
- Create Standalone Caches
- Define Custom Cachetypes
toolcache includes 3 cache types that each inherit from the abstract cache class BaseCache:
cachetype | description | use case |
---|---|---|
MemoryCache | cache that saves each entry as key-value pair in a dict | speed |
DiskCache | cache that saves each entry as a file to disk | persistence, or large data that does not fit in memory |
NullCache | cache that does not save any entries | programmatically disabling cache |
Caches can be created in two ways:
- decorating a function with @toolcache.cache(cachetype), where cachetype is 'memory', 'disk', 'null', or a class inheriting from BaseCache
- creating a standalone cache by instantiating a class that inherits from BaseCache
The configuration options listed below can be passed to toolcache.cache() or passed to a standalone cache during initialization.
These configuration options are available to every cache:
arg | description | example value | default behavior |
---|---|---|---|
safety | str name of concurrency safety level, one of 'thread', 'process', or None | 'thread' | 'thread' |
verbose | bool of whether to print info whenever saving to or loading from cache | False | False |
cache_name | str name of cache | 'important_cache' | use decorated function name, or uuid for a standalone cache |
arg | description | example value | default behavior |
---|---|---|---|
f_hash | custom function for computing hash | lambda x: hash(x) | toolcache.compute_hash_json() |
normalize_hash_inputs | bool of whether to normalize function calls so that for a function f with args a and b, the calls f(1, 2) and f(a=1, b=2) are equivalent | False | False |
hash_include_args | list of str names of arguments used to compute hash | ['arg1', 'arg2'] | include all args |
hash_exclude_args | list of str names of arguments excluded from hash | ['arg3', 'arg4'] | exclude no args |
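The effect of normalize_hash_inputs can be illustrated with the standard library alone. The sketch below is not toolcache's implementation: normalize_call is a hypothetical helper that uses inspect.signature to bind positional and keyword arguments to parameter names, so that f(1, 2) and f(a=1, b=2) produce the same key.

```python
import inspect


def normalize_call(func, args, kwargs):
    """Map a call's args/kwargs to a sorted tuple of (name, value) pairs.

    Hypothetical helper for illustration; not part of toolcache.
    """
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # fill in defaults so omitted args still appear
    return tuple(sorted(bound.arguments.items()))


def f(a, b, c=3):
    return a * b * c


# positional and keyword forms of the same call produce the same key
key_positional = normalize_call(f, (1, 2), {})
key_keyword = normalize_call(f, (), {'b': 2, 'a': 1})
print(key_positional == key_keyword)  # → True
```

Binding the signature on every call costs time, which is presumably why this kind of normalization is optional.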
arg | description | example value | default behavior |
---|---|---|---|
ttl | Timelength of time-to-live, the maximum age for entries in cache | '1000s' | no max age |
max_size | int of max number of entries in cache | 1000 | no max size |
max_size_policy | str name of eviction policy to use when max_size is exceeded, one of 'lru', 'fifo', or 'lfu' | 'fifo' | 'lru' |
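To make the difference between the policies concrete, here is a minimal standard-library sketch of ttl expiry combined with fifo and lru eviction. TinyCache and its clock parameter are hypothetical names used only for illustration. It does not show how toolcache implements eviction, and lfu is omitted for brevity.

```python
import time
from collections import OrderedDict


class TinyCache:
    """Toy in-memory cache with ttl and fifo/lru eviction (illustration only)."""

    def __init__(self, max_size=None, max_size_policy='lru', ttl=None,
                 clock=time.monotonic):
        self.max_size = max_size
        self.max_size_policy = max_size_policy
        self.ttl = ttl  # max entry age in seconds, or None for no max age
        self.clock = clock
        self.entries = OrderedDict()  # key -> (creation_time, value)

    def save(self, key, value):
        self.entries.pop(key, None)  # re-saving moves the key to the newest slot
        self.entries[key] = (self.clock(), value)
        # evict from the front until the cache fits within max_size
        while self.max_size is not None and len(self.entries) > self.max_size:
            self.entries.popitem(last=False)

    def load(self, key):
        created, value = self.entries[key]  # raises KeyError on a miss
        if self.ttl is not None and self.clock() - created > self.ttl:
            del self.entries[key]  # entry is too old: evict and report a miss
            raise KeyError(key)
        if self.max_size_policy == 'lru':
            self.entries.move_to_end(key)  # mark as most recently used
        return value


fifo = TinyCache(max_size=2, max_size_policy='fifo')
lru = TinyCache(max_size=2, max_size_policy='lru')
for cache in (fifo, lru):
    cache.save('a', 1)
    cache.save('b', 2)
    cache.load('a')     # access 'a' before inserting a third entry
    cache.save('c', 3)

print(sorted(fifo.entries))  # → ['b', 'c'] (oldest insertion dropped)
print(sorted(lru.entries))   # → ['a', 'c'] (least recently used dropped)
```

With the same sequence of calls, fifo evicts 'a' because it was inserted first, while lru keeps 'a' because it was read more recently than 'b'.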
arg | description | example value | default behavior |
---|---|---|---|
track_basic_stats | bool of whether to track basic usage stats | False | False |
track_detailed_stats | bool of whether to track creations and accesses | False | False |
track_creation_times | bool of whether to track creation times | False | track only if ttl is not None or max_size_policy == 'fifo' |
track_access_times | bool of whether to track access times | False | track only if max_size_policy == 'lru' |
track_access_counts | bool of whether to track access counts | False | track only if max_size_policy == 'lfu' |
arg | description | example value | default behavior |
---|---|---|---|
cache_dir | str of directory path to store cache data | '/path/to/cache_dir' | create a tmpdir |
file_format | str of file format to use for cache data, either 'pickle' or 'json' | 'json' | 'pickle' |
f_disk_save | custom function for saving data to disk, function should take entry_path and entry_data as arguments | f_save | save as pickle |
f_disk_load | custom function for loading data from disk, function should take entry_path as an argument | f_load | load as pickle |
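As a sketch of what custom save and load functions might look like, the pair below writes and reads JSON. The argument names entry_path and entry_data come from the table above. Whether toolcache passes them positionally or by keyword is not stated here, so the functions accept either. The names f_save and f_load mirror the example values in the table.

```python
import json
import os
import tempfile


def f_save(entry_path, entry_data):
    """Write entry_data to entry_path as JSON."""
    with open(entry_path, 'w') as f:
        json.dump(entry_data, f)


def f_load(entry_path):
    """Read entry data back from entry_path."""
    with open(entry_path, 'r') as f:
        return json.load(f)


# round trip through a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    entry_path = os.path.join(tmpdir, 'entry.json')
    f_save(entry_path, {'result': [1, 2, 3]})
    loaded = f_load(entry_path)

print(loaded)  # → {'result': [1, 2, 3]}
```

Functions like these would then be supplied as f_disk_save=f_save and f_disk_load=f_load.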
When using toolcache.cache() to decorate a function, one should consider 1) how function inputs will be hashed, 2) what attributes will be added to the function, and 3) what arguments might be added to the function.
To save a function input-output pair within a cache, a unique hash must be taken of the inputs.
Under the default hash configuration, each input arg should either be json-serializable or be a hashable object (i.e. it implements a __hash__() method). By default toolcache uses orjson to create these hashes quickly.
If function inputs do not satisfy these criteria, one or more of the cache config parameters should be used:
parameter | description | example |
---|---|---|
f_hash | provide a custom hash function that takes the same args and kwargs as the decorated function | @toolcache.cache(..., f_hash=f_custom_hash) |
hash_include_args | specify list of arg names that should be used to compute hash | @toolcache.cache(..., hash_include_args=['arg1', 'arg2']) |
hash_exclude_args | specify list of arg names that should not be used to compute hash | @toolcache.cache(..., hash_exclude_args=['arg3', 'arg4']) |
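The idea behind hashing only a subset of inputs can be sketched with the standard library. The helper below is hypothetical and is not toolcache's compute_hash_json(). It uses json and hashlib rather than orjson, and falls back to __hash__() for values that json cannot serialize.

```python
import hashlib
import json


def compute_hash(kwargs, include_args=None, exclude_args=None):
    """Hash a dict of named inputs, optionally filtering by arg name.

    Hypothetical helper for illustration; not part of toolcache.
    """
    names = include_args if include_args is not None else list(kwargs)
    if exclude_args is not None:
        names = [name for name in names if name not in exclude_args]
    selected = {name: kwargs[name] for name in names}
    # sort_keys makes the hash independent of argument order;
    # default=hash falls back to __hash__() for values json cannot serialize
    serialized = json.dumps(selected, sort_keys=True, default=hash)
    return hashlib.sha256(serialized.encode()).hexdigest()


# 'c' differs between the calls, but it is left out of the hash
h1 = compute_hash({'a': 1, 'b': 2, 'c': 'x'}, include_args=['a', 'b'])
h2 = compute_hash({'a': 1, 'b': 2, 'c': 'y'}, include_args=['a', 'b'])
h3 = compute_hash({'a': 1, 'b': 2, 'c': 'y'}, exclude_args=['c'])
print(h1 == h2 == h3)  # → True
```

Excluding an argument from the hash is only safe when that argument does not change the function's output, for example a logging flag or a connection handle.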
toolcache.cache() also works on functions that have *args or **kwargs for inputs.
Every time the decorated function is called, it can use the following keyword args to control cache behavior.
kwarg | description | default | example |
---|---|---|---|
cache_save | bool of whether to save output to cache | True | f(..., cache_save=False) will not save output to cache |
cache_load | bool of whether to attempt to load entry from cache | True | f(..., cache_load=False) will not attempt to load entry from cache |
cache_verbose | bool of whether to print info about loading from or saving to cache | True | f(..., cache_verbose=False) will not print info about loading from or saving to cache |
You can avoid adding these args to the decorated function by using @toolcache.cache(..., add_cache_args=False).
The original decorated function can be accessed as f.__wrapped__.
The cache instance associated with a decorated function f() can be accessed using f.cache.
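The per-call keyword args and the added attributes can be pictured with a toy decorator. The sketch below uses only the standard library and is not toolcache's implementation. toy_cache is a hypothetical name, and its cache attribute is a plain dict rather than a cache instance.

```python
import functools


def toy_cache(func):
    """Memoize func in a dict and accept per-call cache_save / cache_load kwargs."""
    store = {}

    @functools.wraps(func)  # also sets wrapper.__wrapped__ = func
    def wrapper(*args, cache_save=True, cache_load=True, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if cache_load and key in store:
            return store[key]
        result = func(*args, **kwargs)
        if cache_save:
            store[key] = result
        return result

    wrapper.cache = store  # expose the cache on the decorated function
    return wrapper


calls = []


@toy_cache
def f(a, b):
    calls.append((a, b))
    return a * b


f(2, 3)                    # miss: computes and saves
f(2, 3)                    # hit: loaded from the cache
f(2, 3, cache_load=False)  # bypasses the cache and recomputes
f(4, 5, cache_save=False)  # computes but does not save

print(len(calls))    # → 3, the number of times the original function ran
print(len(f.cache))  # → 1, the number of saved entries
```

Because the wrapper consumes cache_save and cache_load itself, the original function never sees them. A function that already uses those names for its own parameters would need something like the add_cache_args=False option described above.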
These methods are available on every cache instance:
method | description |
---|---|
compute_entry_hash() | compute hash of entry |
save_entry() | save entry data to cache |
exists_in_cache() | return bool of whether entry exists in cache |
load_entry() | load entry data from cache |
get_cache_size() | return int number of items in cache |
delete_entry() | remove entry from cache |
delete_all_entries() | delete all entries from cache |
To maximize cache performance, one can disable input name normalization (normalize_hash_inputs=False), statistic tracking (track_basic_stats=False and track_detailed_stats=False), and thread safety (safety=None).
On a somewhat modern machine with the above settings, the toolcache.cache() decorator adds about 3 μs to each function call, whereas running a simple function with no cache decorator takes about 50 ns per function call. Using a disk cache instead of a memory cache adds about 25 μs per function call. To truly know whether toolcache is fast enough for your application you may need to run your own benchmarks.
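A per-call overhead measurement can be set up with timeit. In the sketch below, functools.lru_cache is only a stand-in for the decorator under test, so the numbers it prints say nothing about toolcache itself. Swap in the decorator and settings you actually intend to use.

```python
import functools
import timeit


def plain(a, b, c):
    return a * b * c


# functools.lru_cache stands in for a memoizing decorator here;
# swap in the decorator under test to benchmark it instead
cached = functools.lru_cache(maxsize=None)(plain)

n_calls = 100_000
t_plain = timeit.timeit(lambda: plain(1, 2, 3), number=n_calls)
t_cached = timeit.timeit(lambda: cached(1, 2, 3), number=n_calls)

# report the average cost per call in nanoseconds
print(f'plain:  {t_plain / n_calls * 1e9:.0f} ns per call')
print(f'cached: {t_cached / n_calls * 1e9:.0f} ns per call')
```

Both timings include the cost of the lambda wrapper, so the difference between them is more meaningful than either absolute value.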
A large motivation for developing toolcache was being able to manage memory-based and disk-based caches with a unified interface and feature set. toolcache is currently the only python package to offer this functionality.
There exist many other python packages for caching and memoization. cacheout and python-memoization both provide in-memory caches with many features. Compared to toolcache, these libraries provide a wider variety of cache eviction policies and other interesting features. python-diskcache provides a feature-rich disk-based cache with Django integration and extensive benchmark comparisons to other solutions.