Next design, directories oriented?/localized, is needed #69
Comments
@yarikoptic I was actually considering an alternative implementation for fscacher recently, one that uses SQLite3 for storing data instead of a distributed file store. I have no idea what effect that would have on performance. Here are my notes so far: Proposal for a SQLite-Based Reimplementation of
cool, thank you @jwodder! I didn't analyze in full, some notes on DB aspect which I think is to address the
This issue could be considered a duplicate of the earlier dandi/dandi-cli#848 (comment), but since that one started with a suggestion of an alternative implementation within dandi-cli, I decided to file a separate one within fscacher. Filing it here should give more coverage of the situation, and I still hope that fscacher could be a reusable package providing the needed solution.
Outstanding issues we have, all of which are IMHO bottlenecks in our initial simplistic fscacher implementation, which

- I1: uses the same cache for a function (or collection of functions) regardless of their parametrization etc.
- I2: places all caches into a centralized user-wide cache directory (e.g., ~/.cache/fscacher/{cachename}): caches related to a dataset which might get removed,
- I3: uses joblib's memoize so that each invocation gets its own directory + 2 files on disk (thus 3 inodes):

    (dandi-devel) jovyan@jupyter-yarikoptic:~/.cache/fscacher/dandi-checksums/joblib/dandi/support/digests$ ls get_zarr_checksum/b74a957ae8f7e2e4fc2b07d1aaa73775/
    metadata.json output.pkl
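To make the inode cost of I3 concrete, here is a small stdlib-only sketch that counts filesystem entries under a cache directory. It is only an illustration: the layout it builds in a temporary directory mimics the listing above (one directory per invocation holding metadata.json and output.pkl), the helper names `count_inodes` and `make_fake_entries` and the hash-like directory names are made up, and joblib itself is not involved.

```python
import os
import tempfile


def count_inodes(root):
    """Count directories and files below root (root itself excluded)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def make_fake_entries(func_dir, n):
    """Mimic the layout from the listing: <hash>/{metadata.json,output.pkl}."""
    for i in range(n):
        entry = os.path.join(func_dir, f"{i:032x}")  # made-up hash-like name
        os.makedirs(entry)
        for name in ("metadata.json", "output.pkl"):
            with open(os.path.join(entry, name), "w") as fh:
                fh.write("")


with tempfile.TemporaryDirectory() as tmp:
    func_dir = os.path.join(tmp, "get_zarr_checksum")
    os.makedirs(func_dir)
    make_fake_entries(func_dir, 1000)
    # One directory plus two files per cached invocation.
    inodes_used = count_inodes(func_dir)
    print(inodes_used)
```

With this layout the count grows by three for every cached invocation, which is what makes large collections of files expensive on filesystems with tight inode quotas.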
Such a simplistic initial design showed its limitations by

- items_limit and age_limit options joblib/joblib#1200 but was never finalized/merged)
- dandi mv command: more efficient caching dandi/dandi-cli#848 (comment) and some analysis to make it
- blocked by joblib backend: memoized_path_copy helper to complement @memoize_path #57

The question is how we could redesign (or just expand, since the current implementation is good in its simplicity) fscacher to possibly
- I3: provide an alternative (to directory with 2 files)
- I2 and/or maybe I1: provide some "locality":
  - .zarr/ folder within a singular cache "file" -- would save inodes, and for mv etc. operations it would be a matter of copying/changing one such file.
  - @decorator form and instantiate cache per each dandiset/zarr file in a given location. For 'zarr' support though, some alternative storage backend would be needed

edit: adding an explicit section
Features
WDYT @jwodder -- have any ideas/vision on what next design in fscacher could be to make our life in dandi-cli better?
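As a rough illustration of the "singular cache file" plus "locality" idea, below is a minimal sketch that keeps all entries in one SQLite file stored next to the data, in the spirit of the SQLite suggestion in the comments. Everything in it is an assumption made for illustration and not fscacher's API: the class name `SingleFileCache`, the method `memoize_file`, the `.cache.sqlite` file name, and the choice of size plus mtime as the file fingerprint. Paths are stored relative to the cache file's directory, so moving the whole directory keeps the cache valid.

```python
import functools
import os
import pickle
import sqlite3
import tempfile


class SingleFileCache:
    """Hypothetical cache keeping all entries in one SQLite file."""

    def __init__(self, db_path):
        # Keys are stored relative to the directory holding the cache file.
        self.root = os.path.dirname(os.path.abspath(db_path))
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            " func TEXT, path TEXT, fingerprint TEXT, value BLOB,"
            " PRIMARY KEY (func, path))"
        )
        self.db.commit()

    def close(self):
        self.db.close()

    def memoize_file(self, func):
        """Cache func(path) until the file's size or mtime changes."""

        @functools.wraps(func)
        def wrapper(path):
            st = os.stat(path)
            fingerprint = f"{st.st_size}:{st.st_mtime_ns}"
            key = os.path.relpath(os.path.abspath(path), self.root)
            row = self.db.execute(
                "SELECT fingerprint, value FROM cache WHERE func = ? AND path = ?",
                (func.__qualname__, key),
            ).fetchone()
            if row is not None and row[0] == fingerprint:
                # Only unpickle data from a cache file you trust.
                return pickle.loads(row[1])
            value = func(path)
            self.db.execute(
                "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
                (func.__qualname__, key, fingerprint, pickle.dumps(value)),
            )
            self.db.commit()
            return value

        return wrapper


calls = []


def file_size(path):
    calls.append(path)  # record real (non-cached) invocations
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    old, new = os.path.join(tmp, "old"), os.path.join(tmp, "new")
    os.makedirs(old)
    with open(os.path.join(old, "data.bin"), "wb") as fh:
        fh.write(b"x" * 10)

    cache = SingleFileCache(os.path.join(old, ".cache.sqlite"))
    cached = cache.memoize_file(file_size)
    cached(os.path.join(old, "data.bin"))  # computed
    cached(os.path.join(old, "data.bin"))  # served from the cache
    cache.close()

    os.rename(old, new)  # the cache file travels with the directory
    cache = SingleFileCache(os.path.join(new, ".cache.sqlite"))
    cached = cache.memoize_file(file_size)
    result_after_move = cached(os.path.join(new, "data.bin"))  # still a hit
    cache.close()

n_calls = len(calls)
```

In this sketch any number of cached results share a single file on disk, and a move amounts to relocating that one file along with the data. Concurrent writers, cache eviction, and functions with extra arguments are deliberately left out.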