
Multi-Tiered HF Cache #2816

Open
XenonMolecule opened this issue Jan 31, 2025 · 2 comments

Comments

@XenonMolecule

Is your feature request related to a problem? Please describe.
When working on a shared compute cluster, it can be helpful to maintain a shared Hugging Face cache for popular models. However, if everyone uses a single cache, it quickly fills up with one-off models and datasets that nobody else needs.
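
For context, a shared cache is typically set up today by pointing everyone at the same directory via the HF_HUB_CACHE (or HF_HOME) environment variable. A minimal sketch, with a placeholder path:

```python
import os

# Point the huggingface_hub cache at a shared, group-readable directory.
# "/shared/hf-cache" is a placeholder path for illustration.
os.environ["HF_HUB_CACHE"] = "/shared/hf-cache"

# Must be set before huggingface_hub is imported so the cache location is picked up.
from huggingface_hub import snapshot_download

snapshot_download("bert-base-uncased")  # lands in the shared cache
```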

Describe the solution you'd like
Ideally, there would be a two-tiered cache structure: a read-only shared cache, checked first to see whether the server has already downloaded the model and populated by admin users with an agreed-upon set of models, and then the user's personal read-write cache as the second tier. Personal-use models and datasets that aren't worth keeping as a shared server resource would be downloaded into the personal cache.
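
To illustrate the lookup order I have in mind, here is a rough user-space approximation built on the public snapshot_download API. The paths and helper name are made up, and the exception raised on a local-only cache miss (LocalEntryNotFoundError in recent huggingface_hub versions) may vary between versions; this is a sketch of the desired behaviour, not an existing feature:

```python
from huggingface_hub import snapshot_download
from huggingface_hub.utils import LocalEntryNotFoundError

SHARED_CACHE = "/shared/hf-cache"       # read-only, curated by admins (placeholder)
PERSONAL_CACHE = "/home/me/.cache/hf"   # read-write, per user (placeholder)

def two_tier_snapshot_download(repo_id: str, **kwargs) -> str:
    """Hypothetical helper: serve from the shared cache when possible,
    otherwise download into the personal cache."""
    try:
        # local_files_only=True means "cache hit or raise", so the read-only
        # shared cache is never written to.
        return snapshot_download(
            repo_id, cache_dir=SHARED_CACHE, local_files_only=True, **kwargs
        )
    except LocalEntryNotFoundError:
        # Shared-tier miss: fall back to the personal read-write cache.
        return snapshot_download(repo_id, cache_dir=PERSONAL_CACHE, **kwargs)
```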

Describe alternatives you've considered

  1. N-tiered cache structure: let users specify a list of cache directories and, on a miss in every one of them, write to the highest (or lowest) write-enabled tier (see the sketch after this list). This would subsume the solution described above and be a more future-proof design, but it adds more complexity than is necessarily needed for this application.
  2. Simply share a single read-write cache among all members of the server. This is not ideal: once the cache runs out of disk space, nobody can download more models, users are forced back onto personal caches anyway, and that defeats the purpose of the shared cache.
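
To make alternative 1 concrete, here is the same idea generalised to an ordered list of tiers, again with placeholder paths and a hypothetical helper name; only the last tier is assumed to be write-enabled:

```python
from huggingface_hub import snapshot_download
from huggingface_hub.utils import LocalEntryNotFoundError

# Ordered cache tiers, most widely shared first; only the last is write-enabled.
# All paths are placeholders for illustration.
CACHE_TIERS = ["/shared/hf-cache", "/scratch/team/hf-cache", "/home/me/.cache/hf"]

def n_tier_snapshot_download(repo_id: str, **kwargs) -> str:
    """Hypothetical helper: probe each tier read-only, then download into
    the last (write-enabled) tier on a full miss."""
    for cache_dir in CACHE_TIERS:
        try:
            # local_files_only=True turns this into a pure cache lookup.
            return snapshot_download(
                repo_id, cache_dir=cache_dir, local_files_only=True, **kwargs
            )
        except LocalEntryNotFoundError:
            continue  # miss in this tier, try the next one
    # Missed every tier: download into the write-enabled cache.
    return snapshot_download(repo_id, cache_dir=CACHE_TIERS[-1], **kwargs)
```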

Additional context
Others are already using shared Hugging Face caches (https://benjijang.com/posts/2024/01/shared-hf-cache/), so it would be great to add this multi-tiered cache support! Apologies if such a feature already exists and I'm unaware of it.

@Wauplin
Contributor

Wauplin commented Jan 31, 2025

Hi @XenonMolecule, thanks for explaining your feature request in detail. At the moment the codebase doesn't support having a read-only shared cache plus a read-write personal one. I understand the pros of such a feature, but adding it would be a major change to the download logic and might be error-prone. For now I'd like to gauge interest from the community before starting any plans. I'll keep this issue open; anyone is welcome to comment and react if interested :)

As a workaround, here are a few things you might be interested in:

@XenonMolecule
Author

Thanks, @Wauplin! Even as I suggested it, I thought this might be tricky to coordinate between huggingface_hub and the various libraries built on it, such as datasets and transformers, which seem to assume a single cache dir. Still, I'm keeping my hopes up that this can make it onto the roadmap for a longer-horizon update in the future!

I appreciate your suggestions; I'll use them for now! Excited to see how much community interest there is in this idea.
