Feature request: FileCachedDiskArray #195

Open
asinghvi17 opened this issue Oct 10, 2024 · 4 comments

Comments

@asinghvi17
Member

It would be nice to be able to give CachedDiskArrays a larger cache when doing computations, without that cache occupying RAM that could otherwise be used for data or computation.

In this case, might it make sense to have a FileCachedDiskArray that does the same thing as CachedDiskArray but uses files instead of a RAM cache? Or maybe even unpack and mmap a whole disk array if it is sufficiently small. I'm not sure what the best abstraction is here, but more flexibility would always be good.

@meggart
Collaborator

meggart commented Oct 21, 2024

This is a nice idea, in particular when dealing with remote arrays. Thinking about an appropriate abstraction, would it be possible to handle this through the LRU cache itself? DiskArrays only implements the chunk logic for cached arrays, but the caching of the data itself is handled entirely by LRUCache.jl. Maybe these mmap and disk ideas would be good extensions to that package?

@meggart
Collaborator

meggart commented Nov 7, 2024

It looks like LRUCache does not really care what it caches, which is nice, so I added an option to the cache function that stores the cached chunks in temp files and mmaps them. Let me know if this solves your use case.
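To make the idea concrete, here is a minimal, language-neutral sketch in Python (standard library only) of an LRU cache whose entries live in temporary files and are read back through memory maps. It is not the DiskArrays.jl or LRUCache.jl code: the class FileBackedLRUCache, its get_or_load method, and the limit on the number of cached chunks are hypothetical names and choices made for illustration. The scratch directory is deleted at interpreter exit via atexit.

```python
import atexit
import mmap
import os
import shutil
import tempfile
from collections import OrderedDict


class FileBackedLRUCache:
    """Hypothetical sketch: an LRU cache that keeps chunk bytes in temp files
    and hands out read-only memory maps instead of holding them in RAM."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries  # limit on the number of chunks, not bytes
        self._entries = OrderedDict()   # key -> (file path, mmap), oldest first
        # One scratch directory per cache; close() removes it, and close()
        # is registered to run when the interpreter exits.
        self._dir = tempfile.mkdtemp(prefix="chunkcache_")
        atexit.register(self.close)

    def __len__(self):
        return len(self._entries)

    def __contains__(self, key):
        return key in self._entries

    def get_or_load(self, key, loader):
        """Return a read-only mmap of the chunk for `key`; `loader()` is only
        called on a cache miss and must return non-empty bytes."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key][1]

        data = loader()
        if not data:
            raise ValueError("empty chunks cannot be memory-mapped")

        # Write the chunk to its own temp file, then map it back read-only.
        fd, path = tempfile.mkstemp(dir=self._dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # The mapping stays valid after the file object is closed.

        self._entries[key] = (path, view)
        if len(self._entries) > self.max_entries:
            # Evict the least recently used chunk: unmap it and delete its file.
            _, (old_path, old_view) = self._entries.popitem(last=False)
            old_view.close()
            os.remove(old_path)
        return view

    def close(self):
        """Unmap every chunk and delete the scratch directory (safe to repeat)."""
        for _, view in self._entries.values():
            view.close()
        self._entries.clear()
        shutil.rmtree(self._dir, ignore_errors=True)
```

Evicting an entry closes its memory map and deletes its file, so a caller that needs a chunk to outlive later cache misses should copy it first (for example with bytes(view)). A real implementation would probably bound the cache by total bytes rather than by chunk count.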

@asinghvi17
Member Author

asinghvi17 commented Nov 7, 2024

That sounds great! Excited to see it. This would probably also interact well with RechunkedDiskArray (cache(RechunkedDiskArray(...)) would be a nice workaround for very slow load speeds from archival hard drives, for example).

@meggart
Collaborator

meggart commented Nov 8, 2024

For this use case, would you need a cache that is a bit more permanent? The current implementation stores the cached chunks in /tmp and cleans them up automatically when the process exits. Feel free to propose an interface for a more permanent cache if you can think of use cases where the cache needs to survive across sessions.
