Setting a limit for file cache capacity #14602

Open · bugmakerrrrrr opened this issue Jun 30, 2024 · 2 comments
Labels: enhancement, Storage:Snapshots

Comments

bugmakerrrrrr commented Jun 30, 2024

Is your feature request related to a problem? Please describe

Following up on #14004. Currently we use the total space of the file cache path to validate the user-defined file cache size setting:

if (capacity <= 0 || totalSpace <= capacity) {

For a cache scenario this does not seem reasonable. The file cache only starts evicting blocks once its usage rises above the user-defined size, and at the same time a query task may need new blocks that live in remote storage. If the tolerance space (totalSpace - fileCacheSize) is very small, there is no free space left for caching those new blocks and the query task will fail.
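
For concreteness, here is a small illustration of the gap (the numbers and class name are hypothetical, chosen only to show the arithmetic, not taken from the actual OpenSearch code):

    // Illustrative arithmetic: the current check accepts any cache size strictly
    // below the total space of the cache path, so the tolerance left for new
    // blocks can be arbitrarily small.
    public class ToleranceExample {
        public static void main(String[] args) {
            long totalSpace    = 1000L * 1024 * 1024 * 1024; // 1000 GiB cache path
            long fileCacheSize =  999L * 1024 * 1024 * 1024; //  999 GiB user-defined cache size

            // Passes the current check: capacity > 0 and totalSpace > capacity.
            boolean accepted = fileCacheSize > 0 && totalSpace > fileCacheSize;
            long tolerance = totalSpace - fileCacheSize; // ~1 GiB left for staging new blocks

            System.out.println("accepted=" + accepted + ", tolerance bytes=" + tolerance);
            // A query that needs more than ~1 GiB of new blocks from remote storage,
            // while eviction has not yet freed space, has nowhere to put them and fails.
        }
    }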

Describe the solution you'd like

There are a couple of options (a rough sketch of option 1 follows the list):

  1. the user-defined file cache size must be less than a specific hard-coded percentage of the total space, e.g. 95%;
  2. add a separate tolerance size setting (non-dynamic, settable as a byte size or a percentage).
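
A rough sketch of option 1, assuming a hard-coded 95% cap (the method name and exception type are illustrative, not the actual OpenSearch validator):

    // Option 1 sketch: reject any user-defined cache size above a hard-coded
    // fraction of the cache path's total space.
    public final class FileCacheSettingsSketch {
        private static final double MAX_CACHE_FRACTION = 0.95; // hard-coded upper bound

        static void validateFileCacheSize(long capacityBytes, long totalSpaceBytes) {
            long maxAllowed = (long) (totalSpaceBytes * MAX_CACHE_FRACTION);
            if (capacityBytes <= 0 || capacityBytes > maxAllowed) {
                throw new IllegalArgumentException(
                    "File cache size must be positive and at most 95% of the total space of the cache path"
                );
            }
        }
    }

Option 2 would be the same check with MAX_CACHE_FRACTION replaced by a non-dynamic setting that accepts either an absolute byte size or a percentage.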

Related component

Storage:Snapshots

Describe alternatives you've considered

Treat the current file cache capacity as the maximum size allowed, and introduce another setting, evict_watermark, expressed as a percentage of the file cache capacity. When the total size of cache entries exceeds that percentage of the capacity, the file cache starts trying to evict entries. When a new block needs to be cached and there is no free space left in the file cache, we can either fail the corresponding query or use an on-heap memory block to hold the file block and serve the query.

In this way the behavior of the file cache becomes more predictable, especially when the search node role is deployed alongside other roles on the same node: the file cache is kept from encroaching on disk space needed for other purposes, which would otherwise affect the normal operation of the node. When implementing a writable warm index we may also need a file cache alongside a local directory; if a sudden burst of queries or one large query lets the file cache consume all available space, writes can fail.
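
A rough sketch of the evict_watermark alternative described above (class and setting names are illustrative; eviction order and the on-heap fallback are left abstract):

    // Sketch: capacity is a hard maximum; eviction starts once usage crosses
    // evictWatermark * capacity; a block that still does not fit is either
    // rejected (fail the query) or served from an on-heap buffer instead.
    public class FileCacheSketch {
        private final long capacityBytes;    // hard maximum size of the on-disk cache
        private final double evictWatermark; // e.g. 0.9 -> start evicting at 90% usage
        private long usedBytes;

        public FileCacheSketch(long capacityBytes, double evictWatermark) {
            this.capacityBytes = capacityBytes;
            this.evictWatermark = evictWatermark;
        }

        public boolean tryCache(long blockBytes) {
            if (usedBytes + blockBytes > capacityBytes * evictWatermark) {
                evictUntilBelowWatermark();
            }
            if (usedBytes + blockBytes > capacityBytes) {
                // No room even after eviction: fail the query here, or fall back to
                // holding this block in an on-heap buffer that serves the query
                // without consuming disk space.
                return false;
            }
            usedBytes += blockBytes;
            return true;
        }

        private void evictUntilBelowWatermark() {
            // Evict entries (e.g. in LRU order) until usedBytes <= capacityBytes * evictWatermark.
        }
    }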

Additional context

No response

@bugmakerrrrrr added the enhancement and untriaged labels on Jun 30, 2024
@bugmakerrrrrr commented

@andrross @kotwanikunal any thoughts?

@andrross (Member) commented

> the user-defined file cache size must be less than a specific hard-coded percentage of the total space, e.g. 95%;
> add a separate tolerance size setting (non-dynamic, settable as a byte size or a percentage).

@bugmakerrrrrr These ideas seem reasonable. One thing to consider is the behavior of the system once it crosses into the disk watermark thresholds. Should we even allow a cache size to be larger than the watermark thresholds?
