findCache performance improvements #1262
Do you think we should keep queues / buffers by
After discussion with @Dieterbe we think it might be wise to write a separate find() method just for the findCache that simply returns true or false, so that it can bail out early. If even one item is a hit, the key must be removed from the cache. This would possibly also save on some memory allocations.
For the record, I think we've seen an incident in which we consumed many cores trying to invalidate, without the cache going into backoff mode. We should never use more than one core for invalidations. We should also track the number of invalidations, as they come with a cost.
Been looking into this. The current find() is a BFS. We would need a DFS, which would be a pretty major change not only to find() but also to getMatcher(). I think it's not warranted to proceed with that idea.
I think the metric would be useful to be able to quickly see what's going on in the dashboard. Having to correlate metrics with logs is not very practical.
Currently, when new series are ingested, they are added to the index, and then we create a background goroutine that calls InvalidateFor().
By default, 100 of these goroutines are allowed to run at once, and each of them can call find(tree, pattern) for each pattern in the cache (default of 1000). This can put a lot of CPU strain on the MT instance, impacting other components, and also causes a lot of memory allocations.

We don't really need to invalidate the cache for new series immediately; buffering the series names and processing them asynchronously would be fine. The trade-off is that new series won't be returned for cached find requests until they have been processed. How long an acceptable delay is will vary between users. Some will want new series returned as soon as possible; others might be happy with a delay of a few minutes.
So, to help reduce CPU usage and allocations, and to keep a cap on the amount of CPU used, I propose the following.
Instead of immediately processing series names in InvalidateFor(), we simply push them into a queue and process them in batches from a single goroutine. If the queue reaches X items, or we have not processed it in Y duration, then we process the buffer. This is the same buffered approach we use for flushing chunks to the backend store in batches:
https://github.com/grafana/metrictank/blob/master/store/bigtable/bigtable.go#L302-L323
With this approach we can add all of the series names in the batch to a new Tree and call find(tree, pattern) once for each pattern, instead of N times for N new series.