store-gateway: protect from OOMs by limiting chunk bytes per/sec #6833
Comments
I realized this is less trivial than anticipated because we'll have to change the memcached library to also account for failed allocations.
Hi @dimitarvdimitrov, I would like to have a go at it. Could you give a bit more info about the memcached lib?
We use a fork of gomemcache that allows injecting a bytes pool for cache get operations. The cache uses the pool to allocate [...]. If we're going the route of modifying gomemcache, then we should do it in a non-breaking way, because other downstream users of grafana/gomemcache (Tempo and Loki, for example) may only implement the current interface.
@dimitarvdimitrov so as I understand it, the change in the gomemcache lib should take into account that we will pass it a low value, and therefore it would not be able to get objects?
I was thinking of returning an error from the allocator and bubbling it up in gomemcache back to the caller.
@dimitarvdimitrov I will try to code a draft PR over the week.
Background
A store-gateway can still go out of memory when enough queries requesting a lot of chunks execute at the same time. Combined with #3939 this can lead to out-of-memory errors.
This is a profile from a time period when the store-gateways in a Mimir cluster went out of memory. Most of the memory was spent on the chunks SlabPool.
Proposal
I propose to implement an instance limit that caps the total number of bytes used for chunks. The proposed implementation is to provide a new implementation for SlabPool which limits the number of currently allocated bytes and returns an error when the allocation fails.
mimir/pkg/storegateway/series_chunks.go
Lines 398 to 399 in 3a2302b