Query storage by iterating through chunks by batches. #782
This changes how the store retrieves chunks. Previously, one chunk per stream was retrieved first, and for a large (high-cardinality) query even a single chunk per stream could amount to 3k chunks or more, which can easily OOM Loki (3k × 2MiB ≈ 6GiB).

Chunks are now retrieved in batches of a predefined size (default 50). For each batch we first fetch one chunk per stream to filter out streams that don't match the label matchers, then load the full batch (we don't need the lazy iterator anymore) and build an iterator from it. When that iterator is exhausted we pull the next batch, until there are no more chunks to fetch.
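The batching loop can be sketched roughly as below. This is a simplified illustration with hypothetical names (`chunkRef`, `batchIterator`), not the actual Loki types, and it skips the per-stream matcher filtering and the store fetch itself:

```go
package main

import "fmt"

// chunkRef stands in for a chunk reference (hypothetical, not Loki's type).
type chunkRef struct{ id int }

// batchIterator pulls chunk refs in fixed-size batches and iterates the
// loaded chunks of each batch before fetching the next one.
type batchIterator struct {
	refs      []chunkRef // remaining chunk refs matching the query
	batchSize int
	cur       []chunkRef // chunks of the current (loaded) batch
	pos       int        // position within cur
}

func newBatchIterator(refs []chunkRef, batchSize int) *batchIterator {
	return &batchIterator{refs: refs, batchSize: batchSize, pos: -1}
}

// Next advances to the next chunk, loading the next batch when the
// current one is exhausted. Returns false when no chunks remain.
func (it *batchIterator) Next() bool {
	it.pos++
	if it.pos < len(it.cur) {
		return true
	}
	if len(it.refs) == 0 {
		return false
	}
	n := it.batchSize
	if n > len(it.refs) {
		n = len(it.refs)
	}
	// Copy the refs for this batch instead of reslicing, so the iterator
	// doesn't keep already-consumed refs reachable (see note below).
	batch := make([]chunkRef, n)
	copy(batch, it.refs[:n])
	it.refs = it.refs[n:]
	it.cur = batch // in Loki this is where the batch would actually be loaded
	it.pos = 0
	return true
}

func (it *batchIterator) At() chunkRef { return it.cur[it.pos] }

func main() {
	refs := make([]chunkRef, 7)
	for i := range refs {
		refs[i] = chunkRef{id: i}
	}
	it := newBatchIterator(refs, 3) // batches of 3: [0 1 2] [3 4 5] [6]
	var seen []int
	for it.Next() {
		seen = append(seen, it.At().id)
	}
	fmt.Println(seen) // [0 1 2 3 4 5 6]
}
```

Only one batch of chunks is held in memory at a time, which bounds memory usage by the batch size rather than the query's cardinality.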
I'd also like to mention that when a batch is retrieved, the slice of chunk refs held by the batch iterator is copied (not resliced), to avoid keeping references to chunks that have already been loaded and consumed.
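The copy matters because a reslice shares its backing array, keeping the whole array reachable for the garbage collector. A small generic illustration of the difference (not Loki code):

```go
package main

import "fmt"

func main() {
	refs := []int{0, 1, 2, 3, 4}

	// Reslicing shares the backing array: as long as `shared` is
	// reachable, all 5 elements stay reachable too.
	shared := refs[:2]

	// Copying into a fresh slice breaks that link: once `refs` is
	// dropped, the original array can be garbage-collected even
	// while `batch` is still in use.
	batch := make([]int, 2)
	copy(batch, refs[:2])

	refs[0] = 99 // mutate the original backing array
	fmt.Println(shared[0], batch[0]) // 99 0: the copy is independent
}
```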
I've added tests to make sure direction and overlapping chunks are handled correctly, and I also took the time to add all the missing tests in the storage package, which brings it to 90% coverage.
/cc @gouthamve I believe this is the continuation of your work, so it should be fairly simple for you to review.
This should put an end to memory issues related to queries, except for label queries, which are also on my todo list.