Cache overlapping blocks #2239
…ssing. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Codecov Report
@@            Coverage Diff             @@
##           master    #2239      +/-   ##
==========================================
+ Coverage   62.07%   62.27%   +0.19%
==========================================
  Files         156      157       +1
  Lines       12531    12650     +119
==========================================
+ Hits         7779     7878      +99
- Misses       4145     4161      +16
- Partials      607      611       +4
for _, b := range blocks {
	// If we have already processed and cached this block, reuse it.
	if cache, ok := c.overlappingBlocks[b.Offset()]; ok {
		clone := *cache
nice
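The hunk above hands out a value copy of the cached entry (clone := *cache) rather than the cached pointer itself. A minimal sketch of why that pattern works, assuming a hypothetical cachedIterator whose state is a slice of already-decoded entries plus a per-consumer cursor; the type, its methods, reset, and drain are illustrative and are not Loki's actual implementation. Copying the struct duplicates the cursor but shares the decoded entries, so each consumer can iterate independently without decoding the block again.

```go
package main

import "fmt"

// cachedIterator is a hypothetical stand-in for a cached block iterator:
// it holds already-decoded entries and a cursor into them.
type cachedIterator struct {
	entries []string // decoded once, shared by every value copy
	pos     int      // per-consumer cursor, copied with the struct
}

// Next advances the cursor and reports whether an entry is available.
func (it *cachedIterator) Next() bool {
	if it.pos >= len(it.entries) {
		return false
	}
	it.pos++
	return true
}

// Entry returns the entry at the current cursor position.
func (it *cachedIterator) Entry() string { return it.entries[it.pos-1] }

// reset rewinds the cursor so a clone starts from the first entry.
func (it *cachedIterator) reset() { it.pos = 0 }

// drain consumes an iterator and returns the entries it yielded.
func drain(it *cachedIterator) []string {
	var out []string
	for it.Next() {
		out = append(out, it.Entry())
	}
	return out
}

func main() {
	cache := &cachedIterator{entries: []string{"a", "b", "c"}}
	drain(cache) // the first consumer exhausts the cached iterator

	// Copying the struct value yields a fresh cursor over the same entries.
	clone := *cache
	clone.reset()
	fmt.Println(drain(&clone)) // [a b c]
	fmt.Println(cache.pos)     // 3: the original cursor is untouched
}
```

Because only the small struct is copied, the cost of reusing a cached block is a cursor reset rather than another decompression.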
		blocks = append(blocks, b)
	}
}
return blocks
Could we gain anything by slicing up the existing c.blocks instead of allocating a new slice? I'm also curious whether, if c.blocks were a slice of pointers, we could avoid copying the block here.
I'll check whether it helps. There's another place where I intentionally don't reslice, because reslicing keeps references to the underlying array.
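The trade-off discussed in this thread can be illustrated with a small sketch. It uses a hypothetical block type and two hypothetical helpers (subslice and copyRange), not Loki's code: sub-slicing avoids an allocation but shares the original backing array, so the whole array, including blocks outside the requested range, stays reachable for as long as the sub-slice is alive; copying costs one allocation but references only the blocks actually needed.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a chunk block.
type block struct {
	offset int
	data   []byte // potentially large buffer
}

// subslice returns blocks[i:j] without allocating. The result shares the
// backing array of blocks, so the entire array (and every *block stored in
// it) stays reachable while the result is alive.
func subslice(blocks []*block, i, j int) []*block {
	return blocks[i:j]
}

// copyRange allocates a new slice holding only blocks[i:j], so the original
// backing array can be collected once nothing else refers to it.
func copyRange(blocks []*block, i, j int) []*block {
	out := make([]*block, j-i)
	copy(out, blocks[i:j])
	return out
}

func main() {
	all := make([]*block, 4)
	for i := range all {
		all[i] = &block{offset: i, data: make([]byte, 1<<20)}
	}

	shared := subslice(all, 1, 3)
	owned := copyRange(all, 1, 3)

	// The sub-slice can be extended up to its capacity, reaching a block
	// outside the requested [1, 3) range.
	fmt.Println(len(shared), cap(shared))       // 2 3
	fmt.Println(shared[:cap(shared)][2].offset) // 3
	fmt.Println(len(owned), cap(owned))         // 2 2
}
```

A full slice expression such as blocks[i:j:j] caps the capacity so callers cannot append into or reslice past j, but it still shares the same backing array, so it does not address the retention concern.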
Just one thought/question; aside from that, this looks great!
We (@slim-bean and I) realized that the batchIterator in Loki may reprocess the same data over and over when more chunks overlap than the batch size. As a side effect, some users may end up processing 30 GiB of logs when the actual data is only 300 MB. This affects a lot of queries.
This PR introduces a cache for overlapping blocks, avoiding costly decompression when a block that overlaps the next chunk has to be reused. Reusing such blocks is required for correct deduplication.
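A minimal sketch of the caching idea, assuming blocks are identified by their offset within the chunk (as c.overlappingBlocks[b.Offset()] in the review hunk suggests). The rawBlock and decoder types are hypothetical, and a counter stands in for the real decompression; this is not the PR's implementation. Only blocks flagged as overlapping are cached, so a block needed by several consecutive batches is decompressed once instead of once per batch.

```go
package main

import "fmt"

// rawBlock is a hypothetical compressed block identified by its offset.
type rawBlock struct {
	offset int
	lines  []string // what decompression would produce
}

// decoder counts decompressions so the effect of caching is visible.
type decoder struct {
	decompressions int
	cache          map[int][]string // block offset -> decoded entries
}

// decode returns the block's entries. When cacheIt is true (the block
// overlaps the next chunk), the decoded entries are stored so later calls
// for the same offset skip decompression.
func (d *decoder) decode(b rawBlock, cacheIt bool) []string {
	if entries, ok := d.cache[b.offset]; ok {
		return entries
	}
	d.decompressions++ // stand-in for the costly decompression
	entries := append([]string(nil), b.lines...)
	if cacheIt {
		if d.cache == nil {
			d.cache = make(map[int][]string)
		}
		d.cache[b.offset] = entries
	}
	return entries
}

func main() {
	overlapping := rawBlock{offset: 128, lines: []string{"l1", "l2"}}

	// The same overlapping block is needed by three consecutive batches.
	cached := &decoder{}
	nocache := &decoder{}
	for i := 0; i < 3; i++ {
		cached.decode(overlapping, true)
		nocache.decode(overlapping, false)
	}
	fmt.Println(cached.decompressions, nocache.decompressions) // 1 3
}
```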
I wrote a benchmark and ran it before and after the change:
I've also run this in our ops cluster and observed a 25% speed-up for filter queries.