
Streaming queries are very inefficient #1195

Closed
bboreham opened this issue Jan 21, 2019 · 6 comments · Fixed by #4341
Labels
keepalive (Skipped by stale bot), type/performance

Comments

@bboreham
Contributor

I noticed high resource usage in the ruler and traced it back to a change where I turned on:

   - -querier.batch-iterators=true
   - -querier.ingester-streaming=true

Upon reverting this change, CPU went down to a third of what it was, memory down to a quarter and network traffic to a fifth.

Profiling suggests vast amounts of memory being used here:

github.com/cortexproject/cortex/pkg/querier/batch.newMergeIterator
/go/src/github.com/cortexproject/cortex/pkg/querier/batch/merge.go
  Total:      6.78TB     7.29TB (flat, cum) 38.52%
     20            .          .            
     21            .          .           	currErr error 
     22            .          .           } 
     23            .          .            
     24            .          .           func newMergeIterator(cs []chunk.Chunk) *mergeIterator { 
     25            .   151.63GB           	css := partitionChunks(cs) 
     26       5.39GB     5.39GB           	its := make([]*nonOverlappingIterator, 0, len(css)) 
     27            .          .           	for _, cs := range css { 
     28            .   365.93GB           		its = append(its, newNonOverlappingIterator(cs)) 
     29            .          .           	} 
     30            .          .            
     31            .          .           	c := &mergeIterator{ 
     32            .          .           		its:        its, 
     33      10.82GB    10.82GB           		h:          make(iteratorHeap, 0, len(its)), 
     34       3.36TB     3.36TB           		batches:    make(batchStream, 0, len(its)*2*promchunk.BatchSize), 
     35       3.40TB     3.40TB           		batchesBuf: make(batchStream, 0, len(its)*2*promchunk.BatchSize), 
     36            .          .           	} 
     37            .          .            
     38            .          .           	for _, iter := range c.its { 
     39            .          .           		if iter.Next(1) { 
     40            .          .           			c.h = append(c.h, iter) 

I am unclear why those sizes are multiplied by promchunk.BatchSize - they are allocating slices of Batch, which are already sized that big.
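For a rough sense of scale, here is a back-of-the-envelope sketch (not taken from the Cortex code; it assumes promchunk.BatchSize is 12 and models each Batch as BatchSize timestamps and BatchSize values plus two ints of bookkeeping):

// Back-of-the-envelope sketch (not Cortex code) of the up-front cost of
// newMergeIterator. The sizes below are assumptions: promchunk.BatchSize
// is taken to be 12, and each Batch is modelled as BatchSize timestamps
// (int64) plus BatchSize values (float64) plus two ints of bookkeeping.
package main

import "fmt"

const batchSize = 12 // assumed value of promchunk.BatchSize

func main() {
	const numIterators = 1000 // e.g. one nonOverlappingIterator per series

	perBatchBytes := batchSize*8 + batchSize*8 + 16 // ≈ 208 bytes per Batch
	batchesCap := numIterators * 2 * batchSize      // capacity asked for per batchStream

	// Two such streams (batches and batchesBuf) are allocated for every
	// mergeIterator, i.e. for every query that goes through this path,
	// regardless of how many samples are eventually read.
	fmt.Printf("one batchStream:   %d Batches ≈ %d KiB\n",
		batchesCap, batchesCap*perBatchBytes/1024)
	fmt.Printf("both batchStreams: ≈ %d KiB per query\n",
		2*batchesCap*perBatchBytes/1024)
}

Repeated on every query or rule evaluation, an up-front cost like that would be consistent with the TB-scale cumulative allocations in the profile above.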

@nschad
Contributor

nschad commented Jun 22, 2021

Is this still a thing?

@bboreham
Contributor Author

I haven't profiled recently, but the code still looks the same to me.

func newMergeIterator(cs []GenericChunk) *mergeIterator {
	css := partitionChunks(cs)
	its := make([]*nonOverlappingIterator, 0, len(css))
	for _, cs := range css {
		its = append(its, newNonOverlappingIterator(cs))
	}
	c := &mergeIterator{
		its:        its,
		h:          make(iteratorHeap, 0, len(its)),
		batches:    make(batchStream, 0, len(its)*2*promchunk.BatchSize),
		batchesBuf: make(batchStream, len(its)*2*promchunk.BatchSize),
	}

@nschad
Contributor

nschad commented Jun 22, 2021

I haven't profiled recently, but the code still looks the same to me.

func newMergeIterator(cs []GenericChunk) *mergeIterator {
	css := partitionChunks(cs)
	its := make([]*nonOverlappingIterator, 0, len(css))
	for _, cs := range css {
		its = append(its, newNonOverlappingIterator(cs))
	}
	c := &mergeIterator{
		its:        its,
		h:          make(iteratorHeap, 0, len(its)),
		batches:    make(batchStream, 0, len(its)*2*promchunk.BatchSize),
		batchesBuf: make(batchStream, len(its)*2*promchunk.BatchSize),
	}

Isn't this a major problem, when CPU/memory/bandwidth explode like you described? Especially since this is considered a good default (or even is the default)? 🤔

Edit: It is literally the default

@bboreham
Contributor Author

Depends on how much you use the ruler (or queries) compared to everything else.

Also for the blocks store (which we recommend over chunks), I don't think you will go through that path unless you also turn on the experimental -ingester.stream-chunks-when-using-blocks option.

@nschad
Contributor

nschad commented Jun 23, 2021

Depends how much you use the ruler (or queries) compared to everything else.

Also for the blocks store (which we recommend over chunks), I don't think you will go through that path unless you also turn on the experimental -ingester.stream-chunks-when-using-blocks option.

Ok, cool. I misunderstood; I thought this also applied to block storage users.

@bboreham
Contributor Author

bboreham commented Jul 6, 2021

I figured out what was happening in the code, and also why it hit the ruler more than the queriers - it is maximally inefficient for instant queries.
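One way to see why instant queries would be the worst case (a hypothetical illustration, not Cortex code; the constants are assumptions carried over from the earlier sketch): the pre-allocation is a fixed per-query cost paid before any sample is read, while an instant query only reads roughly one sample per series.

// Hypothetical illustration of allocation overhead per sample read for an
// instant query versus a range query. Constants are assumptions, not
// measurements from Cortex.
package main

import "fmt"

const (
	batchSize     = 12   // assumed promchunk.BatchSize
	numIterators  = 1000 // series touched by the query
	bytesPerBatch = 208  // rough Batch size from the earlier sketch
)

func main() {
	// Fixed cost of building the mergeIterator: two batchStreams, each with
	// capacity numIterators*2*batchSize, paid regardless of samples read.
	fixedBytes := 2 * numIterators * 2 * batchSize * bytesPerBatch

	instantSamples := numIterators * 1              // instant query: ~1 sample per series
	rangeSamples := numIterators * (24 * 3600 / 15) // e.g. 24h range at a 15s scrape interval

	fmt.Printf("fixed allocation: ~%d MiB\n", fixedBytes/(1024*1024))
	fmt.Printf("instant query: ~%d bytes allocated per sample read\n", fixedBytes/instantSamples)
	fmt.Printf("range query:   ~%d bytes allocated per sample read\n", fixedBytes/rangeSamples)
}

Under those assumptions the fixed allocation dwarfs the work an instant query actually does, which would explain why the ruler, issuing instant queries on every rule evaluation, was hit hardest.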
