It's been observed that the async batch iterator we're using for fetching Parquet rows might be using too much memory. Note the `query.CloneParquetValues` call under `iter.(*AsyncBatchIterator[...]).fillBuffer` in the flame graph.

The problem manifests when the query hits downsampled (aggregated) profiles: a single row may contain thousands of values. Another factor is the misalignment between the query split interval and the block duration: each sub-range is processed independently, with its own iterator, which multiplies the memory requirement.
In practice, a large buffer is not required here: it only exists to avoid waiting on fetches from individual columns by reading their data ahead of time. In turn, each of the columns has its own read-ahead buffer, which should minimize blocking of the top-level iterator.
One way to solve the problem is to make the iterator account for size in bytes, so that its memory footprint is predictable. In this PR, I reduce the default buffer size and change the allocation strategy to use the new `slices.Grow` function.