[Star tree] Star tree indexing performance optimizations #16218
Labels
enhancement
Enhancement or improvement to existing feature or request
Indexing
Indexing, Bulk Indexing and anything related to indexing
Is your feature request related to a problem? Please describe
Currently, during the off-heap star tree build at indexing time, we duplicate the metric field value for each metric stat. If a metric field has 4 stats (sum, min, max, value_count), we write 4 values to the IndexOutput instead of one.
Secondly, we write each individual dimension and metric value directly to the IndexOutput.
Describe the solution you'd like
During flush, we can write only the actual field value from the segment for each metric field, instead of one copy per stat.
We can buffer each starTreeDocument in memory and then write its bytes to the IndexOutput.
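To illustrate the two proposals together, here is a minimal sketch, not the actual OpenSearch implementation. It assumes a simplified document layout (fixed-width `long` dimensions followed by one raw `long` per metric field) and uses a plain `java.io.OutputStream` as a stand-in for Lucene's `IndexOutput`. The names `StarTreeDocBufferSketch`, `Doc`, `CountingOutput`, and `writeBuffered` are hypothetical and exist only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class StarTreeDocBufferSketch {

    /** Hypothetical stand-in for a star-tree document. */
    static final class Doc {
        final long[] dimensions;
        // One raw value per metric field, not one value per stat (sum, min, max, ...).
        final long[] metricFieldValues;

        Doc(long[] dimensions, long[] metricFieldValues) {
            this.dimensions = dimensions;
            this.metricFieldValues = metricFieldValues;
        }
    }

    /** Stand-in for IndexOutput that counts write calls so the effect of buffering is visible. */
    static final class CountingOutput extends OutputStream {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int writeCalls = 0;

        @Override
        public void write(int b) {
            writeCalls++;
            sink.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
            sink.write(b, off, len);
        }
    }

    /** Serializes the whole document into an in-memory buffer, then issues a single write. */
    static int writeBuffered(Doc doc, OutputStream out) {
        int size = Long.BYTES * (doc.dimensions.length + doc.metricFieldValues.length);
        ByteBuffer buffer = ByteBuffer.allocate(size);
        for (long dimension : doc.dimensions) {
            buffer.putLong(dimension);
        }
        for (long value : doc.metricFieldValues) {
            buffer.putLong(value);
        }
        try {
            out.write(buffer.array(), 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return size;
    }

    public static void main(String[] args) {
        // Two dimensions and one metric field that would otherwise be written once per stat.
        Doc doc = new Doc(new long[] {1L, 2L}, new long[] {42L});
        CountingOutput out = new CountingOutput();
        int bytes = writeBuffered(doc, out);
        System.out.println(bytes + " bytes in " + out.writeCalls + " write call(s)");
    }
}
```

With four stats on the metric field, the unoptimized layout in this simplified model would hold six longs per document (two dimensions plus four copies of the metric value), whereas the sketch writes three longs in one call.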
Here is how the time taken to sort and aggregate segment documents into star-tree documents improved after these changes:
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response