
Time series based workload sort optimization through Lucene segment sorting in DocumentReader #6814

Closed
gashutos opened this issue Mar 23, 2023 · 11 comments
Labels: distributed framework, enhancement, Performance


gashutos commented Mar 23, 2023

Observation

I have noticed through the http_logs workload of OSB that a desc sort on the @timestamp field takes a lot longer than an asc sort. There is definitely an improvement we can make here by arranging Lucene segments in asc/desc order based on the sort type specified. We can address this at query time. Before I put down my POC numbers and the performance gains we got, let me give you some context.

For the context,

We enabled NumericField point-based sort optimizations for all NumericTypes in these 2 PRs (6321 & 6424).
As mentioned in the PRs, the performance gains were substantial, since the optimization skips iterating/comparing over non-competitive documents based on their point values. The basic concept: given a target value, we try to find the 'k' closest points to that target and skip other values we consider non-competitive. Lucene already does this on _score values; with the above optimizations we do it on point values instead.
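The skipping idea above can be illustrated with a toy sketch (plain Java, not Lucene's actual BKD implementation): blocks carry precomputed min/max metadata, and once 'k' candidates are collected, any block whose best possible value cannot beat the current k-th best is skipped without visiting its documents. `Block` and `topKDesc` are hypothetical names for illustration only.

```java
import java.util.*;

// Toy sketch (not Lucene's actual code) of point-based sort skipping:
// once we hold 'k' candidates, any block whose max value cannot beat the
// current k-th best is skipped without visiting its documents.
public class PointSkipSketch {

    // One "block" of values with precomputed min/max, like a BKD leaf.
    record Block(long min, long max, long[] values) {}

    // Collect the k largest values (desc sort), skipping non-competitive blocks.
    static List<Long> topKDesc(List<Block> blocks, int k) {
        PriorityQueue<Long> topK = new PriorityQueue<>(); // min-heap of current top-k
        for (Block b : blocks) {
            // Non-competitive: even the block's max cannot beat the k-th best so far.
            if (topK.size() == k && b.max() <= topK.peek()) {
                continue; // skipped entirely, no per-document work
            }
            for (long v : b.values()) {
                if (topK.size() < k) {
                    topK.add(v);
                } else if (v > topK.peek()) {
                    topK.poll();
                    topK.add(v);
                }
            }
        }
        List<Long> result = new ArrayList<>(topK);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(1, 5, new long[] {1, 3, 5}),
            new Block(6, 9, new long[] {6, 7, 9}),
            new Block(2, 4, new long[] {2, 4})); // skipped once top-2 is {7, 9}
        System.out.println(topKDesc(blocks, 2)); // [9, 7]
    }
}
```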

Suggested fix

Now in a time-series workload, the latest timestamps land in the last segments of a shard given its segment arrangement, so the desc query tends to take a lot of time. If we reorder the segments in descending order at the DocumentReader level, we can skip iterating over non-competitive segments that are placed ahead of the competitive segments.

Descending/ascending order document retrieval is a very common scenario in time-series workloads, and based on the POC, I see we gain 3x to 4x from this optimization.
All we need to do is, at query time, provide a sort order for the segments here in subReadersSorter while creating the DocumentReader object.
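The essence of that subReadersSorter hook is just a comparator over per-segment timestamp metadata. Below is a minimal sketch in plain Java, where `SegmentMeta` is a hypothetical stand-in for a Lucene leaf reader; the real hook takes a comparator over leaf readers.

```java
import java.util.*;

// Hedged sketch of the segment-ordering idea: for a desc sort, visit segments
// with the largest max timestamp first so competitive hits are found early and
// later segments can be skipped. SegmentMeta is a hypothetical stand-in for a
// LeafReader; the actual hook is a Comparator over leaf readers.
public class SegmentOrderSketch {

    record SegmentMeta(String name, long minTimestamp, long maxTimestamp) {}

    static Comparator<SegmentMeta> forSortOrder(boolean descending) {
        Comparator<SegmentMeta> byMax = Comparator.comparingLong(SegmentMeta::maxTimestamp);
        Comparator<SegmentMeta> byMin = Comparator.comparingLong(SegmentMeta::minTimestamp);
        // desc sort: newest segment first; asc sort: oldest segment first.
        return descending ? byMax.reversed() : byMin;
    }

    public static void main(String[] args) {
        List<SegmentMeta> segments = new ArrayList<>(List.of(
            new SegmentMeta("_0", 100, 200),   // oldest data
            new SegmentMeta("_1", 201, 300),
            new SegmentMeta("_2", 301, 400))); // newest data
        segments.sort(forSortOrder(true));
        System.out.println(segments.get(0).name()); // _2 (newest first for desc)
    }
}
```

In a time-series index the per-segment min/max of @timestamp is cheap to read from segment metadata, which is why the reorder costs almost nothing at query time.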

Below are the numbers from my POC. Sorting is performed on the @timestamp field.

The http_logs workload was used.
The operation was performed on a single index of 12.9 GB with 4 shards and 136 segments.

| Sort Type | hits = 5 | hits = 100 | hits = 1000 |
| --- | --- | --- | --- |
| ASC | 11 | 14 | 693 |
| ASC (in POC) | 10 | 11 | 11 |
| Desc | 185 | 670 | 995 |
| Desc (in POC) | 92 | 250 | 323 |

As you can see, we gain anywhere between 3x and 4x here on @timestamp field sorting.

A few considerations before implementing this:

  1. This will only optimize workloads that are time-series based and ingested in a particular order. Is it worth doing this only for time-series workloads? How common are those workloads or scenarios for our users?
  2. Segment merges can still alter the sequence of documents (I am testing what the impact could be here).
  3. Can we provide users an alternative to the index sort field at the API level to enable this optimization, with clear documentation of when it should be used? Users definitely want to avoid index sort since it adds overhead at ingestion time.
@gashutos gashutos added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 23, 2023
@gashutos

@dblock @reta @nknize @backslasht @Bukhtawar, what do you all think about this?


reta commented Mar 23, 2023

@gashutos thank you, have you seen this one: TimeSeries optimizations in OpenSearch?

On a more general note, the improvements look well worth the optimization, but I do not see a clear way to rightly conclude that an index fits this type of workload (== has time-series data ingested in a particular order), unless we restrict it to data streams where @timestamp is explicit (Elasticsearch shipped similar optimizations: https://www.elastic.co/blog/optimizing-sort-queries-in-elasticsearch-for-faster-results).


gashutos commented Mar 24, 2023

@reta I think that proposal has some limitations with the point-based sort optimizations we have now. I am still confirming with Rishabh that I am not missing something. By merging dimensions, we would lose the ability to sort based on point values.

Irrespective of that, the code changes for this approach are very small and get a sizable improvement. While we fine-grain the above approach, we can productize this one first and improve on the mentioned approach later, since the changes are within the OpenSearch repo itself.

@gashutos gashutos changed the title Time series based workload sort optimization Time series based workload sort optimization through Lucene segment sorting in DocumentReader Mar 24, 2023
@backslasht

@gashutos - This looks like a promising optimization; a few questions.

  1. Echoing @reta's question: how do we determine from the workload whether this optimization needs to be enabled or not?
  2. What is the performance impact on other workloads where this optimization is not applicable, especially when the number of segments is high?

@backslasht

Tagging @nknize


nknize commented Mar 30, 2023

Couple of quick questions:

  1. Have you benchmarked with the ShuffleForcedMergePolicy? If so, can you post those numbers here? If not, please do and post them here.

  2. I think that proposal has some limitations with Point based sort optimizations we have now

    You're talking about two different sorts, Segment vs Field. The "point based sort optimizations" you're referring to uses the BKD tree through IndexOrDocValuesQuery to skip non-competitive documents at query time. Segment sorting is more about Segment Geometry and how the segments are visited / merged, which I think is what you're discussing in this issue (hence me asking about the shuffle merge policy benchmarks).

  3. Brainstorming out loud. What about concurrent search? Could we add an enhancement to the concurrent-search sandbox module that searches segments concurrently in something like a crossing pattern? (e.g., threads in reverse and threads forward)
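For what it's worth, the crossing visit order from point 3 can be sketched in a few lines. This is a hypothetical illustration of the ordering only, not the concurrent-search sandbox module's actual API: segments are interleaved from both ends of the list, so some threads walk forward (oldest first) while others walk in reverse (newest first).

```java
import java.util.*;

// Hedged sketch of the "crossing pattern" idea for concurrent search:
// interleave segments from both ends of the list so forward and reverse
// directions are both covered early. Illustration only.
public class CrossingOrderSketch {

    static <T> List<T> crossingOrder(List<T> segments) {
        List<T> out = new ArrayList<>(segments.size());
        int lo = 0, hi = segments.size() - 1;
        while (lo <= hi) {
            out.add(segments.get(lo));               // forward direction
            if (lo != hi) out.add(segments.get(hi)); // reverse direction
            lo++;
            hi--;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(crossingOrder(List.of("_0", "_1", "_2", "_3", "_4")));
        // [_0, _4, _1, _3, _2]
    }
}
```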


gashutos commented Mar 30, 2023

@nknize @backslasht thank you for looking at the proposal.

@nknize while I am checking the 1st and 3rd ones, just a quick clarification on the 2nd one...

I think that proposal has some limitations with Point based sort optimizations we have now

You're talking about two different sorts, Segment vs Field. The "point based sort optimizations" you're referring to uses the BKD tree through IndexOrDocValuesQuery to skip non-competitive documents at query time. Segment sorting is more about Segment Geometry and how the segments are visited / merged, which I think is what you're discussing in this issue (hence me asking about the shuffle merge policy benchmarks).

I was talking about #3734, proposed by Rishabh. At a high level, that approach looks to me to have some contradiction with the point-based optimization.


nknize commented Mar 30, 2023

This I was talking about #3734 proposed by Rishabh.

Ah, it's been a while since we worked that one! So at the Lucene level that's for dedicated TS* field types and the Lucene BKD changes include rolling up summary stats so you can easily filter based on the precomputed (per segment) stats. There's still a lot of outstanding questions around that one so I'd actually consider that separate from this issue for now. Any improvements you make on this issue can be rolled back into that design, or alternatively you could leverage the precomputed stats to more efficiently sort the segment geometry. If that ends up helping it would be a good reason to get the BKD change POC merged upstream in lucene.

I'd start first by exploring the shuffle merge policy and benchmarking to assess any gains there. I definitely have reservations about forcing a specific asc/desc segment order because inevitably a user will want to sort results and access the documents in the opposite direction. Shuffle merge is supposed to converge on the average for all search cases.


gashutos commented Apr 7, 2023

@nknize I did the benchmarking for ShuffleForcedMergePolicy; I think the numbers are not promising. I take away two things from it:

  1. Desc order sort improves 3x, as we can see below (but only when we force merge to a single segment; in real-life scenarios where customers do not force merge, there is no notable difference in desc sort).
  2. ASC sort regresses by 3x.

Below are the results I collected from the http_logs workload on a single shard of 3.7 GB. Initially this shard had 135 segments; it was then force merged to a single segment.

| Sort Type | hits = 1000 |
| --- | --- |
| ASC | 35 |
| ASC (with ShuffleMerge) | 140 |
| ASC (with ShuffleMerge, force merged to single segment) | 450 |
| Desc | 525 |
| Desc (with ShuffleMerge) | 450 |
| Desc (with ShuffleMerge, force merged to single segment) | 181 |

I think we can improve this policy further within segment merging itself, but if we look at a real production scenario, we will still end up with segments (groups of segments) in ascending timestamp order, where the newer segments hold the latest data.
That's why I was thinking: since we aren't regressing anything with the "segment order sort" approach at query time (subReadersSorter), we can productize that first, and I believe we can still make some improvements in the segment merging mechanism for time-series workloads.

I definitely have reservations about forcing a specific asc/desc segment order because inevitably a user will want to sort results and access the documents in the opposite direction.

Looking at the code right now, we create the IndexWriter, and the DocumentReader from it, only once at boot. I was thinking that if we can apply this segment ordering based on the sort type at query time, then your concern would be addressed, right? Let me dig into that more and find out whether there is a way around it...


gashutos commented Apr 7, 2023

@backslasht @reta ,
To your first question,

Echoing @reta's question: how do we determine from the workload whether this optimization needs to be enabled or not?

I think we may not need to bind this segment ordering to only one type of workload (time-series based). If data is uniformly distributed across segments, the order of traversing the segments won't matter. I did this exercise on the nyc_taxis workload on the total_amount/tip_amount fields, and I don't see any regression in queries there. We will apply this optimization only when the query has a sort, and we will sort segments by their min/max values with respect to the sort order.

What is the performance impact on other workloads where this optimization is not applicable, especially when the number of segments is high?

This I am still running and will probably have results in some time; I will keep it posted here.


andrross commented Jun 7, 2023

@gashutos FYI, I'm reopening this issue because the implementation was reverted prior to the 2.8 release due to #7878.

@andrross andrross reopened this Jun 7, 2023
@gashutos gashutos self-assigned this Jun 28, 2023
@gashutos gashutos added the Performance This is for any performance related enhancements or bugs label Jun 28, 2023