
Time series based workload sort optimization through Lucene segment sorting in DocumentReader #6814

Closed
gashutos opened this issue Mar 23, 2023 · 11 comments
Labels: distributed framework, enhancement, Performance


gashutos commented Mar 23, 2023

Observation

I have noticed through the http_logs workload of OSB that a desc sort on the @timestamp field takes a lot longer than an asc sort. There is definitely an improvement we can make here by arranging Lucene segments in asc/desc order based on the sort type specified. We can address this at query time. Before I put down my POC numbers and the performance gains we got, let me give you some context.

For the context,

We enabled NumericField point-based sort optimizations for all NumericTypes in these 2 PRs (6321 & 6424).
As mentioned in the PRs, the performance gains were substantial, since the optimization skips iterating/comparing over non-competitive documents based on their point values. The basic concept: given a target value, we try to find the 'k' closest points to that target and skip other values we consider non-competitive. Lucene already does this on _score values; with the above optimizations we do it on point values instead.
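The skipping idea above can be illustrated with a toy sketch (plain Java, not Lucene's actual BKD implementation): blocks carry precomputed min/max metadata, and once 'k' candidates are collected, any block whose best possible value cannot beat the current k-th best is skipped without visiting its documents. `Block` and `topKDesc` are hypothetical names for illustration only.

```java
import java.util.*;

// Toy sketch (not Lucene's actual code) of point-based sort skipping:
// once we hold 'k' candidates, any block whose max value cannot beat the
// current k-th best is skipped without visiting its documents.
public class PointSkipSketch {

    // One "block" of values with precomputed min/max, like a BKD leaf.
    record Block(long min, long max, long[] values) {}

    // Collect the k largest values (desc sort), skipping non-competitive blocks.
    static List<Long> topKDesc(List<Block> blocks, int k) {
        PriorityQueue<Long> topK = new PriorityQueue<>(); // min-heap of current top-k
        for (Block b : blocks) {
            // Non-competitive: even the block's max cannot beat the k-th best so far.
            if (topK.size() == k && b.max() <= topK.peek()) {
                continue; // skipped entirely, no per-document work
            }
            for (long v : b.values()) {
                if (topK.size() < k) {
                    topK.add(v);
                } else if (v > topK.peek()) {
                    topK.poll();
                    topK.add(v);
                }
            }
        }
        List<Long> result = new ArrayList<>(topK);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(1, 5, new long[] {1, 3, 5}),
            new Block(6, 9, new long[] {6, 7, 9}),
            new Block(2, 4, new long[] {2, 4})); // skipped once top-2 is {7, 9}
        System.out.println(topKDesc(blocks, 2)); // [9, 7]
    }
}
```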

Suggested fix

Now in a time-series workload, the latest timestamps land in the last segments of a shard given its segment arrangement, so the desc query tends to take a lot of time. If we reorder the segments in descending order at the DocumentReader level, we can skip iterating over non-competitive segments that are placed ahead of the competitive segments.

Descending/ascending order document retrieval is a very common scenario in time-series workloads, and based on the POC, I see we gain 3x to 4x from this optimization.
All we need to do is, at query time, provide a sort order for the segments here in subReadersSorter while creating the DocumentReader object.
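The essence of that subReadersSorter hook is just a comparator over per-segment timestamp metadata. Below is a minimal sketch in plain Java, where `SegmentMeta` is a hypothetical stand-in for a Lucene leaf reader; the real hook takes a comparator over leaf readers.

```java
import java.util.*;

// Hedged sketch of the segment-ordering idea: for a desc sort, visit segments
// with the largest max timestamp first so competitive hits are found early and
// later segments can be skipped. SegmentMeta is a hypothetical stand-in for a
// LeafReader; the actual hook is a Comparator over leaf readers.
public class SegmentOrderSketch {

    record SegmentMeta(String name, long minTimestamp, long maxTimestamp) {}

    static Comparator<SegmentMeta> forSortOrder(boolean descending) {
        Comparator<SegmentMeta> byMax = Comparator.comparingLong(SegmentMeta::maxTimestamp);
        Comparator<SegmentMeta> byMin = Comparator.comparingLong(SegmentMeta::minTimestamp);
        // desc sort: newest segment first; asc sort: oldest segment first.
        return descending ? byMax.reversed() : byMin;
    }

    public static void main(String[] args) {
        List<SegmentMeta> segments = new ArrayList<>(List.of(
            new SegmentMeta("_0", 100, 200),   // oldest data
            new SegmentMeta("_1", 201, 300),
            new SegmentMeta("_2", 301, 400))); // newest data
        segments.sort(forSortOrder(true));
        System.out.println(segments.get(0).name()); // _2 (newest first for desc)
    }
}
```

In a time-series index the per-segment min/max of @timestamp is cheap to read from segment metadata, which is why the reorder costs almost nothing at query time.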

Below are the numbers from my POC. Sorting is performed on the @timestamp field.

The http_logs workload was used.
The operation was performed on a single index of 12.9 GB with 4 shards and 136 segments.

| Sort Type | hits = 5 | hits = 100 | hits = 1000 |
| --- | --- | --- | --- |
| ASC | 11 | 14 | 693 |
| ASC (in POC) | 10 | 11 | 11 |
| Desc | 185 | 670 | 995 |
| Desc (in POC) | 92 | 250 | 323 |

As you can see, we gain anywhere between 3x and 4x here on @timestamp field sorting.

A few considerations before implementing this:

  1. This will only optimize workloads that are time-series based and ingested in a particular order. Is it worth doing this only for time-series workloads? How common are those workloads or scenarios for our users?
  2. Segment merges can still alter the sequence of documents (I am testing what the impact could be here).
  3. Can we provide users an alternative to the index sort field at the API level to enable this optimization, with clear documentation of when it should be used? Users definitely want to avoid index sort since it adds overhead at ingestion time.
@gashutos gashutos added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 23, 2023
@gashutos

@dblock @reta @nknize @backslasht @Bukhtawar, what do you all think about this?


reta commented Mar 23, 2023

@gashutos thank you, have you seen this one: TimeSeries optimizations in OpenSearch?

On a more general note, the improvements look well worth the optimization, but I do not see a clear way to rightly conclude that an index fits this type of workload (== has time-series data ingested in a particular order), unless we restrict it to data streams where @timestamp is explicit (Elasticsearch shipped similar optimizations: https://www.elastic.co/blog/optimizing-sort-queries-in-elasticsearch-for-faster-results).


gashutos commented Mar 24, 2023

@reta I think that proposal has some limitations with the point-based sort optimizations we have now. I am still confirming with Rishabh that I am not missing something. By merging dimensions, we would lose the ability to sort based on point values.

Irrespective of that, the code changes for this approach are very small and get a sizable improvement. While we fine-grain the above approach, we can productize this one first and improve on the mentioned approach later, since the changes are within the OpenSearch repo itself.

@gashutos gashutos changed the title Time series based workload sort optimization Time series based workload sort optimization through Lucene segment sorting in DocumentReader Mar 24, 2023
@backslasht

@gashutos - This looks like a promising optimization; a few questions.

  1. Echoing @reta's question: how do we determine from the workload whether this optimization needs to be enabled or not?
  2. What is the performance impact on other workloads where this optimization is not applicable, especially when the number of segments is high?

@backslasht

Tagging @nknize


nknize commented Mar 30, 2023

Couple of quick questions:

  1. Have you benchmarked with the ShuffleForcedMergePolicy? If so, can you post those numbers here? If not, please do and post them here.

  2. I think that proposal has some limitations with Point based sort optimizations we have now

    You're talking about two different sorts, Segment vs Field. The "point based sort optimizations" you're referring to uses the BKD tree through IndexOrDocValuesQuery to skip non-competitive documents at query time. Segment sorting is more about Segment Geometry and how the segments are visited / merged, which I think is what you're discussing in this issue (hence me asking about the shuffle merge policy benchmarks).

  3. Brainstorming out loud. What about concurrent search? Could we add an enhancement to the concurrent-search sandbox module that searches segments concurrently in something like a crossing pattern? (e.g., threads in reverse and threads forward)
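For what it's worth, the crossing visit order from point 3 can be sketched in a few lines. This is a hypothetical illustration of the ordering only, not the concurrent-search sandbox module's actual API: segments are interleaved from both ends of the list, so some threads walk forward (oldest first) while others walk in reverse (newest first).

```java
import java.util.*;

// Hedged sketch of the "crossing pattern" idea for concurrent search:
// interleave segments from both ends of the list so forward and reverse
// directions are both covered early. Illustration only.
public class CrossingOrderSketch {

    static <T> List<T> crossingOrder(List<T> segments) {
        List<T> out = new ArrayList<>(segments.size());
        int lo = 0, hi = segments.size() - 1;
        while (lo <= hi) {
            out.add(segments.get(lo));               // forward direction
            if (lo != hi) out.add(segments.get(hi)); // reverse direction
            lo++;
            hi--;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(crossingOrder(List.of("_0", "_1", "_2", "_3", "_4")));
        // [_0, _4, _1, _3, _2]
    }
}
```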


gashutos commented Mar 30, 2023

@nknize @backslasht thank you for looking at the proposal.

@nknize while I am checking the 1st and 3rd ones, just a quick clarification on the 2nd one...

I think that proposal has some limitations with Point based sort optimizations we have now

You're talking about two different sorts, Segment vs Field. The "point based sort optimizations" you're referring to uses the BKD tree through IndexOrDocValuesQuery to skip non-competitive documents at query time. Segment sorting is more about Segment Geometry and how the segments are visited / merged, which I think is what you're discussing in this issue (hence me asking about the shuffle merge policy benchmarks).

I was talking about #3734, proposed by Rishabh. At a high level, that approach looks to me to have some contradiction with the point-based optimization.


nknize commented Mar 30, 2023

This I was talking about #3734 proposed by Rishabh.

Ah, it's been a while since we worked that one! So at the Lucene level that's for dedicated TS* field types and the Lucene BKD changes include rolling up summary stats so you can easily filter based on the precomputed (per segment) stats. There's still a lot of outstanding questions around that one so I'd actually consider that separate from this issue for now. Any improvements you make on this issue can be rolled back into that design, or alternatively you could leverage the precomputed stats to more efficiently sort the segment geometry. If that ends up helping it would be a good reason to get the BKD change POC merged upstream in lucene.

I'd start first by exploring the shuffle merge policy and benchmarking to assess any gains there. I definitely have reservations about forcing a specific asc/desc segment order because inevitably a user will want to sort results and access the documents in the opposite direction. Shuffle merge is supposed to converge on the average for all search cases.


gashutos commented Apr 7, 2023

@nknize I did the benchmarking for ShuffleForcedMergePolicy; I think the numbers are not promising. I take away two things from it:

  1. Desc order sort improves 3x, as we can see below (but only when we force merge to a single segment; in real-life scenarios where customers do not force merge, there is no notable difference in desc sort).
  2. ASC sort regresses by 3x.

Below are the results I collected from the http_logs workload on a single shard of 3.7 GB. Initially this shard had 135 segments; it was then force merged to a single segment.

| Sort Type | hits = 1000 |
| --- | --- |
| ASC | 35 |
| ASC (with ShuffleMerge) | 140 |
| ASC (with ShuffleMerge, force merged to single segment) | 450 |
| Desc | 525 |
| Desc (with ShuffleMerge) | 450 |
| Desc (with ShuffleMerge, force merged to single segment) | 181 |

I think we can improve this policy further within segment merging itself, but if we look at a real production scenario, we will still end up with segments (groups of segments) in ascending timestamp order, where the newer segments hold the latest data.
That's why I was thinking: since we aren't regressing anything with the "segment order sort" approach at query time (subReadersSorter), we can productize that first, and I believe we can still make some improvements in the segment merging mechanism for time-series workloads.

I definitely have reservations about forcing a specific asc/desc segment order because inevitably a user will want to sort results and access the documents in the opposite direction.

Looking at the code right now, we create the IndexWriter, and the DocumentReader from it, only once at boot. I was thinking that if we can apply this segment ordering based on the sort type at query time, then your concern would be addressed, right? Let me dig into that more and find out whether there is a way around it...


gashutos commented Apr 7, 2023

@backslasht @reta ,
To your first question,

Echoing @reta's question: how do we determine from the workload whether this optimization needs to be enabled or not?

I think we may not need to bind this segment ordering to only one type of workload (time-series based). If data is uniformly distributed across segments, the order of traversing the segments won't matter. I did this exercise on the nyc_taxis workload on the total_amount/tip_amount fields, and I don't see any regression in queries there. We will apply this optimization only when the query has a sort, and we will sort segments by their min/max values with respect to the sort order.

What is the performance impact on other workloads where this optimization is not applicable, especially when the number of segments is high?

This I am still running and will probably have results in some time; I will keep it posted here.


andrross commented Jun 7, 2023

@gashutos FYI, I'm reopening this issue because the implementation was reverted prior to the 2.8 release due to #7878.

@andrross andrross reopened this Jun 7, 2023
@gashutos gashutos self-assigned this Jun 28, 2023
@gashutos gashutos added the Performance This is for any performance related enhancements or bugs label Jun 28, 2023