Improve search performance for numeric sort queries #10867

Open
arjunkumargiri opened this issue Oct 23, 2023 · 25 comments

@arjunkumargiri

Numeric sorting is one of the key query mechanisms used across multiple OpenSearch clusters. It is critical to understand the performance characteristics of numeric sort queries and to identify ways to reduce query latency and overhead.

To understand the query characteristics of numeric sorting, a simple performance test was run with the settings below:

Benchmark tool: opensearch-benchmark
Workload: geonames
Task: desc_sort_population
Nodes: 1 node
JVM size: 4 GB

Benchmark result:

| Metric | Value | Unit |
|---|---|---|
| Min Throughput | 70.09 | ops/s |
| Mean Throughput | 74.27 | ops/s |
| Median Throughput | 74.59 | ops/s |
| Max Throughput | 74.79 | ops/s |
| 50th percentile latency | 6.32264 | ms |
| 90th percentile latency | 6.81483 | ms |
| 99th percentile latency | 7.45008 | ms |
| 99.9th percentile latency | 15.4041 | ms |
| 100th percentile latency | 16.5634 | ms |
| 50th percentile service time | 5.55759 | ms |
| 90th percentile service time | 5.73615 | ms |
| 99th percentile service time | 6.43827 | ms |
| 99.9th percentile service time | 14.8174 | ms |
| 100th percentile service time | 15.4417 | ms |
| error rate | 0 | % |

CPU profile:

Numeric sorting CPU profile

As expected, most CPU cycles for numeric sorting are spent in the Long comparator performing the sort. Those cycles are distributed equally between the PointValues operations estimatePointCount and intersect.

Opening this issue to brainstorm and identify potential improvements to numeric sorting.

@arjunkumargiri arjunkumargiri converted this from a draft issue Oct 23, 2023
@msfroh
Collaborator

msfroh commented Oct 23, 2023

I was brainstorming with @harshavamsi on this one briefly last week.

I think there might be some trickery that we can do especially for the special case where a segment has no deletes.

Specifically, I'm wondering if we can inspect the BKD tree to find the leftmost/rightmost (depending on sort order) smallest range with at least N hits, where N is the size parameter (or the track_total_hits limit). Then we could implicitly attach a range query filter.

I don't know if it would ultimately help, or if it's essentially what happens in the PointValues estimate/intersect methods anyway.
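
To make the idea above concrete, here is a minimal sketch in Python of picking the smallest leftmost/rightmost value range that covers at least N hits. It does not use any Lucene API: `Leaf` and `tightest_range_for_top_n` are hypothetical names, and the sketch assumes each leaf block of the tree can be summarized as (min value, max value, doc count), that the leaves are listed in ascending value order, and that the segment has no deletes so the counts are exact.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Leaf:
    """Hypothetical summary of one leaf block: value bounds and doc count."""
    min_value: int
    max_value: int
    doc_count: int


def tightest_range_for_top_n(
    leaves: List[Leaf], n: int, descending: bool = True
) -> Optional[Tuple[int, int]]:
    """Return inclusive (lower, upper) bounds covering at least n docs, or None.

    Assumes leaves are sorted by value (ascending) and that the segment has
    no deletes, so doc_count is exact for every leaf.
    """
    if n <= 0 or not leaves:
        return None
    # Walk from the rightmost leaf for a descending sort, leftmost otherwise.
    ordered = reversed(leaves) if descending else iter(leaves)
    covered = 0
    visited: List[Leaf] = []
    for leaf in ordered:
        visited.append(leaf)
        covered += leaf.doc_count
        if covered >= n:
            lower = min(v.min_value for v in visited)
            upper = max(v.max_value for v in visited)
            return (lower, upper)
    # Fewer than n docs in total: no narrowing range can be attached.
    return None


leaves = [Leaf(0, 9, 100), Leaf(10, 19, 100), Leaf(20, 29, 100)]
print(tightest_range_for_top_n(leaves, 150))  # (10, 29)
```

The returned bounds could then back an implicit range filter. Because whole leaves are taken, the range may match more than N docs, which is harmless for a top-N sort.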

@harshavamsi
Contributor

@msfroh thanks for the inputs. Tagging @rishabhmaurya here as well.

@rishabhmaurya had the idea of essentially trying to help match_all queries that use a descending sort on a numeric field. Rather than going through the entire BKD tree like you mentioned, we could essentially look through the min/max value that makes the most sense for us and then attach a range filter on that node assuming other attributes like the number of hits and the number of docs to be returned are all taken care of first.

I don't think I did a great job of explaining, but I will put up an RFC with my thought process and how we could prune the tree.

@rishabhmaurya
Contributor

rishabhmaurya commented Oct 24, 2023

Thanks @harshavamsi for working on it.

I have a working version of it in Lucene; the details of the optimization are in apache/lucene#12534 and the PR is rishabhmaurya/lucene#2. I discussed it with @msfroh and we agreed on its utility.
We can take early feedback from @nknize, as he understands this part of the code very well.

I started making changes in OpenSearch as well, because the Lucene community may not accept it: it works only for a MatchAllQuery with a descending sort on a numeric field and no deletions. You can find the OpenSearch changes here; it's still a work in progress - rishabhmaurya@f261cb3

@getsaurabh02
Member

Should we pull in rishabhmaurya@f261cb3 and run a benchmark along with a profile to identify the early improvements?
cc: @sandeshkr419

@rishabhmaurya
Contributor

rishabhmaurya commented Oct 30, 2023

rishabhmaurya@f261cb3 is still a work in progress, so it can't be used directly.
However, we can build a custom Lucene jar from apache/lucene#12534, where I have the changes working, and check the estimated gains. We may have to tweak the entry condition here - https://github.com/rishabhmaurya/lucene/pull/2/files#diff-79c6a57519ecd1ef504629e62e13d17859a4ffedc58f4602e583ce758a15adc8R294 - to find the sweet spot for this optimization.

@getsaurabh02 getsaurabh02 added v2.12.0 Issues and PRs related to version 2.12.0 and removed untriaged labels Oct 30, 2023
@harshavamsi harshavamsi moved this from 🆕 New to 🏗 In progress in Search Project Board Oct 31, 2023
@harshavamsi harshavamsi self-assigned this Oct 31, 2023
@harshavamsi
Contributor

Current steps on this:

@harshavamsi
Contributor

Preliminary benchmarking results:

Without optimization

| Metric | Value | Unit |
|---|---|---|
| Min Throughput | 1.5 | ops/s |
| Mean Throughput | 1.51 | ops/s |
| Median Throughput | 1.51 | ops/s |
| Max Throughput | 1.51 | ops/s |
| 50th percentile latency | 6.23599 | ms |
| 90th percentile latency | 6.81445 | ms |
| 99th percentile latency | 7.21335 | ms |
| 100th percentile latency | 7.22365 | ms |
| 50th percentile service time | 4.63105 | ms |
| 90th percentile service time | 5.02198 | ms |
| 99th percentile service time | 5.20355 | ms |
| 100th percentile service time | 5.24069 | ms |
| error rate | 0 | % |

With optimization

| Metric | Value | Unit |
|---|---|---|
| Min Throughput | 1.5 | ops/s |
| Mean Throughput | 1.5 | ops/s |
| Median Throughput | 1.5 | ops/s |
| Max Throughput | 1.5 | ops/s |
| 50th percentile latency | 8.20805 | ms |
| 90th percentile latency | 8.61225 | ms |
| 99th percentile latency | 8.91156 | ms |
| 100th percentile latency | 9.02062 | ms |
| 50th percentile service time | 6.5675 | ms |
| 90th percentile service time | 6.76763 | ms |
| 99th percentile service time | 7.00944 | ms |
| 100th percentile service time | 7.10608 | ms |
| error rate | 0 | % |

@rishabhmaurya
Contributor

thanks @harshavamsi for running the benchmark. Could you provide more details on the workload and queries you ran?

@harshavamsi
Contributor

harshavamsi commented Nov 1, 2023

@rishabhmaurya

I ran this workload and this task:

Workload: geonames
Task: desc_sort_population

I used an r5.2xlarge cluster for both benchmarks. The non-optimized run was on a regular cluster I had set up for keyword benchmarking. The optimized cluster was running a custom build of OS with a patched Lucene version that included the optimization.

This is the query:

    {
      "name": "desc_sort_population",
      "operation-type": "search",
      "body": {
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"population" : "desc"}
        ]
      }
    },
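
For reference, the "implicit range filter" idea discussed earlier in this thread can be pictured at the query DSL level. The sketch below builds the benchmark query and a hand-written variant restricted by a `range` query. The function names and the threshold are made up for illustration. The proposed optimization would derive such a bound internally per segment rather than change the request, and unlike `match_all`, the explicit range also changes the reported total hit count.

```python
import json
from typing import Any, Dict


def sorted_match_all(field: str, order: str = "desc") -> Dict[str, Any]:
    """Search body equivalent to the benchmark task above."""
    return {"query": {"match_all": {}}, "sort": [{field: order}]}


def sorted_with_range(
    field: str, lower_bound: int, order: str = "desc"
) -> Dict[str, Any]:
    """Same sort, but only over docs whose value is >= lower_bound."""
    return {
        "query": {"range": {field: {"gte": lower_bound}}},
        "sort": [{field: order}],
    }


# 1_000_000 is a made-up threshold used only for illustration.
print(json.dumps(sorted_with_range("population", 1_000_000), indent=2))
```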

@harshavamsi
Contributor

Re-running the benchmark on the optimized cluster:

|                                                 Min Throughput | desc_sort_population |         1.5 |  ops/s |
|                                                Mean Throughput | desc_sort_population |         1.5 |  ops/s |
|                                              Median Throughput | desc_sort_population |         1.5 |  ops/s |
|                                                 Max Throughput | desc_sort_population |         1.5 |  ops/s |
|                                        50th percentile latency | desc_sort_population |     6.71526 |     ms |
|                                        90th percentile latency | desc_sort_population |     7.17203 |     ms |
|                                        99th percentile latency | desc_sort_population |     7.40734 |     ms |
|                                       100th percentile latency | desc_sort_population |     7.46786 |     ms |
|                                   50th percentile service time | desc_sort_population |     5.15482 |     ms |
|                                   90th percentile service time | desc_sort_population |     5.38515 |     ms |
|                                   99th percentile service time | desc_sort_population |     5.79006 |     ms |
|                                  100th percentile service time | desc_sort_population |     5.89911 |     ms |
|                                                     error rate | desc_sort_population |           0 |      % |

@rishabhmaurya
Contributor

Can you also post the segment stats and the overall index size here? Given that the latency is already pretty low, this may not be the right workload to test against.
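
A small script can condense the segment stats into per-shard totals, including the deleted-doc count that matters for the no-deletes case. This is a sketch that assumes the response of the index segments API (`GET /geonames/_segments`) has been loaded into a dict with the `indices` / `shards` / `segments` nesting and the `num_docs`, `deleted_docs` and `size_in_bytes` fields. `summarize_segments` is a hypothetical helper, and the sample input is trimmed to two segments.

```python
from typing import Any, Dict, List


def summarize_segments(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one row of totals per shard copy in a segments-style response."""
    rows: List[Dict[str, Any]] = []
    for index_name, index in response.get("indices", {}).items():
        for shard_id, copies in index.get("shards", {}).items():
            for copy in copies:
                segments = list(copy.get("segments", {}).values())
                rows.append(
                    {
                        "index": index_name,
                        "shard": shard_id,
                        "primary": copy.get("routing", {}).get("primary"),
                        "segments": len(segments),
                        "docs": sum(s["num_docs"] for s in segments),
                        "deleted_docs": sum(s["deleted_docs"] for s in segments),
                        "size_in_bytes": sum(s["size_in_bytes"] for s in segments),
                    }
                )
    return rows


# Trimmed sample input with two segments on one shard.
sample = {
    "indices": {
        "geonames": {
            "shards": {
                "0": [
                    {
                        "routing": {"state": "STARTED", "primary": True},
                        "segments": {
                            "_0": {"num_docs": 10535, "deleted_docs": 0, "size_in_bytes": 3746696},
                            "_1": {"num_docs": 10070, "deleted_docs": 0, "size_in_bytes": 3459472},
                        },
                    }
                ]
            }
        }
    }
}

for row in summarize_segments(sample):
    print(row)
```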

@harshavamsi
Contributor

harshavamsi commented Nov 2, 2023

@rishabhmaurya here's the segment stats:

{
    "_shards": {
        "total": 7,
        "successful": 6,
        "failed": 0
    },
    "indices": {
        "geonames": {
            "shards": {
                "0": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 16,
                        "num_search_segments": 16,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 10535,
                                "deleted_docs": 0,
                                "size_in_bytes": 3746696,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 10070,
                                "deleted_docs": 0,
                                "size_in_bytes": 3459472,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 47250,
                                "deleted_docs": 0,
                                "size_in_bytes": 15707588,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 23435,
                                "deleted_docs": 0,
                                "size_in_bytes": 8124605,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 111695,
                                "deleted_docs": 0,
                                "size_in_bytes": 31826890,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 59986,
                                "deleted_docs": 0,
                                "size_in_bytes": 18805519,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 18956,
                                "deleted_docs": 0,
                                "size_in_bytes": 6143059,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 488811,
                                "deleted_docs": 0,
                                "size_in_bytes": 127081465,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 545075,
                                "deleted_docs": 0,
                                "size_in_bytes": 139838084,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 175324,
                                "deleted_docs": 0,
                                "size_in_bytes": 48162652,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 10813,
                                "deleted_docs": 0,
                                "size_in_bytes": 2925647,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 161960,
                                "deleted_docs": 0,
                                "size_in_bytes": 38022913,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 52153,
                                "deleted_docs": 0,
                                "size_in_bytes": 13055539,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 223779,
                                "deleted_docs": 0,
                                "size_in_bytes": 55202061,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 272247,
                                "deleted_docs": 0,
                                "size_in_bytes": 66167512,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 66286,
                                "deleted_docs": 0,
                                "size_in_bytes": 17067734,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "1": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 17,
                        "num_search_segments": 17,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 16775,
                                "deleted_docs": 0,
                                "size_in_bytes": 5516801,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 7823,
                                "deleted_docs": 0,
                                "size_in_bytes": 2994373,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 1479,
                                "deleted_docs": 0,
                                "size_in_bytes": 612107,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 42308,
                                "deleted_docs": 0,
                                "size_in_bytes": 13547068,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 40185,
                                "deleted_docs": 0,
                                "size_in_bytes": 13661412,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 6463,
                                "deleted_docs": 0,
                                "size_in_bytes": 2599313,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 73610,
                                "deleted_docs": 0,
                                "size_in_bytes": 21328234,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 120275,
                                "deleted_docs": 0,
                                "size_in_bytes": 34799549,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 23483,
                                "deleted_docs": 0,
                                "size_in_bytes": 7484546,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 496505,
                                "deleted_docs": 0,
                                "size_in_bytes": 129677362,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 431367,
                                "deleted_docs": 0,
                                "size_in_bytes": 112317590,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 153711,
                                "deleted_docs": 0,
                                "size_in_bytes": 42394841,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 64727,
                                "deleted_docs": 0,
                                "size_in_bytes": 15216055,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 3895,
                                "deleted_docs": 0,
                                "size_in_bytes": 1048412,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 214024,
                                "deleted_docs": 0,
                                "size_in_bytes": 53305902,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 500718,
                                "deleted_docs": 0,
                                "size_in_bytes": 118285309,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_g": {
                                "generation": 16,
                                "num_docs": 84258,
                                "deleted_docs": 0,
                                "size_in_bytes": 21819490,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "2": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 17,
                        "num_search_segments": 17,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 18219,
                                "deleted_docs": 0,
                                "size_in_bytes": 6406307,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 14097,
                                "deleted_docs": 0,
                                "size_in_bytes": 4702834,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 1766,
                                "deleted_docs": 0,
                                "size_in_bytes": 801728,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 24677,
                                "deleted_docs": 0,
                                "size_in_bytes": 8677890,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 66197,
                                "deleted_docs": 0,
                                "size_in_bytes": 20670999,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 8773,
                                "deleted_docs": 0,
                                "size_in_bytes": 3353062,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 140084,
                                "deleted_docs": 0,
                                "size_in_bytes": 38727079,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 102668,
                                "deleted_docs": 0,
                                "size_in_bytes": 29354176,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 11886,
                                "deleted_docs": 0,
                                "size_in_bytes": 3646252,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 481359,
                                "deleted_docs": 0,
                                "size_in_bytes": 124498298,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 420947,
                                "deleted_docs": 0,
                                "size_in_bytes": 110980771,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 122864,
                                "deleted_docs": 0,
                                "size_in_bytes": 33941196,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 55618,
                                "deleted_docs": 0,
                                "size_in_bytes": 13156027,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 28840,
                                "deleted_docs": 0,
                                "size_in_bytes": 7174693,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 493797,
                                "deleted_docs": 0,
                                "size_in_bytes": 116969201,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 237488,
                                "deleted_docs": 0,
                                "size_in_bytes": 58809773,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_g": {
                                "generation": 16,
                                "num_docs": 47355,
                                "deleted_docs": 0,
                                "size_in_bytes": 12706968,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "3": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 14,
                        "num_search_segments": 14,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 24750,
                                "deleted_docs": 0,
                                "size_in_bytes": 7997829,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 15401,
                                "deleted_docs": 0,
                                "size_in_bytes": 5526373,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 4274,
                                "deleted_docs": 0,
                                "size_in_bytes": 1670168,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 74714,
                                "deleted_docs": 0,
                                "size_in_bytes": 23282221,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 49504,
                                "deleted_docs": 0,
                                "size_in_bytes": 16640256,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 1152,
                                "deleted_docs": 0,
                                "size_in_bytes": 425884,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 723774,
                                "deleted_docs": 0,
                                "size_in_bytes": 185696573,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 374910,
                                "deleted_docs": 0,
                                "size_in_bytes": 102114378,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 144763,
                                "deleted_docs": 0,
                                "size_in_bytes": 40681188,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 72038,
                                "deleted_docs": 0,
                                "size_in_bytes": 16996726,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 60729,
                                "deleted_docs": 0,
                                "size_in_bytes": 14683267,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 434765,
                                "deleted_docs": 0,
                                "size_in_bytes": 102456131,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 87455,
                                "deleted_docs": 0,
                                "size_in_bytes": 23013397,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 211170,
                                "deleted_docs": 0,
                                "size_in_bytes": 53235342,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "4": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 13,
                        "num_search_segments": 13,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 29157,
                                "deleted_docs": 0,
                                "size_in_bytes": 9596750,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 25289,
                                "deleted_docs": 0,
                                "size_in_bytes": 9046435,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 101070,
                                "deleted_docs": 0,
                                "size_in_bytes": 30725411,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 36789,
                                "deleted_docs": 0,
                                "size_in_bytes": 12505981,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 16360,
                                "deleted_docs": 0,
                                "size_in_bytes": 5759995,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 627072,
                                "deleted_docs": 0,
                                "size_in_bytes": 161563718,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 447053,
                                "deleted_docs": 0,
                                "size_in_bytes": 118268120,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 131989,
                                "deleted_docs": 0,
                                "size_in_bytes": 37526334,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 50138,
                                "deleted_docs": 0,
                                "size_in_bytes": 12468992,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 54350,
                                "deleted_docs": 0,
                                "size_in_bytes": 12764422,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 245263,
                                "deleted_docs": 0,
                                "size_in_bytes": 62303574,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 449658,
                                "deleted_docs": 0,
                                "size_in_bytes": 105290863,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 66300,
                                "deleted_docs": 0,
                                "size_in_bytes": 17125599,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ]
            }
        }
    }
}

Index size:

"store": {
    "size_in_bytes": 2975896540,
    "reserved_in_bytes": 0
},

@rishabhmaurya
Contributor

The segment sizes are too small to show any noticeable difference. I can work with you on this next week.
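
For scale, here is a minimal sketch that converts the byte counts quoted above into MiB and compares the largest of them with the reported store size. It only covers the last four segments listed in the stats output (`_9` to `_c`), the variable names are illustrative, and it is an assumption that the store figure and these segments describe the same index; how many shards the store figure covers is not visible here.

```python
# Segment stats copied from the output above: name -> (num_docs, size_in_bytes).
segments = {
    "_9": (54350, 12764422),
    "_a": (245263, 62303574),
    "_b": (449658, 105290863),
    "_c": (66300, 17125599),
}
index_size_bytes = 2975896540  # "store.size_in_bytes" from the output above

MIB = 1024 ** 2


def to_mib(size_in_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return size_in_bytes / MIB


# Largest of the listed segments and its share of the reported store size.
largest = max(segments, key=lambda name: segments[name][1])
largest_share_pct = segments[largest][1] / index_size_bytes * 100

for name, (num_docs, size) in segments.items():
    print(f"{name}: {num_docs} docs, {to_mib(size):.1f} MiB")
print(f"largest listed segment: {largest} ({largest_share_pct:.1f}% of store size)")
```

None of these four segments is anywhere near the GB range, which is consistent with the point about segment size made in the comment.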

@gashutos
Contributor

gashutos commented Nov 6, 2023

@rishabhmaurya The POC you tried would only work for MatchAllQuery. I tried exactly the same thing a couple of months back, but a MatchAllDocs query combined with a (vanilla) sort is rarely used IMO, so I skipped prototyping it.

@backslasht
Contributor

+1 on @gashutos's point. @rishabhmaurya - Do you have a specific use case where this will be useful?

@rishabhmaurya
Contributor

rishabhmaurya commented Nov 6, 2023

@gashutos thanks for looking. Yes, I mentioned in the POC that it is supposed to work only for MatchAllQuery with no doc deletions.
This will be helpful in 2 cases:

  1. Desc numeric sort on any numeric field - This should make iteration over bigger segments faster, assuming there is no index sort on this numeric field and the Lucene index size is significant (in GBs). Since such queries usually span all segments, it should theoretically make things faster. I think this is a common use case, and we capture this query type in most of our benchmarks.
  2. Desc sort on the @timestamp field with LogByteSize as the merge policy - After a force merge, the smallest segment could be big enough to make the desc sort query slow. This will be helpful in such cases too.

Can you point me to your POC/issue and explain why you think it's a rare case? Thank you.
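
To make the idea concrete, here is a toy model of why a desc sort can stop early when every document matches and nothing is deleted. It is only a sketch under those assumptions: it is not the POC's code and does not use Lucene's API. The "leaves" are plain Python lists standing in for value-sorted BKD leaf blocks, and `build_leaves` and `top_k_desc` are hypothetical helper names. Tie-breaking between equal values is ignored.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (value, doc_id)


def build_leaves(values: List[int], leaf_size: int) -> List[List[Point]]:
    """Sort (value, doc_id) pairs by value and cut them into fixed-size blocks."""
    points = sorted((value, doc_id) for doc_id, value in enumerate(values))
    return [points[i:i + leaf_size] for i in range(0, len(points), leaf_size)]


def top_k_desc(leaves: List[List[Point]], k: int) -> Tuple[List[Point], int]:
    """Collect the k largest values by walking leaves from the high end.

    Returns the hits and the number of leaves visited. This is only valid
    when every document matches and none is deleted, so no point has to be
    skipped or checked against a filter.
    """
    hits: List[Point] = []
    visited = 0
    for leaf in reversed(leaves):
        visited += 1
        for point in reversed(leaf):
            hits.append(point)
            if len(hits) == k:
                return hits, visited
    return hits, visited


values = [5, 42, 17, 99, 3, 64, 8, 23]  # value of doc 0, doc 1, ...
leaves = build_leaves(values, leaf_size=2)
hits, visited = top_k_desc(leaves, k=3)
print(hits, f"visited {visited} of {len(leaves)} leaves")
```

With a filter or deleted documents, the points at the high end are no longer guaranteed to be hits, which is why the shortcut is limited to the match-all case.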

@gashutos
Contributor

gashutos commented Nov 6, 2023

@rishabhmaurya
This problem can be divided into two parts, each explaining why desc order sort is slower compared to asc order.

  1. For time-series indices, the data is nearly sorted in asc order (which will also be the case with the LogByteSize merge policy).
    RFC in Lucene ->
    [Performance] sort query improvement for sequential ordered data [e.g. timestamp field sort in log data] apache/lucene#12448

  2. For non-time-series workloads, our docId-based disjoint iterator with the BKD-based competitive iterator works only in asc order of docIds.
    Reverse BKD based iteration -> [Performance] Traverse BKD point based competitive miterator in reverse order for DESC sort query performance improvement. #7680

We think it is a rare scenario because, in production, we generally don't see a sort on a single field without any filtering clause wrapping it. Again, this observation is based on the user use cases I have seen.
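
For readers less familiar with the two query shapes being contrasted, here is a hedged sketch of the request bodies as Python dicts. The `population` field is inferred from the `desc_sort_population` task name, and the `country_code` term filter is a hypothetical stand-in for "any filtering clause"; neither body is copied from the benchmark workload.

```python
import json

# Sort on a single numeric field with no filter (the match-all case).
match_all_sort = {
    "query": {"match_all": {}},
    "sort": [{"population": "desc"}],
}

# The same sort wrapped by a filter, the shape described above as more
# typical in production. The filter field and value are hypothetical.
filtered_sort = {
    "query": {
        "bool": {
            "filter": [{"term": {"country_code": "US"}}],
        }
    },
    "sort": [{"population": "desc"}],
}

print(json.dumps(match_all_sort, indent=2))
print(json.dumps(filtered_sort, indent=2))
```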

@harshavamsi
Contributor

Posting some more numbers here. Same workload and instance, but this time force merging into 1 large segment and running on a single primary shard, to see if it has any impact:

Non optimized cluster:

|                                        50th percentile latency |     desc_sort_population |     9.38135 |     ms |
|                                        90th percentile latency |     desc_sort_population |     10.1048 |     ms |
|                                        99th percentile latency |     desc_sort_population |     10.3617 |     ms |
|                                       100th percentile latency |     desc_sort_population |     10.7949 |     ms |
|                                   50th percentile service time |     desc_sort_population |     7.83975 |     ms |
|                                   90th percentile service time |     desc_sort_population |     8.14815 |     ms |
|                                   99th percentile service time |     desc_sort_population |     8.64486 |     ms |
|                                  100th percentile service time |     desc_sort_population |     8.80505 |     ms |
|                                                     error rate |     desc_sort_population |           0 |      % |

Optimized cluster:

|                                        50th percentile latency |     desc_sort_population |     13.4777 |     ms |
|                                        90th percentile latency |     desc_sort_population |     14.0544 |     ms |
|                                        99th percentile latency |     desc_sort_population |      14.372 |     ms |
|                                       100th percentile latency |     desc_sort_population |     15.1146 |     ms |
|                                   50th percentile service time |     desc_sort_population |     11.8186 |     ms |
|                                   90th percentile service time |     desc_sort_population |      12.006 |     ms |
|                                   99th percentile service time |     desc_sort_population |     12.4779 |     ms |
|                                  100th percentile service time |     desc_sort_population |       12.48 |     ms |
|                                                     error rate |     desc_sort_population |           0 |      % |

Will dive into the Lucene code path to understand where we're spending time when running this workload.
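
To compare the two runs at a glance, here is a small sketch that computes the relative change for a few of the rows above. The values are copied from the two tables; the dictionary keys and the `pct_change` helper are illustrative names, and the "non optimized"/"optimized" labels are taken as given from the tables.

```python
# Values in ms, copied from the two tables above.
non_optimized = {
    "p50_latency": 9.38135,
    "p90_latency": 10.1048,
    "p50_service_time": 7.83975,
    "p90_service_time": 8.14815,
}
optimized = {
    "p50_latency": 13.4777,
    "p90_latency": 14.0544,
    "p50_service_time": 11.8186,
    "p90_service_time": 12.006,
}


def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (positive = higher)."""
    return (after - before) / before * 100.0


changes = {
    name: pct_change(non_optimized[name], optimized[name])
    for name in non_optimized
}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

A positive value means the run labelled "optimized" reported a higher number for that metric than the "non optimized" one.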

@hdhalter

Hi @harshavamsi - will documentation be required for this feature in 2.12?

@msfroh
Collaborator

msfroh commented Dec 12, 2023

will documentation be required for this feature in 2.12?

This is purely an internal optimization task. It should not require any documentation.

@kiranprakash154
Contributor

Hi, are we on track for this to be released in 2.12?

@getsaurabh02 getsaurabh02 added v2.13.0 Issues and PRs related to version 2.13.0 and removed v2.12.0 Issues and PRs related to version 2.12.0 labels Jan 29, 2024
@getsaurabh02
Member

Pushing this out to v2.13, since this optimization is still in the investigation stage. Although the benchmark numbers look promising, it requires a further deep dive into the Lucene code path to understand where we're spending time and to come up with improvement opportunities.

@getsaurabh02 getsaurabh02 moved this from In Progress to Now (This Quarter) in Performance Roadmap Feb 19, 2024
@bbarani bbarani added v2.14.0 and removed v2.13.0 Issues and PRs related to version 2.13.0 labels Mar 4, 2024
@bbarani
Member

bbarani commented Mar 4, 2024

Moved it to 2.14.0 as per the discussion with @harshavamsi

@msfroh
Collaborator

msfroh commented Mar 4, 2024

We should try benchmarking numeric sort queries with apache/lucene#13149.

Based on the explanation at https://blunders.io/posts/es-benchmark-4-inlining, we may see significant improvement to numeric sorting.

@bbarani
Member

bbarani commented Mar 5, 2024

Tagging @opensearch-project/benchmark-core team

@getsaurabh02 getsaurabh02 added v2.15.0 Issues and PRs related to version 2.15.0 and removed v2.14.0 labels Apr 26, 2024
@github-project-automation github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024
@getsaurabh02 getsaurabh02 removed the v2.15.0 Issues and PRs related to version 2.15.0 label Jun 6, 2024
@getsaurabh02 getsaurabh02 moved this from Now (This Quarter) to In Progress in Performance Roadmap Aug 15, 2024
@mch2 mch2 moved this from In Progress to Todo in Performance Roadmap Sep 16, 2024