Can we use top hits optimizations when sorting by a field? #37043

Closed
jpountz opened this issue Dec 31, 2018 · 15 comments

@jpountz
Contributor

jpountz commented Dec 31, 2018

Currently, optimizations for the collection of top documents are only applicable when sorting by score or by a sort order that is congruent with the index-time sort. Given that fields are most of the time both indexed and doc-valued, maybe we could optimize sorting in an even greater number of cases.

For instance, imagine that a user searches for top documents matching a query Q, sorted by a field @timestamp of type date in descending order. Maybe we could rewrite the query under the hood to SHOULD(LongDistanceFeatureQuery(field=@timestamp, origin=Long.MAX_VALUE)) FILTER(Q) and prepend _score to the sort: [ "_score", { "@timestamp": { "order": "desc" } } ].

There are a number of details to sort out: we could only do this if the field is indexed, we would need to rewrite the produced TopFieldDocs so that they don't include a sort value for the score, and we should maybe take the maximum value of the field rather than Long.MAX_VALUE to avoid defeating the optimization because most scores would be equal after casting to a float. But overall it looks to me like it could help optimize almost all sorted queries without requiring index-time decisions?
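
For illustration, here is a rough Lucene-level sketch of that rewrite (not actual Elasticsearch code; `q` stands for the parsed user query and `pivot` for a pivot distance that would still need to be chosen, see the discussion further down):

```java
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;

// Boost documents by how close @timestamp is to Long.MAX_VALUE, i.e. by recency.
Query boostByRecency = LongPoint.newDistanceFeatureQuery("@timestamp", 1f, Long.MAX_VALUE, pivot);
// The original query only filters; all scoring comes from the distance feature clause,
// which lets Lucene skip documents that cannot make it into the top hits.
Query rewritten = new BooleanQuery.Builder()
    .add(boostByRecency, Occur.SHOULD)
    .add(q, Occur.FILTER)
    .build();
// Prepend _score to the requested sort on @timestamp (descending).
Sort sort = new Sort(
    SortField.FIELD_SCORE,
    new SortedNumericSortField("@timestamp", SortField.Type.LONG, true));
```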

@jpountz jpountz added the >enhancement, :Search/Search and team-discuss labels Dec 31, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search

@mayya-sharipova mayya-sharipova self-assigned this Feb 12, 2019
@mayya-sharipova
Contributor

@jpountz Do we want to expose LongDistanceFeatureQuery in Elasticsearch and create a corresponding QueryBuilder for it? For example:

{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "long_distance_rank_feature": {
            "field": "timestamp",
            "boost": 2,
            "pivot" : 10,
            "origin" : 1550175141321
          }
        }
      ]
    }
  }
}

@jpountz
Contributor Author

jpountz commented Feb 14, 2019

I think we should (#33382). But for this particular issue the idea is to rewrite the search request under the hood to something that returns the same hits but runs faster.

@mayya-sharipova
Contributor

mayya-sharipova commented Feb 15, 2019

@jpountz thanks for reminding me about #33382.

But for this particular issue the idea is to rewrite the search request under the hood to something that returns the same hits but runs faster.

Right, I was trying to do the rewrite in the SearchSourceBuilder::rewrite method, and noticed that we will first need a QueryBuilder for LongDistanceFeatureQuery.

```java
final long origin = (sort.order() == SortOrder.DESC) ? Long.MAX_VALUE : Long.MIN_VALUE;
final long pivotDistance = 5;  // ? not sure what to choose for pivot distance
// newDistanceFeatureQuery(field, weight, origin, pivotDistance); the 1 here is the weight
Query query = LongPoint.newDistanceFeatureQuery(fieldName, 1, origin, pivotDistance);

BoolQueryBuilder rewritten = new BoolQueryBuilder();
rewritten.should();  // need a QueryBuilder for DistanceFeatureQuery to insert it here
rewritten.filter(queryBuilder); // filter clause wrapping the original query
```

I guess the plan is to first implement #33382.

@jpountz
Contributor Author

jpountz commented Feb 15, 2019

do rewrite in SearchSourceBuilder::rewrite

In my opinion, one challenge with this optimization is that we also need to modify the produced FieldDoc.fields to remove the first element. I suspect that implementing this optimization in SearchSourceBuilder#rewrite would make it challenging to also modify the produced TopFieldDocs when appropriate. It might have downsides that I haven't thought about, but my initial idea was to do it in QueryPhase#execute, by modifying the query before execute runs and modifying the produced top docs before they are set on the search context.
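
A minimal sketch of that post-processing step (illustrative only; it assumes the synthetic `_score` was prepended as the first sort field and `topFieldDocs` holds the collected results):

```java
import java.util.Arrays;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopFieldDocs;

// Strip the synthetic score so callers only see sort values for the fields they asked for.
for (ScoreDoc scoreDoc : topFieldDocs.scoreDocs) {
    FieldDoc fieldDoc = (FieldDoc) scoreDoc;
    fieldDoc.fields = Arrays.copyOfRange(fieldDoc.fields, 1, fieldDoc.fields.length);
}
// The leading SortField.FIELD_SCORE would also need to be dropped from TopFieldDocs.fields.
```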

not sure what to choose for pivot distance

The tricky bit with origin and pivotDistance is that we should pick values that make it unlikely that different long values map to the same float score. We'd probably need to use either the min or the max value on the shard as the origin, and something on the order of (max-min)/2 for the pivot value.
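
A hedged sketch of deriving those values per shard (it assumes the field is indexed as a `LongPoint`; `reader` and `descending` are placeholders, and error handling for a missing field is omitted):

```java
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.PointValues;
import org.apache.lucene.search.Query;

long min = LongPoint.decodeDimension(PointValues.getMinPackedValue(reader, "@timestamp"), 0);
long max = LongPoint.decodeDimension(PointValues.getMaxPackedValue(reader, "@timestamp"), 0);
long origin = descending ? max : min;       // the value closest to the best-ranked documents
long pivot = Math.max(1, (max - min) / 2);  // spread scores out; the pivot must be positive
Query boost = LongPoint.newDistanceFeatureQuery("@timestamp", 1f, origin, pivot);
```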

I guess the plan is first to implement #33382

That works for me if that makes things easier.

@dvisztempacct

I was linked to this issue by some very helpful people at the ElasticON AMA booth.

Do I understand correctly that this enhancement would mean I could avoid the big sort-all-matching-documents operation when I want to sort by the value from a single field?

That would be a massive performance improvement for me!

@jpountz
Contributor Author

jpountz commented Mar 1, 2019

@dvisztempacct You got it right.

@dvisztempacct

@jpountz thank you! I'm very much looking forward to this!

@mayya-sharipova
Contributor

mayya-sharipova commented Mar 6, 2019

@jpountz I have implemented a draft PR for the query conversion, but encountered the following problem:

Setting origin to Long.MAX_VALUE or Long.MIN_VALUE when documents' fields have small long values produces scores so tiny that they become indistinguishable:

For example, if my index has 3 docs with values: 111, 555, 999, I will get these results:
origin: 9223372036854775807
pivot: 444
Since score = pivot / (pivot + |origin - docValue|) and origin is huge, the scores become extremely small, close to 0 and indistinguishable.

"max_score": 4.8138576E-17,
    "hits": [
      {
        "_index": "index1",
        "_type": "_doc",
        "_id": "1",
        "_score": 4.8138576E-17,
        "_source": {
          "my_text": "text1",
          "my_long": 555,
          "my_int": 555
        }
      },
      {
        "_index": "index1",
        "_type": "_doc",
        "_id": "2",
        "_score": 4.8138576E-17,
        "_source": {
          "my_text": "text1",
          "my_long": 999,
          "my_int": 999
        }
      },
      {
        "_index": "index1",
        "_type": "_doc",
        "_id": "3",
        "_score": 4.8138576E-17,
        "_source": {
          "my_text": "text1",
          "my_long": 111,
          "my_int": 111
        }
      }
    ]
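
A quick back-of-the-envelope check (illustrative arithmetic, not code from the draft PR) of why all three scores come out equal:

```java
long origin = Long.MAX_VALUE;  // 9223372036854775807
long pivot = 444;
for (long value : new long[] {111, 555, 999}) {
    // score = pivot / (pivot + |origin - value|), computed in double to avoid long overflow
    double score = pivot / ((double) pivot + (double) (origin - value));
    System.out.println(value + " -> " + (float) score);  // roughly 4.8E-17 for every value
}
// |origin - value| is about 9.2e18 for all three documents, so the differences between
// 111, 555 and 999 are far below float (and even double) precision and the scores collapse.
```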

We need to set origin differently. The maximum value for the index across all shards? @jpountz what do you think?

Also, I am not sure it is right that pivot values are different on different shards.

@jpountz
Contributor Author

jpountz commented Mar 7, 2019

I had mentioned this problem in the description of this issue:

There are a number of details to sort out, like [...] maybe tak[ing] the maximum value of the field rather than Long.MAX_VALUE to avoid defeating the optimization because most scores would be equal after casting to a float

We need to set origin differently. Maximum value for the index on all shards? @jpountz what do you think?

+1 to try it out. For what it's worth, not all shards need to have the same values for origin and pivot, since we won't return the produced scores to the coordinating node anyway.

@dvisztempacct

Hi all :)

I see that PR #39770 has been merged! But I notice that it was merged into a feature branch, and it does not appear that that branch has been merged into master.

Any idea when / which release this will be a part of? What does the roadmap look like for this?

Thanks!

@jpountz
Contributor Author

jpountz commented Jun 19, 2019

@dvisztempacct Hi Don, we don't know yet when it will be released. The idea looks promising, but it also has some worst-case scenarios that make sorting perform slower than a naive sort, for instance if you search for the most recent documents and the index is ordered by increasing timestamp (either explicitly via index sorting, or implicitly because index order correlates with the timestamp, which is typical when indexing logs). We have a couple of ideas to work around this, such as shuffling the order of segments at collection time or disabling the optimization based on heuristics, that we need to explore further.

@dvisztempacct

@jpountz Thanks for the update :)

In the meantime, is there another way to accomplish this kind of optimization? Right now, for a large number of matching documents, my relational databases are a much faster option than ES for some of my queries.

jimczi added a commit to jimczi/elasticsearch that referenced this issue Jul 5, 2019
This change pre-sorts the index reader leaves (segments) prior to the search
when the primary sort is a numeric field eligible for the distance feature
optimization. It also adds a tie-breaker on `_doc` to the rewritten sort
in order to work around the fact that leaves will be collected in a random order.
I ran this patch on the http_logs benchmark and the results are very promising:

```
|                                       50th percentile latency | desc_sort_timestamp |    220.706 |      136544 |   136324 |     ms |
|                                       90th percentile latency | desc_sort_timestamp |    244.847 |      162084 |   161839 |     ms |
|                                       99th percentile latency | desc_sort_timestamp |    316.627 |      172005 |   171688 |     ms |
|                                      100th percentile latency | desc_sort_timestamp |    335.306 |      173325 |   172989 |     ms |
|                                  50th percentile service time | desc_sort_timestamp |    218.369 |     1968.11 |  1749.74 |     ms |
|                                  90th percentile service time | desc_sort_timestamp |    244.182 |      2447.2 |  2203.02 |     ms |
|                                  99th percentile service time | desc_sort_timestamp |    313.176 |     2950.85 |  2637.67 |     ms |
|                                 100th percentile service time | desc_sort_timestamp |    332.924 |     2959.38 |  2626.45 |     ms |
|                                                    error rate | desc_sort_timestamp |          0 |           0 |        0 |      % |
|                                                Min Throughput |  asc_sort_timestamp |   0.801824 |    0.800855 | -0.00097 |  ops/s |
|                                             Median Throughput |  asc_sort_timestamp |   0.802595 |    0.801104 | -0.00149 |  ops/s |
|                                                Max Throughput |  asc_sort_timestamp |   0.803282 |    0.801351 | -0.00193 |  ops/s |
|                                       50th percentile latency |  asc_sort_timestamp |    220.761 |     824.098 |  603.336 |     ms |
|                                       90th percentile latency |  asc_sort_timestamp |    251.741 |     853.984 |  602.243 |     ms |
|                                       99th percentile latency |  asc_sort_timestamp |    368.761 |     893.943 |  525.182 |     ms |
|                                      100th percentile latency |  asc_sort_timestamp |    431.042 |      908.85 |  477.808 |     ms |
|                                  50th percentile service time |  asc_sort_timestamp |    218.547 |     820.757 |  602.211 |     ms |
|                                  90th percentile service time |  asc_sort_timestamp |    249.578 |     849.886 |  600.308 |     ms |
|                                  99th percentile service time |  asc_sort_timestamp |    366.317 |     888.894 |  522.577 |     ms |
|                                 100th percentile service time |  asc_sort_timestamp |    430.952 |     908.401 |   477.45 |     ms |
|                                                    error rate |  asc_sort_timestamp |          0 |           0 |        0 |      % |
```

So roughly 10x faster for the descending sort and 2-3x faster in the ascending case. Note
that I indexed http_logs with a single client in order to simulate real time-based indices
where documents are indexed in timestamp order.

Relates elastic#37043
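
Roughly, the leaf pre-sorting described in this change could look like the following (hedged sketch, not the actual patch; `searcher` and the field name are placeholders):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PointValues;

// For a descending sort, collect the segments with the largest @timestamp values first so
// that the minimum competitive sort value rises quickly and later segments can be skipped.
List<LeafReaderContext> leaves = new ArrayList<>(searcher.getIndexReader().leaves());
leaves.sort(Comparator.<LeafReaderContext>comparingLong(ctx -> {
    try {
        PointValues points = ctx.reader().getPointValues("@timestamp");
        return points == null ? Long.MIN_VALUE
                              : LongPoint.decodeDimension(points.getMaxPackedValue(), 0);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}).reversed());
```
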
jimczi added a commit that referenced this issue Aug 19, 2019
…4021)

@dvisztempacct

Any updates? :D

@jimczi
Contributor

jimczi commented Sep 6, 2019

Hi Don, sorry that it took so long, but we made good progress over the past month even though some bits are still missing. We're now focusing on a change in Lucene to allow sharing the minimum score between leaf collectors, but that's not a simple one, so we don't know yet when it will be ready. We'll update this issue as soon as we have something in Lucene. Stay tuned ;)

jimczi added a commit to jimczi/elasticsearch that referenced this issue Oct 25, 2019
… merge

This change adds a new merge policy that interleaves the eldest and newest segments picked by MergePolicy#findForcedMerges
and MergePolicy#findForcedDeletesMerges. This allows time-based indices, which usually have the eldest documents
first, to be efficient at finding the most recent documents too. We apply this wrapping merge policy to all indices
even though it is mostly useful for time-based ones; there should be no overhead for other types of indices, so this is simpler
than adding a setting to enable it. This change is needed in order to ensure that the optimizations that we are working
on in # remain efficient even after running a force merge.

Relates elastic#37043
jimczi added a commit that referenced this issue Oct 29, 2019
… merge (#48533)

mayya-sharipova added a commit that referenced this issue Nov 26, 2019
* Optimize sort on numeric long and date fields (#39770)

Optimize sort on numeric long and date fields, when 
the system property `es.search.long_sort_optimized` is true.

* Skip optimization if the index has duplicate data (#43121)

Skip the sort optimization if the index has 50% or more data
with the same value.
When an index has a lot of docs with the same value, the sort
optimization doesn't make sense, as DistanceFeatureQuery
will produce the same score for these docs and Lucene
will use the second sort to tie-break. This could be slower
than regular sorting.

* Sort leaves on search according to the primary numeric sort field (#44021)

* Remove nested collector in docs response

As we don't use cancellableCollector anymore, it should be removed from
the expected docs response.

* Use collector manager for search when necessary (#45829)

When we optimize the sort, we sort segments by their min/max values.
As a collector expects to receive segments in index order,
we cannot use a single collector for the sorted segments.
Thus, for such a case, we use a CollectorManager,
which creates a dedicated collector for every segment.

* Use shared TopFieldCollector manager

Use a shared TopFieldCollector manager for the sort optimization.
This collector manager is able to exchange the minimum competitive
score between collectors.

* Correct calculation of avg value to avoid overflow

* Optimize calculating if index has duplicate data
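
For context, a hedged sketch of what using such a shared manager looks like at the Lucene level (it assumes the `TopFieldCollector.createSharedManager` API available in Lucene 8.x; `searcher`, `rewrittenQuery` and `sort` are placeholders):

```java
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.TopFieldCollector;
import org.apache.lucene.search.TopFieldDocs;

// Per-segment collectors created by the manager share the minimum competitive score, so
// once one segment has found good hits, other segments can skip documents that cannot compete.
CollectorManager<TopFieldCollector, TopFieldDocs> manager =
    TopFieldCollector.createSharedManager(sort, 10, null, 1000);
TopFieldDocs topDocs = searcher.search(rewrittenQuery, manager);
```
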
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Nov 29, 2019
Don't run the long sort optimization when the index is already
sorted on the same field as the sort query parameter.

Relates to elastic#37043, follow-up for elastic#48804
mayya-sharipova added a commit that referenced this issue Dec 2, 2019
Don't run the long sort optimization when the index is already
sorted on the same field as the sort query parameter.

Relates to #37043, follow-up for #48804
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020
Don't run the long sort optimization when the index is already
sorted on the same field as the sort query parameter.

Relates to elastic#37043, follow-up for elastic#48804