Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting a terms aggregation on a child top_metrics aggregation fails with a keyword field #78506

Closed
ddolcimascolo opened this issue Sep 30, 2021 · 4 comments
Assignees
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@ddolcimascolo
Copy link

Elasticsearch version (bin/elasticsearch --version): 7.13.3

Plugins installed: []

JVM version (java -version): Bundled JDK

OS version (uname -a if on a Unix-like system): Linux 5.11.0-34-generic #36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Sorting a terms aggregation on a child top_metrics aggregation does not work with a keyword field. Sorting does not happen and the terms aggregation is sorted by _key as if there was no order defined. Sorting on a number metric works just fine.

Steps to reproduce:

Kibana script to reproduce

PUT /sorting_on_top_metrics_keyword
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "uuid": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword"
      },
      "metric": {
        "type": "long"
      }
    }
  }
}

PUT /sorting_on_top_metrics_keyword/_bulk
{ "index": {} }
{ "@timestamp": "2021-10-01", "metric": 1, "name": "first", "uuid": "0002" }
{ "index": {} }
{ "@timestamp": "2021-10-02", "metric": 2, "name": "first", "uuid": "0002" }
{ "index": {} }
{ "@timestamp": "2021-10-01", "metric": 1, "name": "second", "uuid": "0001" }
{ "index": {} }
{ "@timestamp": "2021-10-02", "metric": 3, "name": "second", "uuid": "0001" }
{ "index": {} }
{ "@timestamp": "2021-10-01", "metric": 1, "name": "third", "uuid": "0003" }
{ "index": {} }
{ "@timestamp": "2021-10-02", "metric": 1, "name": "third", "uuid": "0003" }

POST /sorting_on_top_metrics_keyword/_search
{
  "size": 0,
  "_source": false, 
  "aggs": {
    "total": {
      "cardinality": {
        "field": "uuid"
      }
    },
    "hits": {
      "terms": {
        "field": "uuid",
        "order": {
          "hit.name": "desc"
        }
      },
      "aggs": {
        "hit": {
          "top_metrics": {
            "sort": { "@timestamp": "desc" },
            "metrics": [
              { "field": "metric" },
              { "field": "name" }
            ]
          }
        }
      }
    }
  }
}

Produces

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "hits" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "0001",
          "doc_count" : 2,
          "hit" : {
            "top" : [
              {
                "sort" : [
                  "2021-10-02T00:00:00.000Z"
                ],
                "metrics" : {
                  "metric" : 3,
                  "name" : "second"
                }
              }
            ]
          }
        },
        {
          "key" : "0002",
          "doc_count" : 2,
          "hit" : {
            "top" : [
              {
                "sort" : [
                  "2021-10-02T00:00:00.000Z"
                ],
                "metrics" : {
                  "metric" : 2,
                  "name" : "first"
                }
              }
            ]
          }
        },
        {
          "key" : "0003",
          "doc_count" : 2,
          "hit" : {
            "top" : [
              {
                "sort" : [
                  "2021-10-02T00:00:00.000Z"
                ],
                "metrics" : {
                  "metric" : 1,
                  "name" : "third"
                }
              }
            ]
          }
        }
      ]
    },
    "total" : {
      "value" : 3
    }
  }
}

But I was expecting my buckets to be ordered [third, second, first]

Provide logs (if relevant): N/A

@ddolcimascolo ddolcimascolo added >bug needs:triage Requires assignment of a team area label labels Sep 30, 2021
@pgomulka pgomulka added the :Analytics/Aggregations Aggregations label Sep 30, 2021
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 30, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@romseygeek romseygeek removed the needs:triage Requires assignment of a team area label label Oct 1, 2021
@not-napoleon
Copy link
Member

Yup, this looks like a bug. It's definitely intended to work, and we reproduced it following your steps. I'm wrapping up some other work, but I'll take a look at this next and hopefully get a fix for you soon. Thanks for reporting it.

@ddolcimascolo
Copy link
Author

Thx @not-napoleon

We're starting to run some extensive testing trying to replace top_hits by top_metrics. I will report any issues I found as usual.

Keep up the great work!

David

@not-napoleon not-napoleon removed their assignment Jan 28, 2022
@ddolcimascolo
Copy link
Author

Hey guys,

Any news on this one?
We're still unsure about using top_metrics in our app, but from my understanding this issue was something you wanted to fix anyway.

Cheers,
David

salvatore-campagna added a commit to salvatore-campagna/elasticsearch that referenced this issue Mar 17, 2022
Using a double as a return value works only if the field we are
sorting on is a number. If the field is not a value we can convert
to a double, like a non-numeric keyword, converting it to a number
returns `NaN`. Without this patch, sorting takes place on the bucket
key, if the order field points to a non-numeric value. The additional
bucket key comparator is implicitly added as a tie breaker to avoid
non-deterministic sorting of buckets.

With this change we support sorting using any subclass of SortValue.
This means the bucket key will be used just in case of equal values
on the order field.

Issue: elastic#78506
@salvatore-campagna salvatore-campagna self-assigned this Mar 18, 2022
salvatore-campagna added a commit that referenced this issue Mar 21, 2022
Using a double as a return value works only if the field we are
sorting on is a number. If the field is not a value we can convert
to a double, like a non-numeric keyword, converting it to a number
returns `NaN`. Without this patch, sorting takes place on the bucket
key, if the order field points to a non-numeric value. The additional
bucket key comparator is implicitly added as a tie breaker to avoid
non-deterministic sorting of buckets.

With this change we support sorting using any subclass of SortValue.
This means the bucket key will be used just in case of equal values
on the order field.

Issue: #78506
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

7 participants