Sort ASC by a field should give results where that field is not present #31637

Open
adityaverma21 opened this issue Jun 19, 2024 · 4 comments
@adityaverma21

adityaverma21 commented Jun 19, 2024

Describe the bug
Sorting ASC by a field should place documents where that field is not present first. Currently, sorting either ASC or DESC places documents where that field is not present last.

To Reproduce
Steps to reproduce the behavior:

  1. Schema Definition
schema test {
  document test {
    field name type string {
      indexing: summary | index
    }
    field lastdate type array<long> {
      indexing: summary | attribute
    }
    field title type array<string> {
      indexing: summary | attribute
    }
  }
}
  2. Documents:
    doc1
{
  "fields": {
    "name": "aditya",
    "lastdate": [1693560729001],
    "title": ["verma"]
  }
}

doc2

{
  "fields": {
    "name": "aditya",
    "lastdate": [1693560729050],
    "title": ["adi"]
  }
}

doc3

{
  "fields": {
    "name": "aditya",
    "lastdate": []
  }
}
  3. Query:
    select * from sources * where name contains "aditya" order by title ASC limit 10 offset 0;
    This returns the documents in the order doc2 -> doc1 -> doc3.

Expected behavior
Documents where the field is not present should be ordered first when sorting in ascending order:
doc3 -> doc2 -> doc1

Environment (please complete the following information):

  • RHEL8
  • Podman

Vespa version
8.221.29

@bjormel
Member

bjormel commented Jun 20, 2024

Duplicate of #31106

If the field is unset for a document, the sort value for that document will be lower than the value for any other document with a value set.

If you need different behaviour, you can use a document processor to set a value at ingestion time when the field is missing.
See Fields

A field cannot be defined with a default value. Use a document processor to assign a default to document put/update operations.
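
Vespa document processors are written in Java and run inside the container. As a rough illustration of the same idea, here is a minimal Python sketch that fills in a default on the client side before feeding. The function name with_default and the empty-string placeholder are hypothetical and do not come from this thread or from any Vespa API. Which placeholder value actually sorts first in Vespa is an assumption that would need to be checked.

```python
import copy


def with_default(doc, field="title", default=("",)):
    """Return a copy of doc where a missing or empty `field` gets a default.

    The placeholder value is a hypothetical choice, not a Vespa convention.
    """
    filled = copy.deepcopy(doc)
    if not filled["fields"].get(field):
        filled["fields"][field] = list(default)
    return filled


# doc3 from the report above, which has no title values
doc3 = {"fields": {"name": "aditya", "lastdate": []}}
print(with_default(doc3)["fields"]["title"])
```

The original document is left unmodified, so the same helper can be applied to a batch of documents just before they are fed.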

@bjormel bjormel closed this as completed Jun 20, 2024
@baldersheim
Contributor

This is not the same as #31106. This one relates to multivalue fields (field title type array<string>).
Since a shorter string sorts before a longer one, all else being equal, you could argue the same here.

@baldersheim baldersheim reopened this Jun 20, 2024
@toregge
Member

toregge commented Jun 20, 2024

Issue #26681 contained a request for supporting sorting on multivalue attributes.

From reference doc (https://docs.vespa.ai/en/reference/sorting.html#multivalue-sort-attribute):

When sorting on a multivalue attribute (array or weightedset) one of the values for the document is selected to be used for sorting. Ascending sort order uses the lowest value while descending sort order uses the highest value. A document without any values is considered worse than a document with values, regardless of sort order.

doc3 doesn't have any values for title, thus it is considered worse than doc1 and doc2 regardless of sort order.
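
The rule quoted above can be modelled with a small Python sketch. This is a toy model of the documented behaviour and not Vespa code; the function name multivalue_sort is made up for illustration, and plain Python string comparison stands in for whatever collation Vespa applies.

```python
# title values of the three documents from the report
docs = {
    "doc1": ["verma"],
    "doc2": ["adi"],
    "doc3": [],  # no title values
}


def multivalue_sort(docs, descending=False):
    """Order doc ids by the multivalue rule quoted from the reference doc."""
    with_values = [d for d, vals in docs.items() if vals]
    without_values = [d for d, vals in docs.items() if not vals]
    # Ascending uses the lowest value, descending uses the highest.
    pick = max if descending else min
    ranked = sorted(with_values, key=lambda d: pick(docs[d]), reverse=descending)
    # Documents without values are placed last regardless of sort order.
    return ranked + without_values


print(multivalue_sort(docs))                   # ['doc2', 'doc1', 'doc3']
print(multivalue_sort(docs, descending=True))  # ['doc1', 'doc2', 'doc3']
```

The ascending result matches the order reported in the issue (doc2 -> doc1 -> doc3).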

@adityaverma21
Author

adityaverma21 commented Jun 24, 2024

Can I raise a feature request to make sorting on multivalue attributes consistent with single-value attributes? That is, documents where the field is not set should be ordered first in ASC order, even for multivalue attributes.
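
To make the request concrete, here is a minimal Python sketch of the requested ordering, where a document without values is treated as lower than any document with values. It only models the desired semantics and is not a proposal for how Vespa would implement it; the function name requested_sort is hypothetical.

```python
# title values of the three documents from the report
docs = {"doc1": ["verma"], "doc2": ["adi"], "doc3": []}


def requested_sort(docs, descending=False):
    """Order doc ids so that documents without values count as the lowest."""
    pick = max if descending else min

    def key(doc_id):
        vals = docs[doc_id]
        # (0, "") compares below every (1, value) tuple, so documents
        # without values come first in ASC and last in DESC.
        return (1, pick(vals)) if vals else (0, "")

    return sorted(docs, key=key, reverse=descending)


print(requested_sort(docs))                   # ['doc3', 'doc2', 'doc1']
print(requested_sort(docs, descending=True))  # ['doc1', 'doc2', 'doc3']
```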

@frodelu frodelu added this to the later milestone Jun 26, 2024