[BUG] Since upgrading to opensearch 2.4, having issues running knn search at scale #637

Closed
tomhamer opened this issue Nov 30, 2022 · 8 comments
Labels
bug Something isn't working

@tomhamer

tomhamer commented Nov 30, 2022

Describe the bug

We are using Lucene HNSW indices and recently upgraded from OpenSearch 2.3 to 2.4. Since the upgrade, our latencies have been 10-150 times higher than before. For example, searches that used to take 100ms now take 4-15 seconds.

To Reproduce
Steps to reproduce the behavior:

  1. Create a large Lucene k-NN index (100 million vectors)
  2. Search it using approximate k-NN
  3. Observe that search latency is very high (5-15 seconds)
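
For step 2, an approximate k-NN search body might look like the sketch below. The field name `target_field` and the 128-dimension vector are borrowed from the mapping posted later in this thread, not from the reporter's actual index, and the helper name `build_knn_query` is hypothetical.

```python
import json


def build_knn_query(field, vector, k):
    """Build the body of an approximate k-NN search request.

    Hypothetical helper: the field name and vector are placeholders.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


# Placeholder 128-dimension query vector; the body would be sent to
# POST /<index>/_search on the cluster.
body = build_knn_query("target_field", [0.0] * 128, k=10)
print(json.dumps(body)[:80])
```

Timing this request against the 2.3 and 2.4 clusters is one way to compare the latencies described above.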

Expected behavior
Search latency of approximately 200ms.

Host/Environment:

  • OS: Amazon Linux
  • Version: OpenSearch 2.4

tomhamer added the "bug" (Something isn't working) and "untriaged" labels on Nov 30, 2022
dblock transferred this issue from opensearch-project/OpenSearch on Nov 30, 2022
@dblock
Member

dblock commented Nov 30, 2022

Moving this to the k-NN repo. Do you have any more information about the dataset?

@martin-gaievski
Member

martin-gaievski commented Nov 30, 2022

@tomhamer In addition to the previous ask, please share details about your OpenSearch cluster configuration: number of data/leader nodes, hardware type, RAM size, and RAM allocated to the Java heap. Please also share the parameters used for mapping and indexing: number of shards/replicas and, for Lucene HNSW, the values of m and ef_construction.
Is this a static data set, or are you changing data in parallel with search requests? Did you force merge the index?

@martin-gaievski
Member

@tomhamer Could you please answer one more question: did you run indexing while upgrading from 2.3 to 2.4?

@martin-gaievski
Member

martin-gaievski commented Dec 19, 2022

@tomhamer We have identified the reason for the latencies you're seeing.
With Lucene 9.4 (the base for OpenSearch/k-NN 2.4), the Lucene community chose to run segment merges in parallel with data ingestion to optimize indexing time. This approach creates more segments compared to Lucene 9.3 (OpenSearch/k-NN 2.3). On the OpenSearch side, the segment files are not picked up for the fast MMap read mode, so we fall back to the default NIO mode, which is slower but consumes fewer resources.
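
As a rough mental model of that fallback (a simplification for illustration, not the actual OpenSearch code): with the hybrid store type, a file is memory-mapped only if its extension is on the configured list, and everything else is read through NIO. The function name `read_mode` and the file names are assumptions; the default list is the one quoted in this comment without "vec" and "vex".

```python
# Assumed default list: the extensions quoted in this thread, minus "vec"/"vex".
DEFAULT_MMAP_EXTENSIONS = ["nvd", "dvd", "tim", "tip", "dim", "kdd", "kdi", "cfs", "doc"]


def read_mode(filename, mmap_extensions):
    """Return "mmap" if the file's extension is on the list, else "nio"."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return "nio"  # file has no extension at all
    return "mmap" if ext in mmap_extensions else "nio"


# Illustrative file names, not taken from a real index.
for name in ["_0.cfs", "_0.vec", "_0.vex"]:
    print(name, read_mode(name, DEFAULT_MMAP_EXTENSIONS))
```

Under this model the vector files fall through to NIO until "vec" and "vex" are added to the list.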

While our team works on a long-term solution, you can use the following workaround:

For new indexes that you're going to create:

  • add the "index.store.hybrid.mmap.extensions" setting to the list of index settings. A user-supplied value for this setting overrides the default one, so make sure you include the existing list of extensions as well as any additional file extensions that need to be read with MMap. You need to add the "vec" and "vex" extensions, e.g.:
{
  "settings": {
    "index": {
      "knn": true,
      "refresh_interval": "30s",
      "number_of_shards": 3,
      "number_of_replicas": 0,
      "store.hybrid.mmap.extensions": ["nvd", "dvd", "tim", "tip", "dim", "kdd", "kdi", "cfs", "doc", "vec", "vex"]
    }
  }
}
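
Because a user value replaces the default list rather than extending it, it can help to build the list programmatically. A minimal sketch, assuming the default list is the one shown above without "vec" and "vex"; `with_vector_extensions` is a hypothetical helper name.

```python
import json

# Assumed defaults: the list from the example above without "vec" and "vex".
DEFAULT_EXTENSIONS = ["nvd", "dvd", "tim", "tip", "dim", "kdd", "kdi", "cfs", "doc"]
VECTOR_EXTENSIONS = ["vec", "vex"]


def with_vector_extensions(current):
    """Append the vector extensions, keeping order and skipping duplicates."""
    merged = list(current)
    for ext in VECTOR_EXTENSIONS:
        if ext not in merged:
            merged.append(ext)
    return merged


settings = {
    "settings": {
        "index": {
            "knn": True,
            "store.hybrid.mmap.extensions": with_vector_extensions(DEFAULT_EXTENSIONS),
        }
    }
}
print(json.dumps(settings, indent=2))
```

If your cluster already overrides the list, pass the current value instead of `DEFAULT_EXTENSIONS` so that no existing extension is dropped.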

For existing indexes, you need to reindex in order to keep your data: create a new index with the updated "store.hybrid.mmap.extensions" setting, then reindex the data into it.

  • create a second index, "updated_index", with the "store.hybrid.mmap.extensions" setting:
PUT /updated_index
{
  "settings": {
    "index": {
      "knn": true,
      "refresh_interval": "30s",
      "number_of_shards": 24,
      "number_of_replicas": 1,
      "store.hybrid.mmap.extensions": ["nvd", "dvd", "tim", "tip", "dim", "kdd", "kdi", "cfs", "doc", "vec", "vex"]
    }
  },
  "mappings": {
    "properties": {
      "target_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene"
        }
      }
    }
  }
}
  • run a reindex request from the existing index to the updated index:
POST _reindex
{
   "source":{
      "index":"current_index"
   },
   "dest":{
      "index":"updated_index"
   }
}
  • forward requests to "updated_index". The original index can be deleted after this.
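
The create and reindex steps above can be sketched as a small request plan. This is only an illustration: `reindex_plan` is a hypothetical helper that builds the request bodies and leaves sending them to whatever HTTP client you use. The index names, shard counts, and mapping are the example values from this comment.

```python
import json

MMAP_EXTENSIONS = ["nvd", "dvd", "tim", "tip", "dim", "kdd", "kdi", "cfs", "doc", "vec", "vex"]


def reindex_plan(source_index, dest_index, mappings, shards, replicas):
    """Return the (method, path, body) requests for the workaround, in order."""
    create_body = {
        "settings": {
            "index": {
                "knn": True,
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "store.hybrid.mmap.extensions": MMAP_EXTENSIONS,
            }
        },
        "mappings": mappings,
    }
    reindex_body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return [
        ("PUT", "/" + dest_index, create_body),
        ("POST", "/_reindex", reindex_body),
    ]


# Example mapping copied from the request above.
mappings = {
    "properties": {
        "target_field": {
            "type": "knn_vector",
            "dimension": 128,
            "method": {"name": "hnsw", "space_type": "l2", "engine": "lucene"},
        }
    }
}

plan = reindex_plan("current_index", "updated_index", mappings, shards=24, replicas=1)
for method, path, body in plan:
    print(method, path)
    print(json.dumps(body))
```

Keeping the plan as data makes it easy to review the exact bodies before anything is sent to the cluster.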

@martin-gaievski
Member

I'd like to add to my previous post that it's also possible to add new file extensions via the opensearch.yml file. The exact line should be:

index.store.hybrid.mmap.extensions: [nvd, dvd, tim, tip, dim, kdd, kdi, cfs, doc, vec, vex]

As with the update via the API, this setting overrides the default list, so make sure you include the standard extensions along with the vec and vex extensions used for vector values.
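
A quick way to double-check such a line before restarting nodes is sketched below. It is a naive parser for this one flow-style list, not a full YAML reader, and the function names are hypothetical.

```python
REQUIRED = ("vec", "vex")


def parse_extensions(line):
    """Parse a flow-style list such as 'key: [a, b, c]' into (key, items)."""
    key, _, value = line.partition(":")
    value = value.strip()
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError("expected a [a, b, c] style list")
    items = [item.strip().strip("\"'") for item in value[1:-1].split(",") if item.strip()]
    return key.strip(), items


def missing_vector_extensions(line):
    """Return the required vector extensions absent from the configured list."""
    _, items = parse_extensions(line)
    return [ext for ext in REQUIRED if ext not in items]


line = "index.store.hybrid.mmap.extensions: [nvd, dvd, tim, tip, dim, kdd, kdi, cfs, doc, vec, vex]"
print(missing_vector_extensions(line))  # prints []
```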

@tomhamer
Author

Thanks Martin, this is really useful. We are testing it at the moment.

@martin-gaievski
Member

With the mentioned PR, things should be improved in the 2.5 release.

@tomhamer
Author

Thanks @martin-gaievski, this is excellent: it solved the problem. Really appreciate the work put into the investigation here!
