[FEATURE] [k-NN] Lucene Engine with SIMD support #1062

Closed

vamshin opened this issue Aug 24, 2023 · 5 comments
vamshin (Member) commented Aug 24, 2023

Is your feature request related to a problem?
Related to opensearch-project/OpenSearch#9423

heemin32 (Collaborator) commented:
I have done more testing on the k-NN feature with different platforms and data sets. SIMD improves both indexing and query latency significantly across various data dimensions and sizes.

Cluster configuration: 3 leader nodes (c5.xlarge), 1 data node (r5.8xlarge; r6g.8xlarge for Arm), 16 shards
Test client: c5.4xlarge, 10 threads
OpenSearch version: 2.9.0

Data: 128 dimensions, 62.5M vectors
Ingest latency

| Metric | lucene | lucene-simd | faiss | nmslib | lucene-arm | lucene-arm-simd |
| --- | --- | --- | --- | --- | --- | --- |
| p50 (ms) | 178.11421 | 122.30828 (↓31%) | 33.20409 | 32.69062 | 122.17994 | 96.55073 (↓21%) |
| p90 (ms) | 274.55679 | 209.68928 (↓24%) | 55.89496 | 59.54174 | 217.74488 | 173.76887 (↓20%) |
| p99 (ms) | 360.08076 | 371.0854 (↑3%) | 136.11067 | 142.8995 | 308.54483 | 366.9349 (↑20%) |

Query latency

| Metric | lucene | lucene-simd | faiss | nmslib | lucene-arm | lucene-arm-simd |
| --- | --- | --- | --- | --- | --- | --- |
| p50 (ms) | 25.46449 | 21.36737 (↓16%) | 19.91484 | 12.67667 | 22.23088 | 18.52915 (↓17%) |
| p90 (ms) | 32.10869 | 27.49935 (↓14%) | 22.70937 | 15.81869 | 28.28701 | 23.47646 (↓17%) |
| p99 (ms) | 38.6863 | 34.31535 (↓11%) | 28.61526 | 21.55276 | 34.01487 | 29.16956 (↓14%) |

Time taken to force merge after indexing

| Metric | lucene | lucene-simd | faiss | nmslib | lucene-arm | lucene-arm-simd |
| --- | --- | --- | --- | --- | --- | --- |
| Merge time (hours) | 51.2 | 32.4 (↓37%) | 45.2 | 54.4 | 64.1 | 38.9 (↓39%) |

Data: 768 dimensions, 10M vectors
Ingest latency

| Metric | lucene | lucene-simd | faiss | nmslib | lucene-arm | lucene-arm-simd |
| --- | --- | --- | --- | --- | --- | --- |
| p50 (ms) | 330.42268 | 162.81989 (↓51%) | 45.58239 | 44.23989 | 222.11812 | 157.54088 (↓29%) |
| p90 (ms) | 473.48127 | 301.78705 (↓36%) | 160.51715 | 159.05736 | 429.0347 | 414.04998 (↓3%) |
| p99 (ms) | 631.20325 | 628.94332 (↓0%) | 2043.30716 | 2067.34907 | 711.63805 | 697.69039 (↓2%) |

Query latency

| Metric | lucene | lucene-simd | faiss | nmslib | lucene-arm | lucene-arm-simd |
| --- | --- | --- | --- | --- | --- | --- |
| p50 (ms) | 56.85642 | 39.89531 (↓30%) | 27.12591 | 28.30539 | 44.72728 | 36.15098 (↓19%) |
| p90 (ms) | 67.9561 | 50.22711 (↓26%) | 34.21036 | 35.30814 | 56.28556 | 46.84457 (↓17%) |
| p99 (ms) | 78.86851 | 60.37412 (↓23%) | 41.48554 | 43.02242 | 67.96228 | 57.9375 (↓15%) |

vamshin moved this from 2.11.0 (November 16th, 2023) to 2.12.0 in Vector Search RoadMap on Sep 29, 2023
heemin32 added the v2.12.0 label and removed the v2.11.0 label on Oct 4, 2023
binarymax commented:

@heemin32 Thanks for this analysis! Do you by any chance have the merge time for the second 768-dim dataset? Also, if you have some code that lets us replicate these benchmarks, that would be really helpful!

heemin32 (Collaborator) commented Oct 17, 2023

For 768d, I didn't trigger a force merge manually, so I don't have the same data as for 128d. However, here are the merge-related metrics from the benchmark test itself. I think merge_time_per_shard.max is a good estimate of the time it would take to complete the merge.
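
For anyone reproducing this, merge timings can also be cross-checked directly against the index stats API rather than the benchmark report. A minimal sketch, assuming a hypothetical index name `bigann-index`; the `merges.total_time_in_millis` field in the response should roughly correspond to the `merge_time` values reported below:

```
GET /bigann-index/_stats/merge
```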

I used the https://github.com/opensearch-project/k-NN/tree/main/benchmarks/osb tool.
For the dataset, I used this tool to generate 62.5M vectors of 768d data.

Cluster settings are as follows; sketches of the corresponding index mapping and search request are included below.

Settings

Cluster Configuration

| Setting | Value |
| --- | --- |
| OS Version | 2.9 |
| Data Node Count | 1 |
| Data Node Type | r5.8xlarge/r6gd.8xlarge |
| Data Node Disk | 500GB |
| Leader Node Count | 3 |
| Leader Node Type | c5.xlarge/m6g.xlarge |
| Leader Node Disk | default |
| Security disabled | yes |

Cluster Settings

| Setting | Value |
| --- | --- |
| Index thread qty | 1 |

Index Settings

| Setting | Value |
| --- | --- |
| refresh interval | 60 |
| primary shards | 16 |
| replica shards | 0 |
| method | hnsw |
| m | Default (16) |
| ef_construction | Default (512) |
| ef_search | 100 (for Lucene, this is set to the same value as k) |
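
For reference, the Index Settings above correspond to a k-NN index definition roughly like the one below. This is only a sketch, not the exact benchmark configuration: the index name `bigann-index` and field name `target_field` are hypothetical, and the dimension/engine values shown (128, lucene) change per run. For the lucene engine there is no index-level ef_search setting; as noted above, it is taken from k at query time.

```json
PUT /bigann-index
{
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": 16,
      "number_of_replicas": 0,
      "refresh_interval": "60s"
    }
  },
  "mappings": {
    "properties": {
      "target_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "m": 16,
            "ef_construction": 512
          }
        }
      }
    }
  }
}
```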

Data set

| Setting | Value |
| --- | --- |
| name | BIGANN |
| dimension | 128/768 |
| space | L2 |
| index vector count | see below |

Benchmark client (1 per cluster)

| Setting | Value |
| --- | --- |
| Machine type | c5.4xlarge |
| Disk | 500GB |
| Tool | k-NN OSB |

Indexing workload

| Setting | Value |
| --- | --- |
| num_segments to force merge to | 1 |
| indexing clients | 10 |
| bulk size | 500 |

Search workload

| Setting | Value |
| --- | --- |
| queries per client | 100,000 |
| search clients | 10 |
| k | 100 |
| size | 100 |
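
The search workload above (k = 100, size = 100) maps onto a k-NN query of roughly the following shape. Again a sketch using the same hypothetical index and field names; the query vector is truncated here for readability, and a real request needs the full 128- or 768-component vector:

```json
POST /bigann-index/_search
{
  "size": 100,
  "query": {
    "knn": {
      "target_field": {
        "vector": [0.12, 0.47, 0.83, 0.26],
        "k": 100
      }
    }
  }
}
```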

Result

lucene

```json
"merge_time": 228930131,
"merge_time_per_shard": {
  "min": 13550790,
  "median": 14387661.0,
  "max": 14752239,
  "unit": "ms"
},
```

lucene-simd

```json
"merge_time": 87877915,
"merge_time_per_shard": {
  "min": 0,
  "median": 5448265,
  "max": 6211345,
  "unit": "ms"
},
```

faiss

```json
"merge_time": 93742053,
"merge_time_per_shard": {
  "min": 5502626,
  "median": 5880684.0,
  "max": 6198478,
  "unit": "ms"
},
```

nmslib

```json
"merge_time": 89370420,
"merge_time_per_shard": {
  "min": 0,
  "median": 5372385,
  "max": 7014883,
  "unit": "ms"
},
```

lucene-arm

```json
"merge_time": 141826170,
"merge_time_per_shard": {
  "min": 0,
  "median": 8796794.5,
  "max": 9359578,
  "unit": "ms"
},
```

lucene-arm-simd

```json
"merge_time": 92882788,
"merge_time_per_shard": {
  "min": 0,
  "median": 5804163.5,
  "max": 6304836,
  "unit": "ms"
},
```

binarymax commented:
Thanks so much for the detailed response, this is very helpful!

heemin32 (Collaborator) commented:

The update to JDK 21 was completed in OpenSearch 2.12. opensearch-project/OpenSearch#11003
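
For context, Lucene's SIMD acceleration relies on the JDK's incubating Panama Vector API, which has to be enabled explicitly at JVM startup. A minimal sketch, assuming the flag goes in `config/jvm.options` and that the version-conditional syntax below applies to your distribution:

```
# Assumption: enable the incubating Panama Vector API on JDK 20+ so Lucene
# can use its SIMD-accelerated vector distance implementations
20-:--add-modules=jdk.incubator.vector
```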

github-project-automation bot moved this from 2.12.0 to ✅ Done in Vector Search RoadMap on Mar 19, 2024