Validate index setting parameters when creating index #420

Merged: 1 commit merged on Apr 4, 2023


@Jeadie (Contributor) commented Apr 3, 2023

Changes

  • Validate "ann_parameters" index settings at creation time.
  • Currently, these settings are stored as the index settings even if they are invalid. An error only occurs later, when the settings are used (e.g. in add_documents).
  • This PR validates the parameter settings when create_index is called, so invalid settings are rejected before they are stored.
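The approach can be sketched as follows: validation runs inside `create_index` before anything is stored, so a bad value fails immediately instead of surfacing later in `add_documents`. This is a minimal, stdlib-only sketch under stated assumptions. `ANN_PARAMETERS_SCHEMA`, `validate_ann_parameters`, `InvalidIndexSettingsError` and the in-memory `INDEX_STORE` are hypothetical stand-ins, not Marqo's actual code. The parameters are assumed to be a flat dict for brevity, and only a tiny subset of JSON Schema (`type`, `minimum`, `enum`) is checked.

```python
from typing import Any, Dict

# Hypothetical schema for "ann_parameters", modeled loosely on the JSON-schema
# fragments in this PR. Flat layout assumed; only type/minimum/enum are used.
ANN_PARAMETERS_SCHEMA: Dict[str, Dict[str, Any]] = {
    "name": {"type": "string", "enum": ["hnsw"]},
    "engine": {"type": "string", "enum": ["lucene"]},
    "ef_construction": {"type": "integer", "minimum": 1},
    "m": {"type": "integer", "minimum": 1},
}

# Hypothetical in-memory stand-in for wherever index settings are persisted.
INDEX_STORE: Dict[str, Dict[str, Any]] = {}


class InvalidIndexSettingsError(ValueError):
    """Raised when index settings fail validation (hypothetical name)."""


def _check_type(value: Any, expected: str) -> bool:
    """Return True if value matches the (tiny) set of supported schema types."""
    if expected == "integer":
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "string":
        return isinstance(value, str)
    raise ValueError(f"unsupported schema type: {expected}")


def validate_ann_parameters(params: Dict[str, Any]) -> None:
    """Raise InvalidIndexSettingsError on the first unknown or invalid key."""
    for key, value in params.items():
        rule = ANN_PARAMETERS_SCHEMA.get(key)
        if rule is None:
            raise InvalidIndexSettingsError(f"unknown ann parameter: {key!r}")
        if not _check_type(value, rule["type"]):
            raise InvalidIndexSettingsError(f"{key!r} must be of type {rule['type']}")
        if "enum" in rule and value not in rule["enum"]:
            raise InvalidIndexSettingsError(f"{key!r} must be one of {rule['enum']}")
        if "minimum" in rule and value < rule["minimum"]:
            raise InvalidIndexSettingsError(f"{key!r} must be >= {rule['minimum']}")


def create_index(index_name: str, settings: Dict[str, Any]) -> None:
    """Validate first, store second: invalid settings are never persisted."""
    validate_ann_parameters(settings.get("ann_parameters", {}))
    INDEX_STORE[index_name] = settings
```

For example, `create_index("ok-index", {"ann_parameters": {"engine": "lucene", "m": 16}})` stores the settings, while `create_index("bad-index", {"ann_parameters": {"m": 0}})` raises `InvalidIndexSettingsError("'m' must be >= 1")` and leaves nothing stored under `"bad-index"`.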

@@ -78,7 +78,6 @@ def add_customer_field_properties(config: Config, index_name: str,
Returns:
HTTP Response
"""
engine = "lucene"
Contributor Author

Not needed

"examples": [
"hnsw"
]
},
NsFields.ann_engine: {
Contributor Author

We can now leave this with the remaining parameters; because we limit it to Lucene, doing so is now safe.
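Limiting the engine to Lucene amounts to an `enum` with a single allowed value, so it can be checked in the same pass as the other parameters. A minimal sketch, assuming the engine arrives under a plain `"engine"` key in a flat dict; `ENGINE_RULE` and `check_engine` are hypothetical names, not Marqo's code.

```python
from typing import Any, Dict

# Hypothetical rule: "lucene" is the only engine accepted at index creation.
ENGINE_RULE: Dict[str, Any] = {"type": "string", "enum": ["lucene"]}


def check_engine(params: Dict[str, Any]) -> None:
    """Raise ValueError if an explicitly supplied engine is not allowed."""
    if "engine" not in params:
        return  # nothing supplied; a default is assumed to be applied elsewhere
    engine = params["engine"]
    if not isinstance(engine, str) or engine not in ENGINE_RULE["enum"]:
        raise ValueError(
            f"engine must be one of {ENGINE_RULE['enum']}, got {engine!r}"
        )
```

With this rule, `{"engine": "lucene"}` passes, while any other engine name (or a non-string value) is rejected at creation time rather than when documents are added.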

@Jeadie (Contributor Author) commented Apr 3, 2023

@Jeadie temporarily deployed to marqo-test-suite on April 3, 2023 23:29 with GitHub Actions
@@ -117,12 +126,14 @@
"properties": {
NsFields.hnsw_ef_construction: {
"type": "integer",
"minimum": 1,
Collaborator

Do we need to put maximums in here? Otherwise, wouldn't users get a 500 from OpenSearch if they chose a ridiculous number?

Contributor Author

An ef_construction of 500 would actually be quite reasonable; in fact, the default is 512. We would have to work out which upper bounds to set, and I don't think they would be self-evident.
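If maximums were added, they would be an extra `maximum` keyword next to the existing `minimum`; the mechanics are simple, and the open question is which number to choose. The sketch below only illustrates the check. The bound of 4096 is an arbitrary placeholder, not a recommended or tested limit, and `EF_CONSTRUCTION_RULE` and `range_error` are hypothetical names.

```python
from typing import Any, Dict, Optional

# Hypothetical rule: the minimum mirrors this PR's schema; the maximum is an
# arbitrary placeholder used only to demonstrate an upper-bound check.
EF_CONSTRUCTION_RULE: Dict[str, Any] = {"type": "integer", "minimum": 1, "maximum": 4096}


def range_error(value: int, rule: Dict[str, Any]) -> Optional[str]:
    """Return an error message if value is outside [minimum, maximum], else None."""
    minimum = rule.get("minimum")
    maximum = rule.get("maximum")  # absent -> no upper bound is enforced here
    if minimum is not None and value < minimum:
        return f"must be >= {minimum}"
    if maximum is not None and value > maximum:
        return f"must be <= {maximum}"
    return None
```

Without a `maximum` key, a value such as `10**6` passes this check and any rejection is left to OpenSearch, which is the behaviour the review question is about.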

"examples": [
128
]
},
NsFields.hnsw_m: {
"type": "integer",
"minimum": 1,
Collaborator

Do we need to put maximums in here? Otherwise, wouldn't users get a 500 from OpenSearch if they chose a ridiculous number?

Collaborator

Do we have a test to ensure that the allowed HNSW params work correctly?

Collaborator

Do we check the index settings in OpenSearch itself, to ensure that the kNN index has the expected settings?

Contributor Author

This and the above have been tested manually. I'm not sure whether we want these as unit tests, as opposed to performance tests or just manual testing.
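If the manual checks were captured as unit tests, they might look like the sketch below: one test confirms that in-range HNSW parameters are accepted, another that out-of-range or wrongly typed ones are rejected. `validate_hnsw_params` is a hypothetical, self-contained stand-in for the PR's validation (both values must be integers >= 1). Verifying the kNN settings stored in OpenSearch itself would additionally need a running cluster and is not shown.

```python
import unittest
from typing import Any, Dict


def validate_hnsw_params(params: Dict[str, Any]) -> None:
    """Hypothetical stand-in: ef_construction and m, if present, must be ints >= 1."""
    for key in ("ef_construction", "m"):
        if key not in params:
            continue
        value = params[key]
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            raise ValueError(f"{key} must be an integer >= 1, got {value!r}")


class TestHnswParams(unittest.TestCase):
    def test_allowed_params_are_accepted(self) -> None:
        # Illustrative in-range values (128 matches the schema's "examples").
        for params in ({"ef_construction": 128, "m": 16}, {"ef_construction": 512}):
            with self.subTest(params=params):
                validate_hnsw_params(params)  # must not raise

    def test_invalid_params_are_rejected(self) -> None:
        for params in ({"m": 0}, {"ef_construction": -5}, {"m": "16"}, {"m": True}):
            with self.subTest(params=params):
                with self.assertRaises(ValueError):
                    validate_hnsw_params(params)
```

Saved as a test module, these run with `python -m unittest`; `subTest` reports each parameter set separately, so one bad case does not hide the others.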

@pandu-k merged commit b592a20 into mainline on Apr 4, 2023
@pandu-k deleted the jack/issue-206 branch on April 4, 2023 02:34

2 participants