reduce k value to result count #219

tomhamer · 2022-12-11T05:58:43Z

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
We reduce k to equal the result count, which greatly improves search latency. Since we have moved to Lucene, we no longer see the error where inner hits were not returned.
What is the current behavior? (You can also link to an open issue here)
Search latency is very high due to a high k. This has been a problem since we moved from NMSLIB to Lucene-based search.
What is the new behavior (if this is a feature change)?
Reduces search latency on large indexes by approximately 20x.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
no
Have unit tests been run against this PR? (Has there also been any additional testing?)
Related Python client changes (link commit/PR here)
Related documentation changes (link commit/PR here)
Other information:
Please check if the PR fulfills these requirements

The commit message follows our guidelines
Tests for the changes have been added (for bug fixes/features)
Docs have been added / updated (for bug fixes / features)
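
As a rough illustration of the change described above, the sketch below builds an OpenSearch-style k-NN query body in which `k` is set to the requested result count rather than to a larger fixed value. The helper name `build_knn_query`, the field name, and the flat (non-nested) query shape are assumptions made for illustration; they are not taken from this PR's diff.

```python
from typing import Any, Dict, List


def build_knn_query(field: str, vector: List[float], result_count: int) -> Dict[str, Any]:
    """Hypothetical helper: tie k directly to the number of results requested."""
    return {
        # Number of hits returned to the caller.
        "size": result_count,
        "query": {
            "knn": {
                # k is no longer inflated beyond what the caller asked for.
                field: {"vector": vector, "k": result_count}
            }
        },
    }


# Illustrative field name and vector, not from the Marqo codebase.
query = build_knn_query("my_vector_field", [0.1, 0.2, 0.3], result_count=10)
```

With a smaller `k`, the engine has fewer nearest-neighbour candidates to collect per search, which is consistent with the latency improvement reported in this description.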

pandu-k

This change results in a failing unit test: https://github.com/marqo-ai/marqo/actions/runs/3671015792/jobs/6205945452#step:8:3873

This unit test covers the protections for when result_count gets too large (over 10k).

This hits the k limit of Marqo-os.

Suggestion:
- Correct the test (perhaps by mocking Marqo-os via mocking a GET request)
- Set the default for this limit to the Marqo-os limit (for the best user experience):

marqo/src/marqo/tensor_search/configs.py

Line 35 in bb54a78

EnvVars.MARQO_MAX_RETRIEVABLE_DOCS: None,

Also, what are the implications for a large result_count (result_count > 500)? For small values of k this change would result in a speed-up, but latency is probably degraded in these large cases.

pandu-k · 2022-12-11T22:55:47Z

Code to test the impact of missing inner hits:

An OpenSearch community member's gist
OpenSearch contributor's gist

Default max docs set to 10k, in line with Marqo-os. This is because Marqo-os requires k >= 0.
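
The bounds discussed in this thread (a minimum of 0 and a default maximum of 10k) could be enforced along the following lines. This is a minimal sketch under stated assumptions: the function `validate_result_count`, the constant `DEFAULT_MAX_RETRIEVABLE_DOCS`, and the error messages are hypothetical. Only the environment variable name `MARQO_MAX_RETRIEVABLE_DOCS` and the 10k figure come from the discussion.

```python
import os
from typing import Mapping, Optional

# Assumed default, matching the 10k limit mentioned for Marqo-os.
DEFAULT_MAX_RETRIEVABLE_DOCS = 10_000


def validate_result_count(result_count: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical check: return result_count if it is within the allowed range."""
    env = os.environ if env is None else env
    raw_limit = env.get("MARQO_MAX_RETRIEVABLE_DOCS")
    # Fall back to the default when the variable is unset or empty.
    max_docs = int(raw_limit) if raw_limit else DEFAULT_MAX_RETRIEVABLE_DOCS

    if result_count < 0:
        raise ValueError(f"result_count must be >= 0, got {result_count}")
    if result_count > max_docs:
        raise ValueError(f"result_count {result_count} exceeds the limit of {max_docs}")
    return result_count
```

Passing `env` explicitly makes the check easy to unit test without touching the real process environment or mocking a backend request.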

pandu-k · 2022-12-12T02:39:50Z

Tested for missing inner hits. None retrieved

pandu-k · 2022-12-12T02:41:56Z

Ran test suite, all passed

reduce k value to result count

edf2c51

pandu-k temporarily deployed to marqo-test-suite December 11, 2022 22:04 — with GitHub Actions Inactive

pandu-k temporarily deployed to marqo-test-suite December 11, 2022 22:05 — with GitHub Actions Inactive

pandu-k had a problem deploying to marqo-test-suite December 11, 2022 22:05 — with GitHub Actions Failure

pandu-k temporarily deployed to marqo-test-suite December 11, 2022 22:05 — with GitHub Actions Inactive

pandu-k requested changes Dec 11, 2022

minimum result_count now set to 0.

feef174

Default max docs set to 10k, in line with Marqo-os. This is because Marqo-os requires k >= 0.

pandu-k temporarily deployed to marqo-test-suite December 12, 2022 01:00 — with GitHub Actions Inactive

pandu-k temporarily deployed to marqo-test-suite December 12, 2022 01:01 — with GitHub Actions Inactive

pandu-k approved these changes Dec 12, 2022

pandu-k merged commit 538710a into mainline Dec 12, 2022

pandu-k deleted the reduce-k-value branch December 12, 2022 02:50
