Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce k value to result count #219

Merged
merged 2 commits into from
Dec 12, 2022
Merged

reduce k value to result count #219

merged 2 commits into from
Dec 12, 2022

Conversation

tomhamer
Copy link
Contributor

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    we reduce k to equal the result count, which vastly improves search latency. Since we have moved to lucene we no longer see the error with inner hits not being returned

  • What is the current behavior? (You can also link to an open issue here)
    Search latency is very high due to a high k - this is a problem since we moved from NMSLIB to lucene based search.

  • What is the new behavior (if this is a feature change)?
    Reduces search lateny on large indexes by approx 20x.

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    no

  • Have unit tests been run against this PR? (Has there also been any additional testing?)

  • Related Python client changes (link commit/PR here)

  • Related documentation changes (link commit/PR here)

  • Other information:

  • Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added / updated (for bug fixes / features)

@pandu-k pandu-k temporarily deployed to marqo-test-suite December 11, 2022 22:04 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 11, 2022 22:04 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 11, 2022 22:05 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 11, 2022 22:05 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 11, 2022 22:05 — with GitHub Actions Inactive
Copy link
Collaborator

@pandu-k pandu-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change results in a failing unit test: https://github.com/marqo-ai/marqo/actions/runs/3671015792/jobs/6205945452#step:8:3873

This unit test is regarding protections for when result_count gets too big (over 10k).

This hits the k limit of Marqo-os.

Suggestion:
- Correct the test (perhaps by mocking Marqo-os via mocking a GET request)
- Set the default for this limit to the Marqo-os limit (for the best user experience):

EnvVars.MARQO_MAX_RETRIEVABLE_DOCS: None,

Also, what are the implications for a large result_count (result_count > 500)? For small Ks this change would result in a speed-up, but latency is probably degraded for these large cases

@pandu-k
Copy link
Collaborator

pandu-k commented Dec 11, 2022

Code to test the impact of missing inner hits:

  • An OpenSearch community member's gist
  • OpenSearch contributor's gist

Default max docs set to 10k in line with marqo-os

This is because Marqo-os requires k>= 0.
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 12, 2022 01:00 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 12, 2022 01:00 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 12, 2022 01:00 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 12, 2022 01:00 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 12, 2022 01:01 — with GitHub Actions Inactive
@pandu-k pandu-k temporarily deployed to marqo-test-suite December 12, 2022 01:01 — with GitHub Actions Inactive
@pandu-k
Copy link
Collaborator

pandu-k commented Dec 12, 2022

Tested for missing inner hits. None retrieved

@pandu-k
Copy link
Collaborator

pandu-k commented Dec 12, 2022

Ran test suite, all passed

@pandu-k pandu-k merged commit 538710a into mainline Dec 12, 2022
@pandu-k pandu-k deleted the reduce-k-value branch December 12, 2022 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants