Dupe IDs are handled when use_existing_tensors=True #390
@@ -85,6 +86,100 @@ def test_use_existing_tensors_non_existing(self):
document_id="123", show_vectors=True)
self.assertEqual(use_existing_tensors_doc, regular_doc)

tensor_search.delete_index(config=self.config, index_name=self.index_name_1)
if you delete the index, then you don't really overwrite the previous document
done
Reports on bug found by vitus "chunks" key error:
The error happens when adding this doc repeatedly. This only happens when
things we know:
res_chunks
res_data
Question: why does res_chunks have no chunks? In what situation would _source be empty?
Another situation:
Diagnosis (3/15/23)
The error happens when the following conditions occur:
A chunkless doc is one of the following:
Solution:
If empty source is found, set that as
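The failure discussed in this thread (a "chunks" KeyError when a fetched result has an empty `_source`) can be illustrated with a minimal sketch. The response shape and the helper name `extract_chunks` are assumptions made for illustration only; this is not the code changed in this PR, just one defensive way to read chunks from a result that may lack them.

```python
from typing import Any, Dict, List


def extract_chunks(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the chunks stored on a fetched document, or [] if there are none.

    Assumed (hypothetical) result shape:
        {"_id": ..., "found": bool, "_source": {"__chunks": [...]}}
    """
    # `_source` may be missing (doc not found) or empty (chunkless doc).
    source = result.get("_source") or {}
    # `__chunks` may be missing even when `_source` is present.
    return source.get("__chunks") or []


# Illustrative inputs only; the field names inside the chunk are made up.
found = {"_id": "123", "found": True, "_source": {"__chunks": [{"text": "hello"}]}}
empty_source = {"_id": "456", "found": True, "_source": {}}
not_found = {"_id": "789", "found": False}

# Direct indexing is what raises the KeyError on a chunkless result.
try:
    empty_source["_source"]["__chunks"]
    direct_lookup_failed = False
except KeyError:
    direct_lookup_failed = True

chunk_counts = [len(extract_chunks(r)) for r in (found, empty_source, not_found)]
```

Treating a missing or empty `_source` as "no existing chunks" keeps the caller on a single code path instead of special-casing each failure.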
@@ -940,12 +943,14 @@ def _get_documents_for_upsert(
dummy_res = result
nit: may be more appropriate to call this something like not_found_res
Running unit tests: https://github.com/marqo-ai/marqo/actions/runs/4432692586
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
2 bug fixes
What is the current behavior? (You can also link to an open issue here)
- When `use_existing_tensors=True` and docs are added with duplicate IDs, a MarqoWebError with no status code is thrown.
- With `use_existing_tensors=True`, a KeyError occurs because it looks for `['_source']['__chunks']`.
What is the new behavior?
- When `use_existing_tensors=True` and docs are added with duplicate IDs, docs are added normally. The last doc in the list with the same ID is the one that gets kept.
- `use_existing_tensors` and the chunkless docs bug
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No
Have unit tests been run against this PR? (Has there also been any additional testing?)
Yes
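The duplicate-ID behavior described in this PR (the last doc in the list with a given ID is the one that gets kept) can be sketched as follows. The helper name `keep_last_per_id` is hypothetical and the approach is an assumption for illustration, not necessarily how Marqo implements it.

```python
from typing import Any, Dict, List


def keep_last_per_id(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop earlier docs that share an `_id` with a later doc in the batch."""
    seen = set()
    kept_reversed = []
    # Walk the batch backwards so the first doc seen for each ID is the last one added.
    for doc in reversed(docs):
        doc_id = doc.get("_id")
        if doc_id is not None:
            if doc_id in seen:
                continue  # an earlier duplicate; the later doc wins
            seen.add(doc_id)
        # Docs without an `_id` cannot collide, so they are always kept.
        kept_reversed.append(doc)
    # Restore the original relative order of the surviving docs.
    return kept_reversed[::-1]


docs = [
    {"_id": "123", "title": "first"},
    {"_id": "456", "title": "other"},
    {"_id": "123", "title": "second"},
]
deduped = keep_last_per_id(docs)
```

Here `deduped` holds the `"456"` doc followed by the second `"123"` doc; the first `"123"` doc is discarded.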
Please check if the PR fulfills these requirements