Refactor tensor_search::search to use shared functions from tensor_search::bulk_search (#469)
Conversation
        index_info.get_vector_properties().keys()
    ))

    # Validation for offset (pagination is a single field). If offset is not provided, validation is not run.
Moving this in here so it can be used by both search and bulk_search
nit: perhaps this could be moved to a separate validation function (may be out-of-scope for this refactor)
agree, but this is just a move from within the script.
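As a sketch of the suggested extraction, a standalone validator could look like the following. The function name, parameters, and error wording are all hypothetical, not taken from the PR:

```python
from typing import Optional

def validate_offset(offset: Optional[int], searchable_attributes_count: int) -> None:
    """Hypothetical standalone validator: pagination only supports a single field.

    If offset is not provided, validation is not run (mirroring the comment in the diff).
    """
    if offset is None:
        return
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if searchable_attributes_count != 1:
        raise ValueError("pagination (offset) is only supported when searching a single field")
```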
- def construct_msearch_body_elements(vector_properties_to_search: List[str], offset: int, filter_string: str, index_info: IndexInfo, result_count: int, query_vector: List[float], attributes_to_retrieve: List[str], index_name: str, contextualised_filter: str) -> List[Dict[str, Any]]:
+ def construct_msearch_body_elements(searchableAttributes: List[str], offset: int, filter_string: str, index_info: IndexInfo, result_count: int, query_vector: List[float], attributes_to_retrieve: List[str], index_name: str, score_modifiers: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
This change lets:
- search use construct_msearch_body_elements
- bulk search use score modifiers
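To illustrate why the optional score_modifiers parameter makes the function shareable, here is a heavily trimmed, hypothetical sketch (the parameter list is cut down and the query shapes are illustrative, not Marqo's actual msearch bodies):

```python
from typing import Any, Dict, List, Optional

def construct_msearch_body_elements(
    searchableAttributes: List[str],
    offset: int,
    result_count: int,
    query_vector: List[float],
    index_name: str,
    score_modifiers: Optional[Dict[str, Any]] = None,  # None => plain search path
) -> List[Dict[str, Any]]:
    """Build one msearch header/body pair per searchable attribute (simplified)."""
    body: List[Dict[str, Any]] = []
    for attribute in searchableAttributes:
        search_query: Dict[str, Any] = {
            "from": offset,
            "size": result_count,
            "query": {"knn": {attribute: {"vector": query_vector, "k": result_count}}},
        }
        if score_modifiers is not None:
            # only the bulk path supplies modifiers; wrap the base query
            search_query["query"] = {
                "function_score": {"query": search_query["query"], **score_modifiers}
            }
        body += [{"index": index_name}, search_query]
    return body
```

Because score_modifiers defaults to None, the existing search call sites need no changes, while bulk search can opt in.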
@@ -1419,14 +1406,17 @@ def bulk_msearch(config: Config, body: List[Dict]) -> List[Dict]:
    except KeyError as e:
        # KeyError indicates we have received a non-successful result
        try:
            if "index.max_result_window" in response["responses"][0]["error"]["root_cause"][0]["reason"]:
This just makes it cleaner, and adds
elif root_cause_type == 'query_shard_exception' and root_cause_reason.startswith("Failed to parse query")
to replicate what is in search.
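A hypothetical helper showing the shape of the cleaned-up branching over the error's root_cause (the response layout follows OpenSearch's msearch error format; the function name and return labels are assumptions for illustration):

```python
def classify_msearch_error(response: dict) -> str:
    """Inspect the first root_cause entry of a failed msearch response (sketch)."""
    root_cause = response["responses"][0]["error"]["root_cause"][0]
    root_cause_type = root_cause.get("type", "")
    root_cause_reason = root_cause.get("reason", "")
    if "index.max_result_window" in root_cause_reason:
        return "max-result-window-exceeded"
    elif root_cause_type == "query_shard_exception" and root_cause_reason.startswith("Failed to parse query"):
        # the branch replicated from search
        return "invalid-query"
    return "unknown"
```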
good to clean this up 👍
        merged_vector = np.mean(weighted_vectors, axis=0)

    custom_tensors = q.get_context_tensor()
    if custom_tensors is not None:
custom tensor support for bulk_search (and search to use this method).
should we call this var context_tensors to be consistent?
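The merging step being discussed can be sketched as follows. The (vector, weight) pair structure for context tensors is an assumption for illustration; only the np.mean(..., axis=0) call is taken from the diff:

```python
import numpy as np
from typing import List, Tuple

def merge_vectors(query_vector: List[float],
                  context_tensors: List[Tuple[List[float], float]]) -> np.ndarray:
    """Average the query vector with weighted context vectors (illustrative sketch)."""
    weighted_vectors = [np.asarray(query_vector, dtype=float)]
    for vector, weight in context_tensors:
        # scale each context vector by its weight before averaging
        weighted_vectors.append(np.asarray(vector, dtype=float) * weight)
    return np.mean(weighted_vectors, axis=0)
```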
@@ -1647,6 +1649,25 @@ def create_empty_query_response(queries: List[BulkSearchQueryEntity]) -> List[Di
        )
    )

def run_vectorise_pipeline(config: Config, queries: List[BulkSearchQueryEntity], selected_device: Union[Device, str]) -> Dict[Qidx, List[float]]:
This abstracts the query -> vector pipeline so we can use it for (bulk) search
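The idea behind the abstraction can be shown with a self-contained, simplified sketch: duplicate queries collapse into unique jobs, each job is vectorised once, and results are fanned back out keyed by query index (the Qidx in the real signature). Everything below is a stand-in; the real pipeline also hashes model and device into the job key:

```python
from typing import Callable, Dict, List

def run_vectorise_pipeline_sketch(
    queries: List[str],
    vectorise: Callable[[List[str]], List[List[float]]],
) -> Dict[int, List[float]]:
    # 1. dedupe queries into unique "jobs" (order-preserving)
    unique_queries = list(dict.fromkeys(queries))
    # 2. run model inference exactly once per unique job
    vectors = dict(zip(unique_queries, vectorise(unique_queries)))
    # 3. fan results back out to every original query index (Qidx)
    return {qidx: vectors[q] for qidx, q in enumerate(queries)}
```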
Force-pushed from ea44e22 to 7b914ed
2 main points:
- Make pydantic models immutable in 99.9% of cases, unless there's a good reason not to
- Lots of unit tests, including edge cases, for all logic
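The immutability point can be illustrated with the stdlib to stay dependency-free: pydantic v1's `allow_mutation = False` (or v2's `frozen=True`) gives the same guarantee as a frozen dataclass. The class below borrows the BulkSearchQueryEntity name from the PR purely for illustration:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class BulkSearchQueryEntity:  # name borrowed from the PR for illustration
    q: str
    limit: int = 10

query = BulkSearchQueryEntity(q="red shoes")
try:
    query.limit = 20  # any mutation is rejected after construction
except FrozenInstanceError:
    pass
```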
Including object methods. Even super trivial unit tests help make Marqo more stable.
looks good! just added some questions
Force-pushed from fa16411 to 4cedaab
        merged_vector = np.mean(weighted_vectors, axis=0)
    except ValueError as e:
        raise errors.InvalidArgError(f"The provided vectors are not in the same dimension of the index."
                                     f"This causes the error when we do `numpy.mean()` over all the vectors.\n"
let's leave out numpy as this is a Marqo implementation detail (just mean is fine).
This is a copy paste. I want to minimise refactors during a code reshuffle.
tests are looking good
    )
    vector_jobs_tuple: Tuple[Dict[Qidx, List[VectorisedJobPointer]], Dict[JHash, VectorisedJobs]] = create_vector_jobs(queries, config, selected_device)

    print(vector_jobs_tuple, create_vector_jobs)
rm this print statement
ahhh, I ctrl-F'ed for prints in my git diff :(
        self.job_to_vectors: Dict[JHash, Dict[str, List[float]]] = {
            123: {"a test query": [0.5, 0.5]},
            456: {"another": [0.5, 0.6]},
            789: {"red herring": [0.6, 0.5]}
What about a multimodal combination field?
cheers